Example 39.7: Goodness-of-Fit Tests and Subpopulations
A study is done to investigate the effects of two binary
factors, A and B, on a binary response, Y.
Subjects are randomly selected from subpopulations defined
by the four possible combinations of levels of A and
B. The number of subjects responding with each level
of Y is recorded and entered into data set A.
data a;
   /* One frequency per (A, B, Y) cell; Y varies fastest, then B, then A */
   do A=0,1;
      do B=0,1;
         do Y=1,2;
            input F @@;
            output;
         end;
      end;
   end;
   datalines;
23 63 31 70 67 100 70 104
;
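The nested DO loops vary Y fastest, then B, then A, so the eight frequencies on the DATALINES line map to cells in that order. As an illustration outside SAS (not part of PROC LOGISTIC), the following Python sketch rebuilds the same eight observations and checks the totals that PROC LOGISTIC reports later: 8 observations, 528 subjects, 191 with Y=1 and 337 with Y=2.

```python
from itertools import product

# Frequencies in the order they appear on the DATALINES line.
freqs = [23, 63, 31, 70, 67, 100, 70, 104]

# itertools.product varies its last argument fastest, which matches the
# nested DO loops (Y innermost, then B, then A).
rows = [
    {"A": a, "B": b, "Y": y, "F": f}
    for (a, b, y), f in zip(product((0, 1), (0, 1), (1, 2)), freqs)
]

for row in rows:
    print(row)

print("observations:", len(rows))
print("sum of frequencies:", sum(r["F"] for r in rows))
print("Y=1:", sum(r["F"] for r in rows if r["Y"] == 1))
print("Y=2:", sum(r["F"] for r in rows if r["Y"] == 2))
```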
A full model is fit to examine the
main effects of A and B as well as the
interaction effect of A and B.
proc logistic data=a;
   freq F;
   model Y=A B A*B;
run;
Output 39.7.1: Full Model Fit
Model Information
Data Set                     WORK.A
Response Variable            Y
Number of Response Levels    2
Number of Observations       8
Frequency Variable           F
Sum of Frequencies           528
Link Function                Logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered Value    Y    Total Frequency
            1    1                191
            2    2                337

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics
Criterion    Intercept Only    Intercept and Covariates
AIC                 693.061                     691.914
SC                  697.330                     708.990
-2 Log L            691.061                     683.914

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        7.1478     3        0.0673
Score                   6.9921     3        0.0721
Wald                    6.9118     3        0.0748

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept     1     -1.0074            0.2436       17.1015        <.0001
A             1      0.6069            0.2903        4.3714        0.0365
B             1      0.1929            0.3254        0.3515        0.5533
A*B           1     -0.1883            0.3933        0.2293        0.6321

Association of Predicted Probabilities and Observed Responses
Percent Concordant     42.2    Somers' D    0.118
Percent Discordant     30.4    Gamma        0.162
Percent Tied           27.3    Tau-a        0.054
Pairs                 64367    c            0.559
Pearson and deviance goodness-of-fit tests cannot be
obtained for this model because the full model contains four
parameters, leaving no residual degrees of freedom.
For a binary response model, the goodness-of-fit tests have
m-q degrees of freedom, where m is the number of
subpopulations and q is the number of model parameters.
In the preceding model, m=q=4, resulting in zero degrees
of freedom for the tests.
Results of the model fit are shown in Output 39.7.1.
Notice that neither the A*B interaction (p=0.6321) nor the
B main effect (p=0.5533) is significant. If a reduced model
containing only the A effect is fit, two degrees of
freedom become available for testing goodness of fit.
Specifying the SCALE=NONE option requests the Pearson and
deviance statistics. With single-trial syntax, the
AGGREGATE= option is needed to define the subpopulations in
the study. Specifying AGGREGATE=(A B) creates four
subpopulations, one for each combination of the levels of A
and B. Although the B effect is being dropped
from the model, it is still needed to define the original
subpopulations in the study. If AGGREGATE=(A) were
specified, only two subpopulations would be created from the
levels of A, resulting in m=q=2 and zero degrees of
freedom for the tests.
proc logistic data=a;
   freq F;
   model Y=A / scale=none aggregate=(A B);
run;
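How the AGGREGATE= variable list determines m, and therefore the m-q degrees of freedom, can be sketched in Python. The helper names count_subpopulations and gof_df are hypothetical; the sketch assumes, as described above, that a subpopulation is a distinct combination of values of the listed variables. It is an illustration of the counting rule, not of how PROC LOGISTIC is implemented.

```python
from itertools import product

# The eight observations of data set A (Y varies fastest, then B, then A).
freqs = [23, 63, 31, 70, 67, 100, 70, 104]
rows = [
    {"A": a, "B": b, "Y": y, "F": f}
    for (a, b, y), f in zip(product((0, 1), (0, 1), (1, 2)), freqs)
]

def count_subpopulations(rows, aggregate_vars):
    """Hypothetical helper: number of distinct value combinations of the
    listed variables, mirroring the role of the AGGREGATE= list."""
    return len({tuple(r[v] for v in aggregate_vars) for r in rows})

def gof_df(rows, aggregate_vars, n_params):
    """Hypothetical helper: residual df m - q for the Pearson and deviance tests."""
    return count_subpopulations(rows, aggregate_vars) - n_params

# Full model (Intercept, A, B, A*B) with AGGREGATE=(A B): m = 4, q = 4
print("full, (A B):   ", gof_df(rows, ["A", "B"], n_params=4))
# Reduced model (Intercept, A) with AGGREGATE=(A B): m = 4, q = 2
print("reduced, (A B):", gof_df(rows, ["A", "B"], n_params=2))
# Reduced model (Intercept, A) with AGGREGATE=(A): m = 2, q = 2
print("reduced, (A):  ", gof_df(rows, ["A"], n_params=2))
```

Dropping B from the AGGREGATE= list collapses four subpopulations into two, which is why B must stay in the list even though it leaves the model.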
Output 39.7.2: Reduced Model Fit
Model Information
Data Set                     WORK.A
Response Variable            Y
Number of Response Levels    2
Number of Observations       8
Frequency Variable           F
Sum of Frequencies           528
Link Function                Logit
Optimization Technique       Fisher's scoring

Response Profile
Ordered Value    Y    Total Frequency
            1    1                191
            2    2                337

Model Convergence Status
Convergence criterion (GCONV=1E-8) satisfied.

Deviance and Pearson Goodness-of-Fit Statistics
Criterion    DF     Value    Value/DF    Pr > ChiSq
Deviance      2    0.3541      0.1770        0.8377
Pearson       2    0.3531      0.1765        0.8382

Number of unique profiles: 4

Model Fit Statistics
Criterion    Intercept Only    Intercept and Covariates
AIC                 693.061                     688.268
SC                  697.330                     696.806
-2 Log L            691.061                     684.268

Testing Global Null Hypothesis: BETA=0
Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        6.7937     1        0.0091
Score                   6.6779     1        0.0098
Wald                    6.6210     1        0.0101

Analysis of Maximum Likelihood Estimates
Parameter    DF    Estimate    Standard Error    Chi-Square    Pr > ChiSq
Intercept     1     -0.9013            0.1614       31.2001        <.0001
A             1      0.5032            0.1955        6.6210        0.0101

Association of Predicted Probabilities and Observed Responses
Percent Concordant     28.3    Somers' D    0.112
Percent Discordant     17.1    Gamma        0.246
Percent Tied           54.6    Tau-a        0.052
Pairs                 64367    c            0.556
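Because A is binary and is the only covariate in the reduced model, that model has one parameter per level of A, so its maximum likelihood fit should reproduce the observed proportion of Y=1 at each level of A (pooled over B). The Python sketch below is an independent closed-form cross-check, not the Fisher scoring procedure PROC LOGISTIC uses. Expect agreement with the Analysis of Maximum Likelihood Estimates table to about three decimal places; a difference in the last printed digit may reflect the iterative fit stopping once the GCONV=1E-8 criterion is met.

```python
import math

# Observed counts for each (A, B) cell: (count with Y=1, count with Y=2).
counts = {(0, 0): (23, 63), (0, 1): (31, 70), (1, 0): (67, 100), (1, 1): (70, 104)}

# Pool over B: events (Y=1) and trials at each level of A.
pooled = {}
for (a, _b), (y1, y2) in counts.items():
    events, trials = pooled.get(a, (0, 0))
    pooled[a] = (events + y1, trials + y1 + y2)
# pooled == {0: (54, 187), 1: (137, 341)}

def logit(p):
    return math.log(p / (1 - p))

p0 = pooled[0][0] / pooled[0][1]  # observed Pr(Y=1 | A=0)
p1 = pooled[1][0] / pooled[1][1]  # observed Pr(Y=1 | A=1)

intercept = logit(p0)             # log odds of Y=1 when A=0
slope_a = logit(p1) - logit(p0)   # log odds ratio, A=1 versus A=0

# -2 log likelihood of the fitted model, summed over all 528 subjects.
m2ll = -2 * sum(
    e * math.log(p) + (n - e) * math.log(1 - p)
    for (e, n), p in ((pooled[0], p0), (pooled[1], p1))
)

print(f"Intercept {intercept:.4f}  A {slope_a:.4f}  -2 Log L {m2ll:.3f}")
```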
The goodness-of-fit tests (Output 39.7.2)
show that dropping the B main effect and the
A*B interaction simultaneously does not result in significant lack of
fit. The deviance and Pearson statistics have large p-values
(0.8377 and 0.8382), so there is insufficient evidence to reject
the null hypothesis that the model fits.
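The deviance and Pearson statistics in Output 39.7.2 can be recomputed by hand from the four (A, B) subpopulations. The Python sketch below assumes the standard definitions (deviance = 2 * sum of O * ln(O/E), Pearson = sum of (O - E)^2 / E over both response levels of every subpopulation) and uses the closed-form chi-square tail probability exp(-x/2), which is valid only for 2 degrees of freedom. The helper chi2_sf_df2 is a hypothetical name introduced for this illustration.

```python
import math

# Observed counts for each (A, B) subpopulation: (count with Y=1, count with Y=2).
counts = {(0, 0): (23, 63), (0, 1): (31, 70), (1, 0): (67, 100), (1, 1): (70, 104)}

# Under the A-only model, the fitted Pr(Y=1) at each level of A equals the
# observed proportion of Y=1 at that level, pooled over B.
p_hat = {}
for a in (0, 1):
    events = sum(y1 for (ai, _b), (y1, _y2) in counts.items() if ai == a)
    trials = sum(y1 + y2 for (ai, _b), (y1, y2) in counts.items() if ai == a)
    p_hat[a] = events / trials

deviance = 0.0
pearson = 0.0
for (a, _b), (y1, y2) in counts.items():
    n = y1 + y2
    expected = (n * p_hat[a], n * (1 - p_hat[a]))
    for obs, exp_count in zip((y1, y2), expected):
        # All observed counts are positive here, so no 0*log(0) terms arise.
        deviance += 2 * obs * math.log(obs / exp_count)
        pearson += (obs - exp_count) ** 2 / exp_count

m = len(counts)  # 4 subpopulations, as with AGGREGATE=(A B)
q = 2            # Intercept and A
df = m - q

def chi2_sf_df2(x):
    """Upper-tail probability of a chi-square variable with 2 df: exp(-x/2).
    This closed form holds only for 2 degrees of freedom."""
    return math.exp(-x / 2)

print(f"Deviance {deviance:.4f}  DF {df}  p {chi2_sf_df2(deviance):.4f}")
print(f"Pearson  {pearson:.4f}  DF {df}  p {chi2_sf_df2(pearson):.4f}")
```

Because the full model in Output 39.7.1 has one parameter per subpopulation, it fits the four observed proportions exactly, so this deviance also equals the drop in -2 Log L between the two fits: 684.268 - 683.914 = 0.354.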
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.