The PROBIT Procedure

Lack of Fit Tests

Two goodness-of-fit tests can be requested from the PROBIT procedure: a Pearson chi-square test and a log-likelihood ratio chi-square test.

If there is only a single continuous independent variable, the data are internally sorted to group response values by the independent variable. Otherwise, the data are aggregated into groupings that are delimited whenever a change is observed in one of the independent variables.

Note: Because of this grouping, the data set should be sorted by the independent variables before the PROBIT procedure is run if the LACKFIT option is specified.
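The effect of row order on the groupings can be illustrated with a short Python sketch. This is not the procedure's internal code: the function name group_consecutive and the sample rows are hypothetical, and the sketch assumes a grouping is closed whenever the tuple of independent variable values changes between adjacent rows.

```python
from itertools import groupby


def group_consecutive(rows):
    """Aggregate adjacent rows that share the same covariate values.

    Each row is (covariates, responders, subjects). A new grouping starts
    whenever the covariate tuple differs from that of the previous row.
    """
    groups = []
    for covariates, block in groupby(rows, key=lambda row: row[0]):
        block = list(block)
        groups.append((
            covariates,
            sum(r for _, r, _ in block),   # responders in this grouping
            sum(n for _, _, n in block),   # subjects in this grouping
        ))
    return groups


# Hypothetical data with two independent variables (dose, sex).
unsorted_rows = [
    ((1.0, "F"), 2, 10),
    ((2.0, "F"), 5, 10),
    ((1.0, "F"), 3, 10),   # same covariates as the first row, but not adjacent
]
sorted_rows = sorted(unsorted_rows, key=lambda row: row[0])

unsorted_groups = group_consecutive(unsorted_rows)
sorted_groups = group_consecutive(sorted_rows)
print(len(unsorted_groups), len(sorted_groups))
```

In the unsorted input, the two rows with identical covariate values are split into separate groupings, which inflates the number of groupings and reduces the replication within each one. Sorting first merges them.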

If the Pearson goodness-of-fit chi-square test is requested and the p-value for the test is too small, variances and covariances are adjusted by a heterogeneity factor (the goodness-of-fit chi-square divided by its degrees of freedom) and a critical value from the t distribution is used to compute the fiducial limits. The Pearson chi-square test statistic is computed as

\sum_i \sum_j \frac{(r_{ij} - n_i p_{ij})^2}{n_i p_{ij}}
where the sum on i is over groupings, the sum on j is over levels of response, r_{ij} is the frequency of response level j for the ith grouping, n_i is the total frequency for the ith grouping, and p_{ij} is the fitted probability for the jth level at the ith grouping.
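As a minimal sketch of this formula, the following Python function evaluates the Pearson statistic directly from a table of counts and fitted probabilities. The function name pearson_chi_square, the counts, and the fitted probabilities are all hypothetical; the probabilities are made-up values, not the output of a fitted probit model.

```python
def pearson_chi_square(counts, fitted_probs):
    """Pearson chi-square over groupings i and response levels j.

    counts[i][j] is r_ij and fitted_probs[i][j] is p_ij; n_i is taken to be
    the sum of the counts in grouping i.
    """
    total = 0.0
    for r_i, p_i in zip(counts, fitted_probs):
        n_i = sum(r_i)
        for r_ij, p_ij in zip(r_i, p_i):
            expected = n_i * p_ij          # fitted frequency n_i * p_ij
            total += (r_ij - expected) ** 2 / expected
    return total


# Hypothetical binomial data: three groupings, two response levels.
counts = [[2, 8], [5, 5], [9, 1]]
fitted_probs = [[0.25, 0.75], [0.5, 0.5], [0.8, 0.2]]

chi_square = pearson_chi_square(counts, fitted_probs)
print(round(chi_square, 4))
```

Each term compares an observed frequency with its fitted frequency, so a grouping whose observed counts match the fitted counts exactly (the second grouping here) contributes nothing to the sum.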

The log-likelihood ratio chi-square test statistic is computed as

2 \sum_i \sum_j r_{ij} \ln \left( \frac{r_{ij}}{n_i p_{ij}} \right)
This quantity is sometimes called the deviance. If the modeled probabilities fit the data, these statistics should be approximately distributed as chi-square with degrees of freedom equal to (k - 1) × m - q, where k is the number of levels of the multinomial or binomial response, m is the number of sets of independent variable values (covariate patterns), and q is the number of parameters fit in the model.
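The deviance and its degrees of freedom can be sketched in Python in the same way. The names deviance and lack_of_fit_df and the data are hypothetical, and the sketch assumes the usual convention that a cell with a zero count contributes zero to the sum; the source does not state how zero counts are handled.

```python
import math


def deviance(counts, fitted_probs):
    """Log-likelihood ratio chi-square: 2 * sum r_ij * ln(r_ij / (n_i p_ij))."""
    total = 0.0
    for r_i, p_i in zip(counts, fitted_probs):
        n_i = sum(r_i)
        for r_ij, p_ij in zip(r_i, p_i):
            if r_ij > 0:   # assumed convention: a zero count contributes 0
                total += r_ij * math.log(r_ij / (n_i * p_ij))
    return 2.0 * total


def lack_of_fit_df(k, m, q):
    """Degrees of freedom (k - 1) * m - q for k levels, m patterns, q parameters."""
    return (k - 1) * m - q


# Hypothetical binomial data: three covariate patterns, two response levels.
counts = [[2, 8], [5, 5], [9, 1]]
fitted_probs = [[0.25, 0.75], [0.5, 0.5], [0.8, 0.2]]

dev = deviance(counts, fitted_probs)
df = lack_of_fit_df(k=2, m=3, q=2)   # for example, an intercept and one slope
print(round(dev, 4), df)
```

With a binomial response (k = 2), three covariate patterns, and two fitted parameters, only one degree of freedom remains for the test.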

In order for the Pearson statistic and the deviance to be distributed as chi-square, there must be sufficient replication within the groupings. When this is not true, the data are sparse, and the p-values for these statistics are not valid and should be ignored. Similarly, these statistics, divided by their degrees of freedom, cannot serve as indicators of overdispersion. A large difference between the Pearson statistic and the deviance provides some evidence that the data are too sparse to use either statistic.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.