Overdispersion

The LOGISTIC Procedure

For a correctly specified model, the Pearson chi-square statistic and the deviance, divided by their degrees of freedom, should be approximately equal to one. When their values are much larger than one, the assumption of binomial variability may not be valid and the data are said to exhibit overdispersion. Underdispersion, which results in the ratios being less than one, occurs less often in practice.

Several problems can cause the goodness-of-fit statistics to exceed their degrees of freedom when you fit a model: outliers in the data, the wrong link function, important terms omitted from the model, and predictors that need to be transformed. Eliminate these problems before you use the following methods to correct for overdispersion.

Rescaling the Covariance Matrix

One way of correcting overdispersion is to multiply the covariance matrix by a dispersion parameter. This method assumes that the sample sizes in each subpopulation are approximately equal. You can supply the value of the dispersion parameter directly, or you can estimate the dispersion parameter based on either the Pearson chi-square statistic or the deviance for the fitted model.

The Pearson chi-square statistic $\chi_P^2$ and the deviance $\chi_D^2$ are given by

$\chi_P^2 = \sum_{i=1}^m \sum_{j=1}^{k+1} \frac{(r_{ij} - n_i\hat{p}_{ij})^2}{n_i\hat{p}_{ij}}$

$\chi_D^2 = 2 \sum_{i=1}^m \sum_{j=1}^{k+1} r_{ij} \log \left( \frac{r_{ij}}{n_i\hat{p}_{ij}} \right)$

where m is the number of subpopulation profiles, k+1 is the number of response levels, r_ij is the total weight associated with jth level responses in the ith profile, $n_i = \sum_{j=1}^{k+1}r_{ij}$ , and $\hat{p}_{ij}$ is the fitted probability for the jth level at the ith profile. Each of these chi-square statistics has mk - q degrees of freedom, where q is the number of parameters estimated. The dispersion parameter is estimated by

$\hat{\sigma}^2 = \begin{cases} \chi_P^2/(mk-q) & \text{SCALE=PEARSON} \\ \chi_D^2/(mk-q) & \text{SCALE=DEVIANCE} \\ (\mathit{constant})^2 & \text{SCALE=}\mathit{constant} \end{cases}$
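The two statistics and the resulting dispersion estimates can be worked through by hand. The following Python sketch is only an illustration of the formulas above, not the procedure's own code; the function name `goodness_of_fit` and the counts are hypothetical, and the fitted probabilities are assumed to come from an intercept-only binary model.

```python
import math

def goodness_of_fit(r, p_hat, q):
    """Pearson chi-square, deviance, and both dispersion estimates.

    r     -- r[i][j]: total weight of level-j responses in profile i
    p_hat -- p_hat[i][j]: fitted probability of level j at profile i
    q     -- number of parameters estimated
    """
    m = len(r)             # number of subpopulation profiles
    k = len(r[0]) - 1      # there are k + 1 response levels
    pearson = 0.0
    deviance = 0.0
    for r_i, p_i in zip(r, p_hat):
        n_i = sum(r_i)
        for r_ij, p_ij in zip(r_i, p_i):
            expected = n_i * p_ij
            pearson += (r_ij - expected) ** 2 / expected
            if r_ij > 0:   # a zero count contributes nothing to the deviance
                deviance += 2.0 * r_ij * math.log(r_ij / expected)
    df = m * k - q
    return {
        "pearson": pearson,
        "deviance": deviance,
        "df": df,
        "scale_pearson": pearson / df,    # analogue of SCALE=PEARSON
        "scale_deviance": deviance / df,  # analogue of SCALE=DEVIANCE
    }

# Hypothetical data: three profiles of 10 trials each, binary response.
r = [[5, 5], [8, 2], [2, 8]]
# Intercept-only fit: the pooled event proportion is 15/30 = 0.5.
p_hat = [[0.5, 0.5]] * 3
fit = goodness_of_fit(r, p_hat, q=1)
```

Here both ratios are well above one, which would suggest overdispersion if the model were otherwise adequate and the replication within profiles sufficient.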

In order for the Pearson statistic and the deviance to be distributed as chi-square, there must be sufficient replication within the subpopulations. When this is not true, the data are sparse, and the p-values for these statistics are not valid and should be ignored. Similarly, these statistics, divided by their degrees of freedom, cannot serve as indicators of overdispersion. A large difference between the Pearson statistic and the deviance provides some evidence that the data are too sparse to use either statistic.

You can use the AGGREGATE (or AGGREGATE=) option to define the subpopulation profiles. If you do not specify this option, each observation is regarded as coming from a separate subpopulation. For events/trials syntax, each observation represents n Bernoulli trials, where n is the value of the trials variable; for single-trial syntax, each observation represents a single trial. Without the AGGREGATE (or AGGREGATE=) option, the Pearson chi-square statistic and the deviance are calculated only for events/trials syntax.
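To make the idea of a subpopulation profile concrete, the following Python sketch groups single-trial observations that share the same covariate values and tallies the responses per level. It illustrates the concept only and makes no claim about how the procedure forms profiles internally; the function name `aggregate_profiles` and the data are hypothetical.

```python
from collections import defaultdict

def aggregate_profiles(rows, levels):
    """Group single-trial rows into profiles keyed by covariate values.

    rows   -- iterable of (covariates, response) pairs
    levels -- ordered list of the k + 1 response levels
    Returns a dict mapping each profile to its counts [r_i1, ..., r_i(k+1)].
    """
    index = {level: j for j, level in enumerate(levels)}
    counts = defaultdict(lambda: [0] * len(levels))
    for covariates, response in rows:
        counts[tuple(covariates)][index[response]] += 1
    return dict(counts)

# Hypothetical single-trial data: (covariates, response).
rows = [
    ((1, "a"), "yes"),
    ((1, "a"), "no"),
    ((1, "a"), "yes"),
    ((2, "b"), "no"),
    ((2, "b"), "no"),
]
profiles = aggregate_profiles(rows, ["yes", "no"])
# profiles == {(1, "a"): [2, 1], (2, "b"): [0, 2]}
```

Each value in `profiles` plays the role of the counts r_ij for one profile in the chi-square formulas.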

Note that the parameter estimates are not changed by this method. However, their standard errors are adjusted for overdispersion, affecting their significance tests.
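Because the covariance matrix is multiplied by the dispersion parameter, each standard error is multiplied by its square root and each Wald chi-square is divided by it. A minimal Python sketch, with a hypothetical helper `rescale` and made-up numbers:

```python
import math

def rescale(estimate, std_err, dispersion):
    """Return the adjusted standard error and Wald chi-square.

    The estimate itself is untouched; only its variance is scaled.
    """
    adj_se = std_err * math.sqrt(dispersion)
    wald = (estimate / adj_se) ** 2
    return adj_se, wald

# Hypothetical coefficient 0.8 with standard error 0.2, dispersion 4.
unadjusted_wald = (0.8 / 0.2) ** 2
adj_se, adj_wald = rescale(0.8, 0.2, dispersion=4.0)
```

With a dispersion of 4, the standard error doubles and the Wald chi-square falls to one quarter of its unadjusted value.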

Williams' Method

Suppose that the data consist of n binomial observations. For the ith observation, let r_i/n_i be the observed proportion and let x_i be the associated vector of explanatory variables. Suppose that the response probability for the ith observation is a random variable P_i with mean and variance

$E(P_i) = p_i \quad \text{and} \quad V(P_i) = \phi p_i (1-p_i)$

where p_i is the probability of the event, and $\phi$ is a nonnegative but otherwise unknown scale parameter. Then the mean and variance of r_i are

$E(r_i) = n_i p_i \quad \text{and} \quad V(r_i) = n_i p_i (1-p_i) [1 + (n_i - 1) \phi]$
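The variance inflation factor can be recovered in a few steps, under the assumption (implicit in the setup above) that r_i given P_i is binomial with n_i trials and success probability P_i. Conditioning on P_i gives

$V(r_i) = E[V(r_i \mid P_i)] + V[E(r_i \mid P_i)] = n_i E[P_i(1-P_i)] + n_i^2 V(P_i)$

Since $E(P_i^2) = V(P_i) + p_i^2$,

$E[P_i(1-P_i)] = p_i - \phi p_i(1-p_i) - p_i^2 = (1-\phi) p_i(1-p_i)$

and therefore

$V(r_i) = n_i p_i(1-p_i)[(1-\phi) + n_i\phi] = n_i p_i(1-p_i)[1 + (n_i-1)\phi]$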

Williams (1982) estimates the unknown parameter $\phi$ by equating the value of Pearson's chi-square statistic for the full model to its approximate expected value. Suppose w_i^* is the weight associated with the ith observation. The Pearson chi-square statistic is given by

$\chi^2 = \sum_{i=1}^n \frac{w_i^*(r_i - n_i \hat{p}_i)^2} {n_i \hat{p}_i (1 - \hat{p}_i)}$

Let g'(·) be the first derivative of the link function g(·). The approximate expected value of $\chi^2$ is

$E_{\chi^2} = \sum_{i=1}^n w_i^* ( 1 - w_i^* v_i d_i)[1 + \phi (n_i - 1)]$

where $v_i = n_i / (p_i(1-p_i)[g'(p_i)]^2)$ and $d_i$ is the variance of the linear predictor $\hat{\alpha}_i + x_i'\hat{\beta}$. The scale parameter $\phi$ is estimated by the following iterative procedure.

At the start, let w_i^*=1 and let p_i be approximated by r_i/n_i, i = 1,2, ... ,n. Applying these weights and approximated probabilities to $\chi^2$ and $E_{\chi^2}$ and equating the two gives an initial estimate of $\phi$:

$\hat{\phi}_0 = \frac{\chi^2 - (n - m)} {\sum_i (n_i - 1)(1-v_id_i)}$

where m is the total number of parameters. The initial estimates of the weights become $\hat{w}^*_{i0} = [1 + (n_i - 1)\hat{\phi}_0]^{-1}$ . After a weighted fit of the model, $\hat{{\beta}}$ is recalculated, and so is $\chi^2$ . Then a revised estimate of $\phi$ is given by

$\hat{\phi}_1 = \frac{\chi^2 - \sum_i w_i^*(1-w_i^*v_id_i)} {\sum_i w_i^*(n_i-1)(1-w_i^*v_id_i)}$

The iterative procedure is repeated until $\chi^2$ is very close to its degrees of freedom.
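One pass of this iteration can be sketched in Python. The sketch is not the procedure's implementation: the function names are hypothetical, the products v_i d_i are passed in as given rather than computed from a fit, and the data are made up. The example assumes an intercept-only logit model with equal n_i, for which the fitted probability is the pooled proportion and each v_i d_i equals 1/3 under unit weights.

```python
def pearson_chi2(r, n, p_hat, w):
    """Weighted Pearson chi-square for binomial observations."""
    return sum(
        w_i * (r_i - n_i * p_i) ** 2 / (n_i * p_i * (1.0 - p_i))
        for r_i, n_i, p_i, w_i in zip(r, n, p_hat, w)
    )

def update_phi(chi2, n, w, vd):
    """Equate chi2 to its approximate expected value and solve for phi.

    vd[i] is the product v_i * d_i taken from the current fit.
    """
    num = chi2 - sum(w_i * (1.0 - w_i * vd_i) for w_i, vd_i in zip(w, vd))
    den = sum(
        w_i * (n_i - 1) * (1.0 - w_i * vd_i)
        for n_i, w_i, vd_i in zip(n, w, vd)
    )
    return num / den

def williams_weights(n, phi):
    """Weights [1 + (n_i - 1) * phi]^(-1)."""
    return [1.0 / (1.0 + (n_i - 1) * phi) for n_i in n]

# Hypothetical data: three binomial observations, intercept-only logit model.
r = [5, 8, 2]
n = [10, 10, 10]
p_hat = [0.5, 0.5, 0.5]        # pooled proportion 15/30
w0 = [1.0, 1.0, 1.0]           # starting weights
vd = [1.0 / 3.0] * 3           # assumed v_i * d_i for this fit

chi2 = pearson_chi2(r, n, p_hat, w0)
phi0 = update_phi(chi2, n, w0, vd)
w1 = williams_weights(n, phi0)
# With equal weights, the intercept-only refit leaves p_hat unchanged.
chi2_new = pearson_chi2(r, n, p_hat, w1)
```

In this small case the reweighted statistic already equals its degrees of freedom, n - m = 2, so the iteration would stop after one step. With unequal n_i or a larger model, the refit changes the estimates and several passes are usually needed.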

Once $\phi$ has been estimated by $\hat{\phi}$ under the full model, weights of $(1 + (n_i-1)\hat{\phi})^{-1}$ can be used in fitting models that have fewer terms than the full model. See Example 39.8 for an illustration.
