The CATMOD Procedure

Log-Linear Model Analysis

When the response functions are the default generalized logits, including the keyword _RESPONSE_ in every effect on the right-hand side of the MODEL statement induces a log-linear model. The keyword _RESPONSE_ tells PROC CATMOD that you want to model the variation among the dependent variables. You then specify the actual model in the LOGLIN statement.

One word of caution about log-linear model analyses: replace sampling zeros in the input data set with a positive number close to zero (such as 1E-20) so that they are not treated as structural zeros. You can do this in a DATA step that resets the cell counts of sampling zeros to the small value. Analyze data containing sampling zeros with maximum likelihood estimation. See the "Cautions" section and Example 22.5 for further information and an illustration for both cell count data and raw data.
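
As a minimal sketch of such a DATA step, assuming cell count data in which every cell (including each sampling zero) appears as one observation with its count in the variable wt, the replacement might look like the following. The data set names counts and counts_adj are hypothetical.

   data counts_adj;                  /* hypothetical output data set       */
      set counts;                    /* hypothetical input, count in wt    */
      if wt = 0 then wt = 1e-20;     /* sampling zero -> tiny positive     */
   run;

   proc catmod data=counts_adj;
      weight wt;
      model r1*r2=_response_ / ml;   /* maximum likelihood estimation      */
      loglin r1 r2;
   run;

Cells that are structural zeros should be left out of this replacement; the sketch assumes that every zero count in the input is a sampling zero.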

When you perform log-linear model analysis, you can request weighted least-squares estimates, maximum likelihood estimates, or both. By default, PROC CATMOD calculates maximum likelihood estimates when the default response functions are used. The following table provides appropriate MODEL statements for the combinations of types of estimates.

   Estimation Desired                               MODEL Statement
   Maximum likelihood                               model a*b=_response_;
   Weighted least squares                           model a*b=_response_ / wls;
   Maximum likelihood and weighted least squares    model a*b=_response_ / wls ml;

One Population

The usual log-linear model analysis has one population, which means that all of the variables are dependent variables. For example, the statements

   proc catmod;
      weight wt;
      model r1*r2=_response_;
      loglin r1|r2;
   run;

yield a maximum likelihood analysis of a saturated log-linear model for the dependent variables r1 and r2.
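
For orientation, the following sketch shows one way the input for these statements could be laid out as cell count data: one observation per combination of r1 and r2, with the cell count in wt. The data set name table2x2 and the counts are made up purely for illustration.

   data table2x2;           /* hypothetical data set name               */
      input r1 r2 wt;       /* wt holds the count for the (r1,r2) cell  */
      datalines;
   1 1 20
   1 2 15
   2 1 12
   2 2 25
   ;
   run;

Because the PROC CATMOD statements above omit the DATA= option, they would analyze the most recently created data set, which here would be table2x2.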

If you want to fit a reduced model with respect to the dependent variables (for example, a model of independence or conditional independence), specify the reduced model in the LOGLIN statement. For example, the statements

   proc catmod;
      weight wt;
      model r1*r2=_response_ / pred;
      loglin r1 r2;
   run;

yield a main-effects log-linear model analysis of the factors r1 and r2. The output includes Wald statistics for the individual effects r1 and r2, as well as predicted cell probabilities. Moreover, the goodness-of-fit statistic is the likelihood ratio test for the hypothesis of independence between r1 and r2 or, equivalently, a test of r1*r2.
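
The two LOGLIN specifications can be written out in conventional log-linear notation. The symbols below are assumptions introduced for illustration and are not PROC CATMOD output names: m(i,j) is the expected count in cell (i,j), mu is a constant, a(i) and b(j) are the main effects of r1 and r2, and ab(i,j) is their interaction.

   loglin r1|r2;    log m(i,j) = mu + a(i) + b(j) + ab(i,j)    (saturated)
   loglin r1 r2;    log m(i,j) = mu + a(i) + b(j)              (independence)

The main-effects model omits only the ab(i,j) terms, so its goodness-of-fit test is a test of r1*r2. If r1 has I levels and r2 has J levels, that test has (I-1)(J-1) degrees of freedom.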

Multiple Populations

You can do log-linear model analysis with multiple populations by using a POPULATION statement or by including effects on the right-hand side of the MODEL statement that contain independent variables. Each effect must include the _RESPONSE_ keyword.
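
As a minimal sketch of the first option, assuming an independent variable named group, a POPULATION statement defines the populations without adding any group effects to the model:

   proc catmod;
      weight wt;
      population group;            /* populations defined by group      */
      model r1*r2=_response_;      /* no group effects in the model     */
      loglin r1 r2;
   run;

In this sketch the same _RESPONSE_ parameters apply to every population. To let the response effects vary across populations, include the independent variable in the MODEL statement.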

For example, suppose the dependent variables r1 and r2 are dichotomous, and the independent variable group has three levels. Then

   proc catmod;
      weight wt;
      model r1*r2=_response_ group*_response_;
      loglin r1|r2;
   run;

specifies a saturated model (three degrees of freedom for _RESPONSE_ and six degrees of freedom for the interaction between _RESPONSE_ and group). From another point of view, _RESPONSE_*group can be regarded as a main effect for group with respect to the three response functions, while _RESPONSE_ can be regarded as an intercept effect with respect to those functions. In other words, these statements give essentially the same results as the logistic analysis:

   proc catmod;
      weight wt;
      model r1*r2=group;
   run;
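
The degrees of freedom quoted for the saturated log-linear model above follow from a short count, worked here for two dichotomous dependent variables and a three-level group:

   response levels per population       2 x 2        = 4
   response functions per population    4 - 1        = 3
   populations (levels of group)                     = 3
   total response functions             3 x 3        = 9
   _RESPONSE_ parameters                             = 3
   group*_RESPONSE_ parameters          (3 - 1) x 3  = 6
   total parameters                     3 + 6        = 9

The number of parameters equals the number of response functions, which is what makes the model saturated.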

The ability to model the interaction between the independent and the dependent variables becomes particularly useful when a reduced model is specified for the dependent variables. For example,

   proc catmod;
      weight wt;
      model r1*r2=_response_ group*_response_;
      loglin r1 r2;
   run;

specifies a model with two degrees of freedom for _RESPONSE_ (one for r1 and one for r2) and four degrees of freedom for the _RESPONSE_*group interaction. The likelihood ratio goodness-of-fit statistic (three degrees of freedom) tests the hypothesis that r1 and r2 are independent in each of the three groups.
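
The three degrees of freedom for the goodness-of-fit statistic can be checked by subtracting the number of model parameters from the number of response functions:

   total response functions       3 populations x 3   = 9
   _RESPONSE_ parameters          1 (r1) + 1 (r2)     = 2
   group*_RESPONSE_ parameters    (3 - 1) x 2         = 4
   goodness-of-fit df             9 - (2 + 4)         = 3

The three remaining degrees of freedom correspond to one r1*r2 interaction term in each of the three groups.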


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.