The CATMOD Procedure

Weighted-Least-Squares Analysis of Mean Response

Consider the data in the following table (Stokes, Davis, and Koch 1995).

Table 22.2: Colds in Children
                        Periods with Colds
Sex      Residence      0      1      2      Total
Female   Rural         45     64     71      180
Female   Urban         80    104    116      300
Male     Rural         84    124     82      290
Male     Urban        106    117     87      310

For males and females in rural and urban counties, the number of periods (of two) in which subjects report cold symptoms is recorded. Thus, 45 subjects who are female and from rural counties report no cold symptoms, and 71 subjects who are female and from rural counties report colds in both periods.

The question of interest is whether the mean number of periods with colds reported is associated with gender or type of county. There is no reason to believe that the mean number of periods with colds is normally distributed, so a weighted least-squares analysis of these data is performed with PROC CATMOD instead of an analysis of variance with PROC ANOVA or PROC GLM.

Input data for categorical analyses are often recorded in frequency form, with the count for each particular profile as the input value. Thus, for the colds data, the input SAS data set colds is created with the following statements. The variable count contains the frequency of observations that have the particular profile described by the values of the other variables on that input line. The trailing @@ in the INPUT statement lets each data line supply several observations.

   data colds;
      input sex $ residence $ periods count @@;
   datalines; 
   female rural 0  45  female rural 1  64  female rural 2  71
   female urban 0  80  female urban 1 104  female urban 2 116
   male   rural 0  84  male   rural 1 124  male   rural 2  82
   male   urban 0 106  male   urban 1 117  male   urban 2  87
   ; 
   run;
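Before modeling, it can help to confirm that the frequency-form data reproduce Table 22.2. The following is a minimal sketch that is not part of the original analysis; it assumes only the colds data set created above and uses PROC FREQ with the same WEIGHT variable.

```sas
/* Cross-tabulate the weighted counts to check them against Table 22.2. */
/* One residence-by-periods table is produced for each level of sex.    */
proc freq data=colds;
   weight count;                                  /* count holds cell frequencies */
   tables sex*residence*periods / nocol nopercent;
run;
```

The row totals in each table should match the Total column of Table 22.2.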

In order to fit a model to the mean number of periods with colds, you have to specify the response function in PROC CATMOD. The default response function is the logit if the response variable has two values, and it is generalized logits if the response variable has more than two values. If you want a different response function, then you request that function in the RESPONSE statement. To request the mean number of periods with colds, you specify the MEANS option in the RESPONSE statement.
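As a sketch of what the mean response function amounts to for these data, the following formula uses notation introduced here for illustration (it is not PROC CATMOD syntax): n_{ij} is the number of subjects in population i who report colds in j periods, n_i is the sample size of population i, and F_i is the resulting response function.

```latex
F_i = \sum_{j=0}^{2} j \, p_{ij},
\qquad p_{ij} = \frac{n_{ij}}{n_i},
\qquad i = 1, \dots, 4
```

Each population therefore contributes a single response function, so the colds data yield four response functions in total.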

You can request a model consisting of the main effects and interaction of the variables sex and residence just as you would in the GLM procedure. Unlike the GLM procedure, you don't need to use a CLASS statement in PROC CATMOD to treat a variable as a classification variable. All variables in the MODEL statement in the CATMOD procedure are treated as classification variables unless you specify otherwise with a DIRECT statement.

Thus, the PROC CATMOD statements required to model mean periods of colds with a main effects and interaction model are

   proc catmod data=colds;
      weight count;
      response means; 
      model periods = sex residence sex*residence;
   run;

The results of this analysis are shown in Figure 22.1 through Figure 22.3.

 
The CATMOD Procedure

Response            periods    Response Levels    3
Weight Variable     count      Populations        4
Data Set            COLDS      Total Frequency    1080
Frequency Missing   0          Observations       12

Population Profiles
Sample   sex      residence   Sample Size
1        female   rural       180
2        female   urban       300
3        male     rural       290
4        male     urban       310

Response Profiles
Response   periods
1          0
2          1
3          2
Figure 22.1: Model Information and Profile Tables

The CATMOD procedure first displays a summary of the contingency table you are analyzing. The "Population Profiles" table lists the values of the explanatory variables that define each population, or row of the underlying contingency table, and labels each group with a sample number. The number of observations in each population is also displayed. The "Response Profiles" table lists the variable levels that define the response, or columns of the underlying contingency table.

 
The CATMOD Procedure

Sample   Response     Design Matrix
         Function     1     2     3     4
1        1.14444      1     1     1     1
2        1.12000      1     1    -1    -1
3        0.99310      1    -1     1    -1
4        0.93871      1    -1    -1     1
Figure 22.2: Observed Response Functions and Design Matrix

The "Design Matrix" table contains the observed response functions (in this case, the mean number of periods with colds for each of the populations) and the design matrix. The first column of the design matrix contains the coefficients for the intercept parameter. The second column contains the coefficients for the sex parameter. The sum-to-zero constraint of a full-rank parameterization implies that the coefficient for males is the negative of that for females, so this parameter is called the differential effect for females. The third column is set up similarly for residence, and the last column is for the interaction.
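Because the saturated model has as many parameters as response functions, its estimates reproduce the observed means exactly, and they can be checked by hand. The following is a sketch under that assumption, using the rounded response functions from Figure 22.2; the symbols X, F, and \hat{\beta} are notation introduced here, and the resulting values are hand calculations, not procedure output.

```latex
\begin{aligned}
X^{\top} X = 4 I_4
  \quad &\Longrightarrow \quad
  \hat{\beta} = X^{-1} F = \tfrac{1}{4} X^{\top} F \\
\hat{\beta}_{\text{intercept}} &= \tfrac{1}{4}(1.14444 + 1.12000 + 0.99310 + 0.93871) \approx 1.0491 \\
\hat{\beta}_{\text{sex}} &= \tfrac{1}{4}(1.14444 + 1.12000 - 0.99310 - 0.93871) \approx 0.0832 \\
\hat{\beta}_{\text{residence}} &= \tfrac{1}{4}(1.14444 - 1.12000 + 0.99310 - 0.93871) \approx 0.0197 \\
\hat{\beta}_{\text{sex*residence}} &= \tfrac{1}{4}(1.14444 - 1.12000 - 0.99310 + 0.93871) \approx -0.0075
\end{aligned}
```

The signs in each line are read directly from the corresponding column of the design matrix.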

 
The CATMOD Procedure

Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Intercept 1 1841.13 <.0001
sex 1 11.57 0.0007
residence 1 0.65 0.4202
sex*residence 1 0.09 0.7594
Residual 0 . .
Figure 22.3: ANOVA Table for the Saturated Model

The model-fitting results are displayed in the "Analysis of Variance" table (Figure 22.3), which is similar to an ANOVA table. The effects from the right-hand side of the MODEL statement are listed under the "Source" column.

The interaction effect is nonsignificant, so the data is reanalyzed using a main-effects model. Since PROC CATMOD is an interactive procedure, you can analyze the main-effects model by simply submitting the new MODEL statement as follows. The resulting tables are displayed in Figure 22.4 through Figure 22.7.

   model periods = sex residence;
   run;

 
The CATMOD Procedure

Response            periods    Response Levels    3
Weight Variable     count      Populations        4
Data Set            COLDS      Total Frequency    1080
Frequency Missing   0          Observations       12

Population Profiles
Sample   sex      residence   Sample Size
1        female   rural       180
2        female   urban       300
3        male     rural       290
4        male     urban       310

Response Profiles
Response   periods
1          0
2          1
3          2
Figure 22.4: Population and Response Profiles, Main-Effects Model

 
The CATMOD Procedure

Sample   Response     Design Matrix
         Function     1     2     3
1        1.14444      1     1     1
2        1.12000      1     1    -1
3        0.99310      1    -1     1
4        0.93871      1    -1    -1
Figure 22.5: Design Matrix for the Main-Effects Model

 
The CATMOD Procedure

Analysis of Variance
Source DF Chi-Square Pr > ChiSq
Intercept 1 1882.77 <.0001
sex 1 12.08 0.0005
residence 1 0.76 0.3839
Residual 1 0.09 0.7594
Figure 22.6: ANOVA Table for the Main-Effects Model

The goodness-of-fit chi-square statistic is 0.09 with one degree of freedom and a p-value of 0.7594; hence, the model fits the data. Note that the chi-square tests in Figure 22.6 test whether all the parameters for a given effect are zero. In this model, each effect has only one parameter, and therefore only one degree of freedom.
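For readers who want the computation behind these statistics, the standard weighted least-squares formulas can be sketched as follows. The notation is introduced here for illustration: F is the vector of four observed response functions, X is the design matrix, S is the estimated covariance matrix of F (diagonal, because the populations are independent samples), and b is the vector of parameter estimates.

```latex
\begin{aligned}
s_i^2 &= \frac{1}{n_i} \left( \sum_{j=0}^{2} j^2 \, p_{ij} - F_i^2 \right),
  \qquad S = \operatorname{diag}(s_1^2, \dots, s_4^2) \\
b &= \left( X^{\top} S^{-1} X \right)^{-1} X^{\top} S^{-1} F \\
Q_{\text{residual}} &= (F - X b)^{\top} S^{-1} (F - X b),
  \qquad \text{df} = 4 - 3 = 1
\end{aligned}
```

With four response functions and three parameters, one degree of freedom remains for the residual, which matches the Residual line in Figure 22.6.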

 
The CATMOD Procedure

Analysis of Weighted Least Squares Estimates
Effect      Parameter   Estimate   Standard Error   Chi-Square   Pr > ChiSq
Intercept   1           1.0501     0.0242             1882.77    <.0001
sex         2           0.0842     0.0242               12.08    0.0005
residence   3           0.0210     0.0241                0.76    0.3839
Figure 22.7: Parameter Estimates for the Main-Effects Model

The "Analysis of Weighted-Least-Squares Estimates" table lists the parameters and their estimates for the model, as well as the standard errors, Wald statistics, and p-values. These chi-square tests are single degree-of-freedom tests that the individual parameter is equal to zero. They are equal to the tests shown in Figure 22.6 since each effect is composed of exactly one parameter.
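As a rough check, each single degree-of-freedom Wald statistic is the squared ratio of an estimate to its standard error. The sketch below uses the rounded values displayed in Figure 22.7, so the results agree with the reported chi-squares only approximately.

```latex
\chi^2_{\text{Wald}} = \left( \frac{\hat{\beta}_k}{\mathrm{SE}(\hat{\beta}_k)} \right)^{2},
\qquad
\left( \frac{0.0842}{0.0242} \right)^{2} \approx 12.1,
\qquad
\left( \frac{0.0210}{0.0241} \right)^{2} \approx 0.76
```

The small gap between 12.1 and the reported 12.08 for sex comes from rounding the estimate and standard error to four decimal places.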

You can compute the mean number of periods of colds for the first population (Sample 1, females in rural residences) from Table 22.2 as follows.

mean colds = 0×(45/180) + 1×(64/180) + 2×(71/180) = 1.1444
This is the same value as reported for the Response Function for Sample 1 in Figure 22.5.
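The same hand calculation can be repeated for all four populations at once. The following is a minimal sketch that is not part of the original analysis; it assumes the colds data set created earlier and uses PROC MEANS with count as a frequency variable.

```sas
/* Mean number of periods with colds for each sex-by-residence population. */
/* The FREQ statement treats each input line as count identical subjects.  */
proc means data=colds n mean maxdec=5;
   class sex residence;
   freq count;
   var periods;
run;
```

The four means should agree with the Response Function column in Figure 22.5.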

PROC CATMOD is fitting a model to the mean number of colds in each population as follows:

\[
\begin{bmatrix}
\text{Expected number of colds for rural females} \\
\text{Expected number of colds for urban females} \\
\text{Expected number of colds for rural males} \\
\text{Expected number of colds for urban males}
\end{bmatrix}
=
\begin{bmatrix}
1 & 1 & 1 \\
1 & 1 & -1 \\
1 & -1 & 1 \\
1 & -1 & -1
\end{bmatrix}
\begin{bmatrix}
\beta_0 \\
\beta_1 \\
\beta_2
\end{bmatrix}
\]

where the design matrix is the same one displayed in Figure 22.5, $\beta_0$ is the mean number of colds averaged over all the populations, $\beta_1$ is the differential effect for females, and $\beta_2$ is the differential effect for rural residences. The parameter estimates are shown in Figure 22.7; thus, the expected number of periods with colds for rural females from this model is
1×1.0501 + 1×0.0842 + 1×0.0210 = 1.1553
and the expected number for rural males from this model is
1×1.0501 - 1×0.0842 + 1×0.0210 = 0.9869
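The same arithmetic with the rounded estimates gives about 1.1133 for urban females and 0.9449 for urban males. Rather than computing each value by hand, you can ask the procedure to display them. The following sketch, which is not part of the original example, refits the main-effects model with the PRED option in the MODEL statement so that the observed and predicted response functions are listed for each population.

```sas
/* Refit the main-effects model and request predicted response functions. */
proc catmod data=colds;
   weight count;
   response means;
   model periods = sex residence / pred;
run;
quit;   /* end the interactive procedure */
```

The predicted values in the output should match the hand calculations up to rounding.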

Notice also, in Figure 22.7, that the differential effect for residence is nonsignificant (p=0.3839). If you continue the analysis by fitting a single-effect model (sex), you need to include a POPULATION statement to maintain the same underlying contingency table. Without it, the populations would be defined by sex alone, because sex would be the only variable in the MODEL statement.

   population sex residence;
   model periods = sex;
   run;


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.