The CATMOD Procedure |
If there is more than one dependent variable and you specify RESPONSE MEANS, then the effective sample size for each response function equals the actual sample size. Thus, a sample size of 30 could be sufficient to support four response functions, provided that the functions are the means of four dependent variables.
For any log-linear model analysis, remember that PROC CATMOD creates response profiles only for the profiles that are actually observed. Thus, in a log-linear model analysis with one population (the usual case), the contingency table contains no zeros; in effect, PROC CATMOD treats every zero frequency as a structural zero. If there is more than one population, a zero can appear in the body of the contingency table, and such a zero is treated as a sampling zero (as long as some population has a nonzero count for that profile). If you want zero frequencies that PROC CATMOD would normally treat as structural zeros to be interpreted as sampling zeros, insert a one-line statement in the DATA step that changes each zero to a very small number (such as 1E-20). Refer to Bishop, Fienberg, and Holland (1975) for a discussion of the issues, and to Example 22.5 for an illustration of a log-linear model analysis of data that contain both structural and sampling zeros.
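The following sketch shows the one-line DATA step change described above. It assumes the table is entered as cell counts and read with a WEIGHT statement; the data set COUNTS, the variables A, B, and COUNT, the counts themselves, and the independence model in the LOGLIN statement are all hypothetical and chosen only for illustration.

```sas
/* Hypothetical cell counts for a 2 x 2 table; cell (1,2) has a zero count */
data counts;
   input a b count @@;
   /* The one-line change: replace each zero count by a tiny positive number
      so that the cell is treated as a sampling zero, not a structural zero */
   if count=0 then count=1e-20;
   datalines;
1 1 12  1 2 0  2 1 7  2 2 9
;

proc catmod data=counts;
   weight count;               /* cell counts are read from COUNT          */
   model a*b=_response_ / ml;  /* log-linear analysis of A and B           */
   loglin a b;                 /* hypothetical choice: independence model  */
run;
```

Because the tiny value is positive, the (1,2) profile is kept in the response profiles, while its contribution to the observed frequencies is negligible.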
If you perform a weighted least-squares analysis on a contingency table that contains zero cell frequencies, avoid using the LOG transformation as the first transformation on the observed proportions, because the logarithm of a zero proportion is undefined. In general, it may be better to change the response functions or to pool some of the response categories than to settle for the 0.5 correction or to use the ADDCELL option.
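When you use the _RESPONSE_ keyword in the MODEL statement, PROC CATMOD may display the following warning message:

Warning: The _RESPONSE_ effect may be testing the wrong hypothesis since the marginal levels of the dependent variables do not coincide. Consult the response profiles and the CATMOD documentation.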
The following examples illustrate situations in which the _RESPONSE_ effect tests the wrong hypothesis.
Suppose you specify the following statements:
```sas
data A1;
   input Time1 Time2 @@;
   datalines;
1 2  2 3  1 3
;

proc catmod;
   response marginals;
   model Time1*Time2=_response_;
   repeated Time 2 / _response_=Time;
run;
```
One marginal probability is computed for each dependent variable, resulting in two response functions. The model is saturated: one degree of freedom for the intercept and one for the main effect of Time. Apart from the warning message, PROC CATMOD produces an analysis with no apparent errors, but it displays the following "Response Profiles" table.
Response Profiles

| Response | Time1 | Time2 |
|----------|-------|-------|
| 1        | 1     | 2     |
| 2        | 1     | 3     |
| 3        | 2     | 3     |
The observed levels of Time1 are 1 and 2, while the observed levels of Time2 are 2 and 3. Since RESPONSE MARGINALS yields marginal probabilities for every level but the last, the two response functions being analyzed are Prob(Time1=1) and Prob(Time2=2). Thus, the Time effect is testing the hypothesis that Prob(Time1=1)=Prob(Time2=2). What it should be testing is the hypothesis that
Prob(Time1=1) = Prob(Time2=1)
Prob(Time1=2) = Prob(Time2=2)
Prob(Time1=3) = Prob(Time2=3)
but there are not enough data to support the test (assuming that none of the probabilities are structural zeros by the design of the study).
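As a worked check, the two response functions can be evaluated by hand from the three observations in data set A1, whose (Time1, Time2) pairs are (1,2), (2,3), and (1,3). The labels F_1 and F_2 are introduced here only for convenience.

```latex
\begin{aligned}
\hat{F}_1 &= \widehat{\Pr}(\mathrm{Time1}=1) = \tfrac{2}{3}
  &&\text{(Time1 values: 1, 2, 1)}\\
\hat{F}_2 &= \widehat{\Pr}(\mathrm{Time2}=2) = \tfrac{1}{3}
  &&\text{(Time2 values: 2, 3, 3)}\\
H_0 &: F_1 = F_2
\end{aligned}
```

The two functions refer to different levels, so comparing them says nothing about whether the marginal distributions of Time1 and Time2 agree.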
Now suppose you specify the following statements:
```sas
data a1;
   input Time1 Time2 @@;
   datalines;
2 1  2 2  1 1  1 2  2 1
;

proc catmod order=data;
   response marginals;
   model Time1*Time2=_response_;
   repeated Time 2 / _response_=Time;
run;
```
As in the preceding example, one marginal probability is computed for each dependent variable, resulting in two response functions. The model is also the same: one degree of freedom for the intercept and one for the main effect of Time. PROC CATMOD issues the warning message and displays the following "Response Profiles" table.
Response Profiles

| Response | Time1 | Time2 |
|----------|-------|-------|
| 1        | 2     | 1     |
| 2        | 2     | 2     |
| 3        | 1     | 1     |
| 4        | 1     | 2     |
Although the marginal levels are the same for the two dependent variables, they are not in the same order. The ORDER=DATA option orders levels by their first appearance in the input stream, so the levels of Time1 are ordered 2, 1, while the levels of Time2 are ordered 1, 2. Since RESPONSE MARGINALS yields marginal probabilities for every level except the last, the two response functions being analyzed are Prob(Time1=2) and Prob(Time2=1). Thus, the Time effect is testing the hypothesis that Prob(Time1=2)=Prob(Time2=1). What it should be testing is the hypothesis that
Prob(Time1=1) = Prob(Time2=1)
Prob(Time1=2) = Prob(Time2=2)
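Evaluating these functions by hand from the five observations in data set a1, whose (Time1, Time2) pairs are (2,1), (2,2), (1,1), (1,2), and (2,1), shows how misleading the analyzed comparison can be:

```latex
\begin{aligned}
\text{analyzed:}\quad & \widehat{\Pr}(\mathrm{Time1}=2) = \tfrac{3}{5},
  \qquad \widehat{\Pr}(\mathrm{Time2}=1) = \tfrac{3}{5}\\
\text{intended:}\quad & \widehat{\Pr}(\mathrm{Time1}=1) = \tfrac{2}{5},
  \qquad \widehat{\Pr}(\mathrm{Time2}=1) = \tfrac{3}{5}
\end{aligned}
```

In this small sample the two functions actually analyzed happen to coincide, so the estimated Time effect is zero, even though the sample marginal probabilities of level 1 differ (2/5 versus 3/5).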
Whenever the warning message appears, look at the "Response Profiles" table or the "One-Way Frequencies" table to determine what hypothesis is actually being tested. For the latter example, a correct analysis can be obtained by deleting the ORDER=DATA option or by reordering the data so that the (1,1) observation is first.
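A sketch of the first remedy, which deletes the ORDER=DATA option, follows. It also adds the ONEWAY option to the MODEL statement so that the "One-Way Frequencies" table is displayed for checking the level order; that option is an addition for illustration and is not part of the original example.

```sas
data a1;
   input Time1 Time2 @@;
   datalines;
2 1  2 2  1 1  1 2  2 1
;

/* ORDER=DATA removed: the levels of Time1 and Time2 are now both ordered 1, 2 */
proc catmod;
   response marginals;
   /* ONEWAY displays one-way frequencies so the level order can be checked */
   model Time1*Time2=_response_ / oneway;
   repeated Time 2 / _response_=Time;
run;
```

With both variables ordered 1, 2, the response functions are Prob(Time1=1) and Prob(Time2=1). Because each variable has only two levels, equality of these probabilities also implies Prob(Time1=2)=Prob(Time2=2), so the Time effect tests the intended hypothesis.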
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.