Example 22.5: Log-Linear Model, Structural and Sampling Zeros
This example illustrates a log-linear model of independence,
using data that contain structural zero frequencies as well
as sampling (random) zero frequencies.
In a population of six squirrel monkeys, the joint
distribution of genital display with respect to active or
passive role was observed. The data are from Fienberg
(1980, Table 8-2). Since a monkey cannot have both the
active and passive roles in the same interaction, the
diagonal cells of the table are structural zeros.
See Agresti (1990) for more information on the
quasi-independence model.
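For reference, one standard way to write the quasi-independence model for an I x J table whose set of structural-zero cells is S is shown below. The symbols m_ij, mu, and lambda are introduced here for illustration and do not appear in the CATMOD output. The degrees-of-freedom formula assumes the usual condition that the table cannot be split into separate subtables.

```latex
\begin{aligned}
\log m_{ij} &= \mu + \lambda^{A}_{i} + \lambda^{P}_{j}, && (i,j) \notin S,\\
m_{ij} &= 0, && (i,j) \in S,\\
\mathrm{df} &= \bigl(IJ - |S|\bigr) - \bigl(1 + (I-1) + (J-1)\bigr) = (I-1)(J-1) - |S|.
\end{aligned}
```

For the table analyzed in this example (I = 5 Active levels once Monkey `t' is removed, J = 6 Passive levels, and the five diagonal cells in S), this gives 4 x 5 - 5 = 15, the DF reported for the likelihood ratio test in Output 22.5.6.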
Since there is only one population, PROC CATMOD automatically deletes
the structural zeros. The sampling zeros are replaced in the DATA step
by a positive number close to zero (1E-20). The row for Monkey `t' is
also deleted: it contains only zeros, so the cell frequencies predicted
for it by a model of independence are also zero. In addition, the
CONTRAST statements compare the behavior of the two monkeys labeled `u'
and `v'. The following statements produce Output 22.5.1 through
Output 22.5.9:
title 'Behavior of Squirrel Monkeys';

data Display;
   input Active $ Passive $ wt @@;
   /* Drop Monkey t, whose row contains only zeros. */
   if Active ne 't';
   /* Off-diagonal zeros are sampling zeros: give them a tiny positive
      weight. Diagonal (structural) zeros keep a weight of 0. */
   if Active ne Passive then
      if wt=0 then wt=1e-20;
   datalines;
r r 0 r s 1 r t 5 r u 8 r v 9 r w 0
s r 29 s s 0 s t 14 s u 46 s v 4 s w 0
t r 0 t s 0 t t 0 t u 0 t v 0 t w 0
u r 2 u s 3 u t 1 u u 0 u v 38 u w 2
v r 0 v s 0 v t 0 v u 0 v v 0 v w 1
w r 9 w s 25 w t 4 w u 6 w v 13 w w 0
;

proc catmod data=Display;
   weight wt;
   model Active*Passive=_response_
         / freq pred=freq noparm noresponse oneway;
   loglin Active Passive;
   /* Effect-coded parameters: Passive r s t u v and Active r s u v,
      so each contrast compares level u with level v. */
   contrast 'Passive, U vs. V' Passive 0 0 0 1 -1;
   contrast 'Active, U vs. V'  Active  0 0 1 -1;
   title2 'Test Quasi-Independence for the Incomplete Table';
quit;
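To make the DATA step logic concrete outside SAS, here is a minimal Python sketch (an illustration only, not part of the SAS example). It assumes the DATALINES are read as whitespace-separated `Active Passive wt` triples, as the `@@` INPUT statement does; the names `DATALINES` and `prepare_cells` are hypothetical.

```python
# Hypothetical Python mirror of the DATA step (not SAS code).
DATALINES = """
r r 0 r s 1 r t 5 r u 8 r v 9 r w 0
s r 29 s s 0 s t 14 s u 46 s v 4 s w 0
t r 0 t s 0 t t 0 t u 0 t v 0 t w 0
u r 2 u s 3 u t 1 u u 0 u v 38 u w 2
v r 0 v s 0 v t 0 v u 0 v v 0 v w 1
w r 9 w s 25 w t 4 w u 6 w v 13 w w 0
"""


def prepare_cells(text, small=1e-20):
    """Return {(active, passive): weight}, mimicking the DATA step."""
    tokens = text.split()
    cells = {}
    # Read "Active Passive wt" triples, like INPUT with the @@ line hold.
    for k in range(0, len(tokens), 3):
        active, passive = tokens[k], tokens[k + 1]
        wt = float(tokens[k + 2])
        if active == "t":          # subsetting IF: drop the all-zero row
            continue
        if active != passive and wt == 0:
            wt = small             # sampling zero -> tiny positive weight
        cells[(active, passive)] = wt  # diagonal zeros stay 0 (structural)
    return cells


cells = prepare_cells(DATALINES)
# Cells with positive weight are the ones the analysis uses.
used = {cell: wt for cell, wt in cells.items() if wt > 0}
n_sampling_zeros = sum(1 for wt in used.values() if wt == 1e-20)

print(len(used), round(sum(used.values())), n_sampling_zeros)  # 25 220 6
```

The printed counts are consistent with Output 22.5.1 (25 response levels, total frequency 220) and with the six 1E-20 entries in Output 22.5.4.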
Output 22.5.1: Log-Linear Model Analysis with Zero Frequencies
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Response            Active*Passive    Response Levels       25
Weight Variable     wt                Populations            1
Data Set            DISPLAY           Total Frequency      220
Frequency Missing   0                 Observations          25
The results of the ONEWAY option are shown in Output 22.5.2.
Monkey `t' does not show up as a value for the Active
variable since that row was removed.
Output 22.5.2: Output from the ONEWAY option
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

One-Way Frequencies

Variable   Value   Frequency
Active     r              23
           s              93
           u              46
           v               1
           w              57
Passive    r              40
           s              29
           t              24
           u              60
           v              64
           w               3
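The one-way frequencies are the row and column totals of the prepared table. The following Python sketch (illustrative only; the names `ACTIVE`, `PASSIVE`, and `COUNTS` are hypothetical) recomputes them from the DATALINES values with Monkey `t's row removed:

```python
# Cell counts from the DATALINES, with Monkey t's all-zero row removed.
ACTIVE = ["r", "s", "u", "v", "w"]
PASSIVE = ["r", "s", "t", "u", "v", "w"]
COUNTS = [
    [0, 1, 5, 8, 9, 0],     # Active r
    [29, 0, 14, 46, 4, 0],  # Active s
    [2, 3, 1, 0, 38, 2],    # Active u
    [0, 0, 0, 0, 0, 1],     # Active v
    [9, 25, 4, 6, 13, 0],   # Active w
]

# Row sums give the Active margin, column sums the Passive margin.
active_freq = {a: sum(row) for a, row in zip(ACTIVE, COUNTS)}
passive_freq = {p: sum(row[j] for row in COUNTS) for j, p in enumerate(PASSIVE)}

print(active_freq)   # {'r': 23, 's': 93, 'u': 46, 'v': 1, 'w': 57}
print(passive_freq)  # {'r': 40, 's': 29, 't': 24, 'u': 60, 'v': 64, 'w': 3}
```

Passive level `t' keeps its frequency of 24 because only the Active row for Monkey `t' was deleted.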
Output 22.5.3: Profiles
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Response Profiles

Response   Active   Passive
       1   r        s
       2   r        t
       3   r        u
       4   r        v
       5   r        w
       6   s        r
       7   s        t
       8   s        u
       9   s        v
      10   s        w
      11   u        r
      12   u        s
      13   u        t
      14   u        v
      15   u        w
      16   v        r
      17   v        s
      18   v        t
      19   v        u
      20   v        w
      21   w        r
      22   w        s
      23   w        t
      24   w        u
      25   w        v
Sampling zeros are displayed as 1E-20 in Output 22.5.4. Each Response
Number corresponds to the response profile with the same number in
Output 22.5.3.
Output 22.5.4: Frequency of Response by Response Number
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Response Frequencies

                         Response Number
Sample      1      2      3      4      5      6      7      8      9
     1      1      5      8      9  1E-20     29     14     46      4

                         Response Number
Sample     10     11     12     13     14     15     16     17     18
     1  1E-20      2      3      1     38      2  1E-20  1E-20  1E-20

                  Response Number
Sample     19     20     21     22     23     24     25
     1  1E-20      1      9     25      4      6     13
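The response numbering can be reproduced by listing the admissible (Active, Passive) pairs in sorted order and skipping the diagonal. This Python sketch assumes that ordering rule, which is consistent with Output 22.5.3 but is inferred here rather than taken from the CATMOD documentation; `profiles` and `response_number` are hypothetical names.

```python
from itertools import product

ACTIVE = ["r", "s", "u", "v", "w"]          # Monkey t removed
PASSIVE = ["r", "s", "t", "u", "v", "w"]

# Admissible (Active, Passive) pairs in sorted order; diagonal pairs are
# structural zeros and are skipped. Numbering starts at 1.
profiles = [(a, p) for a, p in product(ACTIVE, PASSIVE) if a != p]
response_number = {pair: n for n, pair in enumerate(profiles, start=1)}

print(len(profiles))                # 25
print(profiles[0], profiles[-1])    # ('r', 's') ('w', 'v')
print(response_number[("u", "v")])  # 14
```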
Output 22.5.5: Iteration History
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Maximum Likelihood Analysis

                 Sub      -2 Log  Convergence                                   Parameter Estimates
Iteration  Iteration  Likelihood    Criterion         1         2         3         4         5         6         7         8         9
        0          0   1416.3054       1.0000         0         0         0         0         0         0         0         0         0
        1          0   1238.2417       0.1257   -0.4976    1.1112    0.1722   -0.8804 -0.006978    0.0827   -0.4735    0.7287    0.5791
        2          0   1205.1264       0.0267   -0.3420    1.0962    0.5612   -1.7549    0.2233    0.3899   -0.4086    0.7875    0.5728
        3          0   1199.5068     0.004663   -0.1570    1.2687    0.7058   -2.3992    0.3034    0.4360   -0.3162    0.8812    0.6703
        4          0   1198.6271     0.000733   -0.0466    1.3791    0.8170   -2.8422    0.3309    0.4625   -0.2890    0.9085    0.6968
        5          0   1198.5611    0.0000551 -0.002748    1.4230    0.8609   -3.0176    0.3334    0.4649   -0.2866    0.9110    0.6992
        6          0   1198.5603    6.5351E-7  0.002760    1.4285    0.8664   -3.0396    0.3334    0.4649   -0.2865    0.9110    0.6992
        7          0   1198.5603    1.217E-10  0.002837    1.4285    0.8665   -3.0399    0.3334    0.4649   -0.2865    0.9110    0.6992

Maximum likelihood computations converged.
Output 22.5.6: Analysis of Variance Table
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Maximum Likelihood Analysis of Variance

Source              DF   Chi-Square   Pr > ChiSq
Active               4        56.58       <.0001
Passive              5        47.94       <.0001
Likelihood Ratio    15       135.17       <.0001
The analysis of variance table (Output 22.5.6) shows that the model of
quasi-independence does not fit, since the likelihood ratio test for the
interaction is significant (chi-square = 135.17 with 15 DF, p < .0001).
In other words, the active and passive roles of the squirrel monkeys are
dependent.
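As a cross-check outside SAS, the quasi-independence fit can also be obtained by iterative proportional fitting (IPF), a different algorithm from the iterative maximum likelihood computations summarized in Output 22.5.5. The following Python sketch is an illustration under stated assumptions: sampling zeros are kept as true zeros (rather than 1E-20), structural zeros are fixed at 0, and the function names `ipf_quasi_independence` and `g_squared` are hypothetical. It also computes the degrees of freedom as (I-1)(J-1) minus the number of structural zeros.

```python
import math

ACTIVE = ["r", "s", "u", "v", "w"]          # Monkey t removed
PASSIVE = ["r", "s", "t", "u", "v", "w"]
COUNTS = [
    [0, 1, 5, 8, 9, 0],
    [29, 0, 14, 46, 4, 0],
    [2, 3, 1, 0, 38, 2],
    [0, 0, 0, 0, 0, 1],
    [9, 25, 4, 6, 13, 0],
]
# Structural zeros: a monkey cannot be both active and passive.
STRUCTURAL = {
    (i, j)
    for i, a in enumerate(ACTIVE)
    for j, p in enumerate(PASSIVE)
    if a == p
}


def ipf_quasi_independence(counts, structural, tol=1e-10, max_iter=10_000):
    """Fit quasi-independence by iterative proportional fitting (IPF)."""
    n_rows, n_cols = len(counts), len(counts[0])
    row_tot = [sum(row) for row in counts]
    col_tot = [sum(row[j] for row in counts) for j in range(n_cols)]
    # Start at 1 in admissible cells and 0 in structural cells.
    fit = [
        [0.0 if (i, j) in structural else 1.0 for j in range(n_cols)]
        for i in range(n_rows)
    ]
    for _ in range(max_iter):
        for i in range(n_rows):                      # match row margins
            scale = row_tot[i] / sum(fit[i])
            fit[i] = [m * scale for m in fit[i]]
        for j in range(n_cols):                      # match column margins
            scale = col_tot[j] / sum(fit[r][j] for r in range(n_rows))
            for r in range(n_rows):
                fit[r][j] *= scale
        # Columns now match exactly; stop once the rows match as well.
        if max(abs(sum(fit[i]) - row_tot[i]) for i in range(n_rows)) < tol:
            break
    return fit


def g_squared(counts, fit):
    """Likelihood ratio statistic 2 * sum n log(n / m), with 0 log 0 = 0."""
    return 2 * sum(
        n * math.log(n / m)
        for row_n, row_m in zip(counts, fit)
        for n, m in zip(row_n, row_m)
        if n > 0
    )


fit = ipf_quasi_independence(COUNTS, STRUCTURAL)
g2 = g_squared(COUNTS, fit)
df = (len(ACTIVE) - 1) * (len(PASSIVE) - 1) - len(STRUCTURAL)
print(f"G2 = {g2:.2f} on {df} DF")
print(f"fitted (r, s) = {fit[0][1]:.4f}, fitted (s, v) = {fit[1][4]:.4f}")
```

Output 22.5.6 reports a likelihood ratio chi-square of 135.17 with 15 DF, and Output 22.5.9 gives predicted frequencies of about 5.2595 for (r, s) and 27.661 for (s, v); the sketch is expected to agree with these values up to rounding.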
Output 22.5.7: Contrasts between Monkeys `u' and `v'
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Contrasts of Maximum Likelihood Estimates

Contrast            DF   Chi-Square   Pr > ChiSq
Passive, U vs. V     1         1.31       0.2524
Active, U vs. V      1        14.87       0.0001
If the model fit these data, the contrasts in Output 22.5.7 would
indicate that monkeys `u' and `v' have similar passive behavior patterns
(p = 0.2524) but very different active behavior patterns (p = 0.0001).
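Under quasi-independence, log m_ij = mu + lambda^A_i + lambda^P_j in every admissible cell, so a ratio of predicted frequencies within one row isolates a Passive difference and a ratio within one column isolates an Active difference. The Python sketch below uses predicted frequencies copied from Output 22.5.9 to recover the two estimated contrasts. It does not reproduce the chi-square statistics, which also require the covariance matrix of the estimates; the variable names are hypothetical.

```python
import math

# Predicted frequencies copied from Output 22.5.9 (only the cells needed).
pred = {
    ("r", "u"): 8.21594841,
    ("r", "v"): 6.64804868,
    ("u", "w"): 0.93865177,
    ("v", "w"): 0.01887928,
}

# Same row (Active r): mu and lambda^A_r cancel, leaving lambda^P_u - lambda^P_v.
passive_u_vs_v = math.log(pred[("r", "u")] / pred[("r", "v")])
# Same column (Passive w): mu and lambda^P_w cancel, leaving lambda^A_u - lambda^A_v.
active_u_vs_v = math.log(pred[("u", "w")] / pred[("v", "w")])

print(round(passive_u_vs_v, 4), round(active_u_vs_v, 4))
```

The results, about 0.2118 and 3.9064, match the differences of the final parameter estimates in Output 22.5.5 (0.9110 - 0.6992 for Passive and 0.8665 - (-3.0399) for Active), assuming parameters 1 to 4 belong to Active levels r, s, u, v and parameters 5 to 9 to Passive levels r, s, t, u, v.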
Output 22.5.8: Response Function Predicted Values
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Maximum Likelihood Predicted Values for Response Functions

                  ------ Observed ------  ------ Predicted -----
        Function                Standard                Standard
Sample    Number     Function      Error     Function      Error     Residual
     1         1   -2.5649494   1.037749    -0.973554   0.339019   -1.5913953
               2   -0.9555114   0.526235   -1.7250404   0.345438   0.76952896
               3   -0.4855078   0.449359   -0.5275144   0.309254    0.0420066
               4   -0.3677248   0.433629   -0.7392682   0.249006   0.37154345
               5   -48.616651       1E10    -3.560517   0.634104   -45.056134
               6   0.80234647   0.333775   0.32058886    0.26629   0.48175761
               7   0.07410797   0.385164   -0.2993416   0.295634   0.37344956
               8   1.26369204   0.314105   0.89818441   0.250857   0.36550763
               9    -1.178655   0.571772    0.6864306   0.173396   -1.8650856
              10   -48.616651       1E10   -2.1348182   0.608071   -46.481833
              11   -1.8718022   0.759555   -0.2414953   0.287218   -1.6303069
              12   -1.4663371   0.640513   -0.1099394   0.303568   -1.3563977
              13   -2.5649494   1.037749   -0.8614257   0.314794   -1.7035236
              14    1.0726368   0.321308   0.12434644   0.204345   0.94829036
              15   -1.8718022   0.759555   -2.6969023   0.617433   0.82510014
              16   -48.616651       1E10   -4.1478747   1.024508   -44.468777
              17   -48.616651       1E10   -4.0163187   1.030062   -44.600332
              18   -48.616651       1E10   -4.7678051   1.032457   -43.848846
              19   -48.616651       1E10   -3.5702791   1.020794   -45.046372
              20   -2.5649494   1.037749   -6.6032817   1.161289   4.03833233
              21   -0.3677248   0.433629   -0.3658417   0.202959    -0.001883
              22   0.65392647    0.34194   -0.2342858   0.232794   0.88821229
              23    -1.178655   0.571772   -0.9857722   0.239408   -0.1928828
              24   -0.7731899   0.493548   0.21175381   0.185007   -0.9849437
Output 22.5.9: Predicted Frequencies
Behavior of Squirrel Monkeys
Test Quasi-Independence for the Incomplete Table

Maximum Likelihood Predicted Values for Frequencies

                                   ----- Observed -----  ----- Predicted -----
                         Function              Standard               Standard
Sample  Active  Passive    Number  Frequency      Error   Frequency      Error    Residual
     1  r       s              F1          1   0.997725  5.25950838    1.36156  -4.2595084
        r       t              F2          5   2.210512  2.48072585   0.691066  2.51927415
        r       u              F3          8   2.776525  8.21594841   1.855146  -0.2159484
        r       v              F4          9   2.937996  6.64804868    1.50932  2.35195132
        r       w              F5      1E-20      1E-10  0.39576868   0.240268  -0.3957687
        s       r              F6         29   5.017696  19.1859928   3.147915  9.81400723
        s       t              F7         14   3.620648   10.321716   2.169599  3.67828404
        s       u              F8         46   6.031734  34.1846262   4.428706  11.8153738
        s       v              F9          4   1.981735  27.6609647   3.722788  -23.660965
        s       w             F10      1E-20      1E-10  1.64670026   0.952712  -1.6467003
        u       r             F11          2   1.407771   10.936396    2.12322   -8.936396
        u       s             F12          3   1.720201  12.4740717   2.554336  -9.4740717
        u       t             F13          1   0.997725   5.8835826   1.380655  -4.8835826
        u       v             F14         38   5.606814  15.7672979   2.684692  22.2327021
        u       w             F15          2   1.407771  0.93865177   0.551645  1.06134823
        v       r             F16      1E-20      1E-10  0.21996583   0.221779  -0.2199658
        v       s             F17      1E-20      1E-10   0.2508934   0.253706  -0.2508934
        v       t             F18      1E-20      1E-10  0.11833763   0.120314  -0.1183376
        v       u             F19      1E-20      1E-10  0.39192393   0.393255  -0.3919239
        v       w             F20          1   0.997725  0.01887928   0.021728  0.98112072
        w       r             F21          9   2.937996   9.6576454   1.808656  -0.6576454
        w       s             F22         25   4.707344  11.0155266   2.275019  13.9844734
        w       t             F23          4   1.981735  5.19563797   1.184452   -1.195638
        w       u             F24          6   2.415857  17.2075014   2.772098  -11.207501
        w       v             F25         13   3.497402  13.9236886    2.24158  -0.9236886
Output 22.5.8 displays the predicted response functions and
Output 22.5.9 displays predicted cell frequencies (from the
PRED=FREQ option), but since the model does not fit, these
should be ignored.
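The values in Output 22.5.8 are consistent with generalized logits taken relative to the last response (response 25: Active w, Passive v). In other words, function k equals log(f_k / f_25) for the observed or predicted frequencies f. The Python sketch below checks this for responses 1 and 5 using frequencies copied from Output 22.5.9; the helper name `generalized_logit` is hypothetical.

```python
import math

# Selected observed and predicted frequencies from Output 22.5.9, keyed by
# response number. Response 25 (Active w, Passive v) is the last response.
observed = {1: 1.0, 5: 1e-20, 25: 13.0}
predicted = {1: 5.25950838, 5: 0.39576868, 25: 13.9236886}


def generalized_logit(freq, k, ref=25):
    """log(freq_k / freq_ref): response function k relative to the last response."""
    return math.log(freq[k] / freq[ref])


for k in (1, 5):
    obs = generalized_logit(observed, k)
    pred = generalized_logit(predicted, k)
    # Residual = observed function - predicted function, as in Output 22.5.8.
    print(k, round(obs, 6), round(pred, 6), round(obs - pred, 6))
```

Response 5 is a sampling zero, which is why its observed function is log(1E-20/13), about -48.6, in Output 22.5.8.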
Structural and Sampling Zeros with Raw Data
The preceding PROC CATMOD step uses cell count data as
input. Prior to invoking the CATMOD procedure, structural
and sampling zeros are easily identified and manipulated in
a single DATA step. For the situation where structural or
sampling zeros (or both) may exist and the input data set is
raw data, use the following steps:
- Run PROC FREQ on the raw data. In the TABLES
statement, list all dependent and independent
variables separated by asterisks and use the SPARSE
option and the OUT= option. This creates an
output data set that contains all possible zero
frequencies.
- Use a DATA step to change the zero frequencies
associated with sampling zeros to a small value, such
as 1E-20.
- Use the resulting data set as input to PROC CATMOD,
and specify the statement WEIGHT COUNT to use adjusted
frequencies.
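The same three steps can be sketched in Python to show the logic without SAS. This is an analogy, not the PROC FREQ implementation: `sparse_table` (a hypothetical name) crosses every observed Active level with every observed Passive level, in the spirit of the SPARSE option, and then applies the DATA step rule for sampling zeros. The raw records below are made up for illustration.

```python
from collections import Counter
from itertools import product


def sparse_table(records, small=1e-20):
    """Cross all observed Active and Passive levels, then replace
    off-diagonal zero counts with `small`. Diagonal cells stay 0 as
    structural zeros."""
    counts = Counter(records)
    actives = sorted({a for a, _ in records})
    passives = sorted({p for _, p in records})
    table = {}
    for a, p in product(actives, passives):
        n = counts.get((a, p), 0)
        if a != p and n == 0:
            n = small            # sampling zero
        table[(a, p)] = n
    return table


# Hypothetical raw records: one (Active, Passive) pair per observed display.
raw = [("r", "s"), ("r", "s"), ("s", "r"), ("s", "t")]
table = sparse_table(raw)
for cell, n in table.items():
    print(cell, n)
```

The resulting counts play the role of the COUNT variable that the WEIGHT statement passes to PROC CATMOD.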
For example, suppose the data set RawDisplay contains the squirrel
monkey data in raw (unaggregated) form. The following statements show
how to obtain the same analysis as shown previously:
proc freq data=RawDisplay;
   tables Active*Passive / sparse out=Combos noprint;
run;

data Combos2;
   set Combos;
   if Active ne 't';
   if Active ne Passive then
      if count=0 then count=1e-20;
run;

proc catmod data=Combos2;
   weight count;
   model Active*Passive=_response_
         / freq pred=freq noparm noresponse;
   loglin Active Passive;
quit;
The first IF statement in the DATA step is needed
only for this particular example; since observations
for Monkey `t' were deleted from the Display data set,
they also need to be deleted from Combos2.
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.