The GLM Procedure
Data (A, B) and design matrix (μ; A: A1, A2; B: B1, B2, B3) for main effects:

| A | B | μ | A1 | A2 | B1 | B2 | B3 |
|---|---|---|----|----|----|----|----|
| 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 |
| 1 | 2 | 1 | 1 | 0 | 0 | 1 | 0 |
| 1 | 3 | 1 | 1 | 0 | 0 | 0 | 1 |
| 2 | 1 | 1 | 0 | 1 | 1 | 0 | 0 |
| 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 |
| 2 | 3 | 1 | 0 | 1 | 0 | 0 | 1 |
There are more columns for these effects than there are degrees of freedom for them; in other words, PROC GLM is using an over-parameterized model.
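The over-parameterization can be checked numerically. The following is a minimal Python sketch, not part of PROC GLM: it rebuilds the main-effects design matrix for the 2 x 3 layout above and compares the number of columns with the rank. The helper names `main_effects_row` and `matrix_rank` are hypothetical and exist only for this illustration.

```python
from fractions import Fraction

A_LEVELS = (1, 2)
B_LEVELS = (1, 2, 3)

def main_effects_row(a, b):
    """Return [mu, A1, A2, B1, B2, B3] for one observation."""
    return ([1]
            + [int(a == lv) for lv in A_LEVELS]
            + [int(b == lv) for lv in B_LEVELS])

def matrix_rank(rows):
    """Rank by exact Gaussian elimination over the rationals."""
    m = [[Fraction(v) for v in row] for row in rows]
    rank = 0
    for col in range(len(m[0])):
        # Find a row at or below the current rank with a nonzero entry
        pivot = next((i for i in range(rank, len(m)) if m[i][col] != 0), None)
        if pivot is None:
            continue
        m[rank], m[pivot] = m[pivot], m[rank]
        for i in range(len(m)):
            if i != rank and m[i][col] != 0:
                factor = m[i][col] / m[rank][col]
                m[i] = [x - factor * y for x, y in zip(m[i], m[rank])]
        rank += 1
        if rank == len(m):
            break
    return rank

X = [main_effects_row(a, b) for a in A_LEVELS for b in B_LEVELS]
print(len(X[0]), matrix_rank(X))  # 6 columns, rank 4
```

The two missing ranks correspond to the dependencies A1 + A2 = μ and B1 + B2 + B3 = μ.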
Data (A, B) and design matrix (μ; A: A1, A2; B: B1, B2, B3; A*B: A1B1 through A2B3) for crossed effects:

| A | B | μ | A1 | A2 | B1 | B2 | B3 | A1B1 | A1B2 | A1B3 | A2B1 | A2B2 | A2B3 |
|---|---|---|----|----|----|----|----|------|------|------|------|------|------|
| 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 3 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 1 | 1 | 0 | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 2 | 1 | 0 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 3 | 1 | 0 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
In this matrix, main-effects columns are not linearly independent of crossed-effect columns; in fact, the column space for the crossed effects contains the space of the main effect.
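To make the column-space statement concrete, the sketch below (plain Python, with hypothetical helper names) recovers each main-effect column as a sum of crossed-effect columns for the same 2 x 3 layout. It illustrates the linear dependence only and does not reproduce how PROC GLM computes anything.

```python
A_LEVELS = (1, 2)
B_LEVELS = (1, 2, 3)

def crossed_columns(a, b):
    """Return A1B1, A1B2, A1B3, A2B1, A2B2, A2B3 for one observation."""
    return [int(a == i and b == j) for i in A_LEVELS for j in B_LEVELS]

def main_from_crossed(ab):
    """Recover the A and B indicator columns as sums of crossed columns."""
    nb = len(B_LEVELS)
    # A_i is the sum of the crossed columns that share level i of A
    a_cols = [sum(ab[i * nb:(i + 1) * nb]) for i in range(len(A_LEVELS))]
    # B_j is the sum of the crossed columns that share level j of B
    b_cols = [sum(ab[j::nb]) for j in range(nb)]
    return a_cols, b_cols

checks = []
for a in A_LEVELS:
    for b in B_LEVELS:
        a_cols, b_cols = main_from_crossed(crossed_columns(a, b))
        checks.append(a_cols == [int(a == i) for i in A_LEVELS]
                      and b_cols == [int(b == j) for j in B_LEVELS])

print(all(checks))  # True
```

For example, A1 = A1B1 + A1B2 + A1B3 and B1 = A1B1 + A2B1 in every row of the table.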
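Because nested effects are generated in the same manner as crossed effects, the following two MODEL statements generate the same design columns:
model y=a b(a); | (B nested within A)
model y=a a*b; | (omitted main effect for B)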
The nesting operator in PROC GLM is more a notational convenience than an operation distinct from crossing. Nested effects are characterized by the property that the nested variables never appear as main effects. The order of the variables within nesting parentheses is made to correspond to the order of these variables in the CLASS statement. The order of the columns is such that variables outside the parentheses index faster than those inside the parentheses, and the rightmost nested variables index faster than the leftmost variables.
Data (A, B) and design matrix (μ; A: A1, A2; B(A): B1A1 through B3A2) for nested effects:

| A | B | μ | A1 | A2 | B1A1 | B2A1 | B3A1 | B1A2 | B2A2 | B3A2 |
|---|---|---|----|----|------|------|------|------|------|------|
| 1 | 1 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |
| 1 | 2 | 1 | 1 | 0 | 0 | 1 | 0 | 0 | 0 | 0 |
| 1 | 3 | 1 | 1 | 0 | 0 | 0 | 1 | 0 | 0 | 0 |
| 2 | 1 | 1 | 0 | 1 | 0 | 0 | 0 | 1 | 0 | 0 |
| 2 | 2 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 1 | 0 |
| 2 | 3 | 1 | 0 | 1 | 0 | 0 | 0 | 0 | 0 | 1 |
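The column ordering in this table follows from the indexing rule for nested effects. As a rough Python sketch (the function names and the assumed statement `class A B;` are illustrative, not PROC GLM internals), the code below derives the index order for B(A) and for A*B and confirms that, for this layout, both yield the same indicator columns.

```python
from itertools import product

LEVELS = {"A": (1, 2), "B": (1, 2, 3)}
CLASS_ORDER = ("A", "B")  # assumed CLASS statement: class A B;

def crossed_index_order(variables):
    """Crossed variables follow CLASS order; the rightmost varies fastest."""
    return tuple(sorted(variables, key=CLASS_ORDER.index))

def nested_index_order(outside, inside):
    """Variables inside the parentheses vary slowest, those outside fastest."""
    return crossed_index_order(inside) + crossed_index_order(outside)

def indicator_columns(obs, slow_to_fast):
    """One 0/1 column per level combination of the listed variables."""
    combos = product(*(LEVELS[v] for v in slow_to_fast))
    return [int(all(obs[v] == lv for v, lv in zip(slow_to_fast, combo)))
            for combo in combos]

data = [{"A": a, "B": b} for a in LEVELS["A"] for b in LEVELS["B"]]
order_nested = nested_index_order(outside=("B",), inside=("A",))
order_crossed = crossed_index_order(("B", "A"))

b_within_a = [indicator_columns(obs, order_nested) for obs in data]
a_by_b = [indicator_columns(obs, order_crossed) for obs in data]
labels = [f"B{b}A{a}" for a, b in product(LEVELS["A"], LEVELS["B"])]

print(labels)
print(b_within_a == a_by_b)  # True
```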
Data (X, A) and design matrix (μ; A: A1, A2; X(A): X(A1), X(A2)) for a continuous variable nested within a class variable:

| X | A | μ | A1 | A2 | X(A1) | X(A2) |
|---|---|---|----|----|-------|-------|
| 21 | 1 | 1 | 1 | 0 | 21 | 0 |
| 24 | 1 | 1 | 1 | 0 | 24 | 0 |
| 22 | 1 | 1 | 1 | 0 | 22 | 0 |
| 28 | 2 | 1 | 0 | 1 | 0 | 28 |
| 19 | 2 | 1 | 0 | 1 | 0 | 19 |
| 23 | 2 | 1 | 0 | 1 | 0 | 23 |
This model estimates a separate slope for X within each level of A.
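The separate-slopes structure is visible in how each row is built: the X(A) columns are the value of X multiplied by the indicator for the observation's level of A. The Python sketch below reproduces the rows of the table above; `nested_slope_row` is a hypothetical helper used only for illustration.

```python
A_LEVELS = (1, 2)

# (X, A) pairs copied from the table above
data = [(21, 1), (24, 1), (22, 1), (28, 2), (19, 2), (23, 2)]

def nested_slope_row(x, a):
    """Return [mu, A1, A2, X(A1), X(A2)] for one observation."""
    indicators = [int(a == lv) for lv in A_LEVELS]
    # X contributes only to the slope column of the observed level of A
    return [1] + indicators + [x * d for d in indicators]

design = [nested_slope_row(x, a) for x, a in data]
for row in design:
    print(row)
```

Because each observation has a nonzero entry in exactly one X(A) column, the coefficient on X(A1) is informed only by observations with A=1, and likewise for X(A2).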
Data (X, A) and design matrix (μ; X; A: A1, A2; X*A: X*A1, X*A2) for a continuous-by-class effect:

| X | A | μ | X | A1 | A2 | X*A1 | X*A2 |
|---|---|---|---|----|----|------|------|
| 21 | 1 | 1 | 21 | 1 | 0 | 21 | 0 |
| 24 | 1 | 1 | 24 | 1 | 0 | 24 | 0 |
| 22 | 1 | 1 | 22 | 1 | 0 | 22 | 0 |
| 28 | 2 | 1 | 28 | 0 | 1 | 0 | 28 |
| 19 | 2 | 1 | 19 | 0 | 1 | 0 | 19 |
| 23 | 2 | 1 | 23 | 0 | 1 | 0 | 23 |
Continuous-by-class effects are used to test the homogeneity of slopes. If the continuous-by-class effect is nonsignificant, the effect can be removed so that the response with respect to X is the same for all levels of the class variables.
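A small Python sketch can show what removing the effect does to the design. Using the (X, A) data from the table above, it builds the rows containing X, A, and X*A, confirms that the X column equals the sum of the X*A columns, and then drops the X*A columns to obtain the common-slope design. The helper name `crossed_slope_row` is hypothetical, and the sketch does not perform the significance test itself.

```python
A_LEVELS = (1, 2)

# (X, A) pairs copied from the table above
data = [(21, 1), (24, 1), (22, 1), (28, 2), (19, 2), (23, 2)]

def crossed_slope_row(x, a):
    """Return [mu, X, A1, A2, X*A1, X*A2] for one observation."""
    indicators = [int(a == lv) for lv in A_LEVELS]
    return [1, x] + indicators + [x * d for d in indicators]

design = [crossed_slope_row(x, a) for x, a in data]

# The X column is the sum of the X*A columns, so the X*A columns carry
# only the differences in slope between levels of A.
x_is_sum = all(row[1] == row[4] + row[5] for row in design)

# Dropping X*A leaves one slope shared by all levels of A.
common_slope = [row[:4] for row in design]

print(x_is_sum)         # True
print(common_slope[0])  # [1, 21, 1, 0]
```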
When an effect combines continuous, crossed, and nested variables, the continuous list comes first, followed by the crossed list, followed by the nested list in parentheses.
The sequencing of parameters is important to learn if you use the CONTRAST or ESTIMATE statement to compute or test some linear function of the parameter estimates.
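PROC GLM may retitle effects to conform to its ordering rules. For example, B*A(E D) may be retitled A*B(D E) so that the class variables outside the parentheses (crossed effects) and the class variables within the parentheses (nested effects) each appear in the order in which they are listed in the CLASS statement.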
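The sequencing of the parameters generated by an effect can be described by which variables have their levels indexed faster: variables in the crossed list index faster than variables in the nested list, and within either list, variables to the right index faster than variables to the left.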
For example, suppose a model includes four effects (A, B, C, and D), each having two levels, 1 and 2. If the CLASS statement is
class A B C D;
then the order of the parameters for the effect B*A(C D), which is retitled A*B(C D), is as follows.
A1 B1 C1 D1
A1 B2 C1 D1
A2 B1 C1 D1
A2 B2 C1 D1
A1 B1 C1 D2
A1 B2 C1 D2
A2 B1 C1 D2
A2 B2 C1 D2
A1 B1 C2 D1
A1 B2 C2 D1
A2 B1 C2 D1
A2 B2 C2 D1
A1 B1 C2 D2
A1 B2 C2 D2
A2 B1 C2 D2
A2 B2 C2 D2
Note that first the crossed effects B and A are sorted in the order in which they appear in the CLASS statement so that A precedes B in the parameter list. Then, for each combination of the nested effects in turn, combinations of A and B appear. The B effect changes fastest because it is rightmost in the (renamed) cross list. Then A changes next fastest. The D effect changes next fastest, and C is the slowest since it is leftmost in the nested list.
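The ordering rules can be expressed as a short enumeration. The following Python sketch is an illustration under the assumptions stated in the text (CLASS order A B C D, two levels each); `parameter_order` is a hypothetical function, not a PROC GLM interface. It sorts the crossed and nested lists by CLASS order, then iterates so that the nested list varies slowest and the rightmost variable of each list varies fastest.

```python
from itertools import product

def parameter_order(class_order, crossed, nested, levels):
    """List parameter labels for an effect crossed(nested) in sequence order."""
    position = {name: i for i, name in enumerate(class_order)}
    crossed = sorted(crossed, key=position.__getitem__)
    nested = sorted(nested, key=position.__getitem__)
    # itertools.product varies its last argument fastest, so list the
    # nested variables first (slowest) and the crossed variables last.
    slow_to_fast = nested + crossed
    labels = []
    for combo in product(*(levels[name] for name in slow_to_fast)):
        value = dict(zip(slow_to_fast, combo))
        # Print in retitled order: crossed list, then nested list
        labels.append(" ".join(f"{n}{value[n]}" for n in crossed + nested))
    return labels

levels = {name: (1, 2) for name in "ABCD"}
order = parameter_order(
    class_order=["A", "B", "C", "D"],
    crossed=["B", "A"],   # B*A as written in the MODEL statement
    nested=["C", "D"],
    levels=levels,
)
for label in order:
    print(label)
```

The printed sequence starts A1 B1 C1 D1, A1 B2 C1 D1, A2 B1 C1 D1 and ends with A2 B2 C2 D2, matching the list above.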
When numeric class variables are used, their levels are sorted by their character format, which may not correspond to their numeric sort sequence. Therefore, it is advisable to include a format for numeric class variables or to use the ORDER=INTERNAL option in the PROC GLM statement to ensure that levels are sorted by their internal values.
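The difference between a character sort and a numeric sort can be seen in a small Python analogy. This is only an illustration of the general pitfall: Python's `str` stands in for a left-justified character representation, which is an assumption, and the formatted values that PROC GLM actually compares depend on the format in effect.

```python
levels = [1, 2, 10, 20]

by_value = sorted(levels)          # numeric sort sequence
by_text = sorted(levels, key=str)  # sort on the character representation

print(by_value)  # [1, 2, 10, 20]
print(by_text)   # [1, 10, 2, 20]
```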
Other procedures (such as the CATMOD procedure) reparameterize models to full rank using certain restrictions on the parameters. PROC GLM does not reparameterize, making the hypotheses that are commonly tested more understandable. See Goodnight (1978) for additional reasons for not reparameterizing.
PROC GLM does not actually construct the entire design matrix $X$; rather, a row $x_i$ of $X$ is constructed for each observation in the data set and used to accumulate the crossproduct matrix $X'X = \sum_i x_i' x_i$.
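The row-at-a-time accumulation can be sketched in a few lines of Python. This is a simplified illustration, not PROC GLM's actual implementation: a generator yields one design row per observation for the 2 x 3 main-effects layout, and each row's outer product is added to a running crossproduct matrix, so the full design matrix is never stored. The names `design_rows` and `accumulate_crossproducts` are hypothetical.

```python
def design_rows():
    """Yield rows [mu, A1, A2, B1, B2, B3] one observation at a time."""
    for a in (1, 2):
        for b in (1, 2, 3):
            yield ([1]
                   + [int(a == i) for i in (1, 2)]
                   + [int(b == j) for j in (1, 2, 3)])

def accumulate_crossproducts(rows, p):
    """Accumulate X'X as the sum of outer products of the rows."""
    xtx = [[0] * p for _ in range(p)]
    for x in rows:
        for j in range(p):
            for k in range(p):
                xtx[j][k] += x[j] * x[k]
    return xtx

xtx = accumulate_crossproducts(design_rows(), p=6)
for row in xtx:
    print(row)
```

The first row of the result is [6, 3, 3, 2, 2, 2]: six observations in total, three at each level of A, and two at each level of B.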
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.