Two way analysis of variance
DESIGN MATRIX
Analysis of covariance is the name given to the analysis of models in which there
are categorical factors and continuous covariates. In the car
example we had the categorical factor VEHICLE and the continuous covariate
MILEAGE. Earlier I gave the design matrix for the model in which
there are different intercepts for the two cars but one common slope;
thus this model is two parallel lines. If we use corner point coding
and fit a model in which VEHICLE and MILEAGE interact
then the design matrix for the small data set above is
You saw, in assignment 3, how to test the hypothesis of no interaction in this model.
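As a sketch of corner point coding with an interaction, the snippet below builds the design matrix row by row. The (VEHICLE, MILEAGE) pairs are hypothetical stand-ins, since the small data set referred to above is not reproduced here, and vehicle 1 is assumed to be the corner (baseline) level. Each row holds an intercept, an indicator for vehicle 2, the mileage, and the product of that indicator with mileage, so vehicle 2 gets its own intercept and its own slope.

```python
# Hypothetical (vehicle, mileage) pairs; NOT the data set from the notes.
data = [(1, 10.0), (1, 20.0), (2, 15.0), (2, 25.0)]

def design_row(vehicle, mileage, baseline=1):
    """Corner point coding: the baseline vehicle gets indicator 0 (assumed baseline = 1)."""
    d = 0.0 if vehicle == baseline else 1.0
    # columns: intercept, VEHICLE 2 indicator, MILEAGE, VEHICLE 2 x MILEAGE
    return [1.0, d, mileage, d * mileage]

X = [design_row(v, m) for v, m in data]
for row in X:
    print(row)
```

Dropping the last column gives the parallel-lines model, so the hypothesis of no interaction says the coefficient of that column is zero.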
Two way ANOVA: influence of SCHOOL, REGION on STAY
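options pagesize=60 linesize=80;
data scenic;
  infile 'scenic.dat' firstobs=2;   /* skip the header line */
  input Stay Age Risk Culture Chest Beds
        School Region Census Nurses Facil;
/* Two-way model with interaction. E prints the estimable functions,
   XPX prints X'X and INVERSE prints its generalized inverse. */
proc glm data=scenic;
  class school region ;
  model Stay = School | Region / E
        SOLUTION SS1 SS2 SS3 SS4 XPX INVERSE;
  /* save fitted values and regression diagnostics */
  output out=scout P=Fitted PRESS=PRESS H=HAT
         RSTUDENT=EXTST R=RESID DFFITS=DFFITS COOKD=COOKD;
run ;
/* cell counts, means and standard deviations of STAY */
proc means data=scout;
  var stay;
  class school region;
run;
proc print data=scout;
run;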
EDITED SAS OUTPUT
The X'X Matrix
INTERCEPT SCHOOL 1 SCHOOL 2 REGION 1 REGION 2
INTERCEPT 113 17 96 28 32
SCHOOL 1 17 17 0 5 7
SCHOOL 2 96 0 96 23 25
REGION 1 28 5 23 28 0
REGION 2 32 7 25 0 32
REGION 3 37 3 34 0 0
REGION 4 16 2 14 0 0
DUMMY001 5 5 0 5 0
DUMMY002 7 7 0 0 7
DUMMY003 3 3 0 0 0
DUMMY004 2 2 0 0 0
DUMMY005 23 0 23 23 0
DUMMY006 25 0 25 0 25
DUMMY007 34 0 34 0 0
DUMMY008 14 0 14 0 0
STAY 1090.26 186.85 903.41 310.49 309.87
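Every entry of X'X is a count of observations (for example SCHOOL 1 by REGION 2 is the 7 hospitals in that cell), and the STAY row is X'Y, a set of totals of STAY. A sketch that rebuilds the printed columns from the cell counts and cell means in the PROC MEANS table of this output; the column order is assumed to match the SAS listing (DUMMY001 to DUMMY008 run over SCHOOL 1 then SCHOOL 2, REGION 1 to 4 within each).

```python
# (N, mean of STAY) for each (school, region) cell, from the PROC MEANS table.
cells = {
    (1, 1): (5, 12.3240000), (1, 2): (7, 10.5985714),
    (1, 3): (3, 10.5600000), (1, 4): (2, 9.6800000),
    (2, 1): (23, 10.8204348), (2, 2): (25, 9.4272000),
    (2, 3): (34, 9.0705882), (2, 4): (14, 7.8900000),
}

def columns(school, region):
    """0/1 values of the 15 columns SAS builds for one observation:
    INTERCEPT, SCHOOL 1-2, REGION 1-4, DUMMY001-DUMMY008 (one per cell)."""
    row = [1]
    row += [int(school == s) for s in (1, 2)]
    row += [int(region == r) for r in (1, 2, 3, 4)]
    row += [int((school, region) == c) for c in sorted(cells)]
    return row

p = 15
xtx = [[0.0] * p for _ in range(p)]
xty = [0.0] * p
for cell, (n, mean) in cells.items():
    x = columns(*cell)
    for i in range(p):
        xty[i] += n * mean * x[i]          # cell total of STAY = n * mean
        for j in range(p):
            xtx[i][j] += n * x[i] * x[j]   # every obs in a cell has the same row

print(xtx[0][:5])                       # INTERCEPT row, first five columns
print([round(v, 2) for v in xty[:5]])   # STAY row, first five columns
```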
X'X Generalized Inverse (g2)
INTERCEPT SCHOOL 1 SCHOOL 2 REGION 1 REGION 2
INTERCEPT 0.0714285714 -0.071428571 0 -0.071428571 -0.071428571
SCHOOL 1 -0.071428571 0.5714285714 0 0.0714285714 0.0714285714
SCHOOL 2 0 0 0 0 0
REGION 1 -0.071428571 0.0714285714 0 0.1149068323 0.0714285714
REGION 2 -0.071428571 0.0714285714 0 0.0714285714 0.1114285714
REGION 3 -0.071428571 0.0714285714 0 0.0714285714 0.0714285714
REGION 4 0 0 0 0 0
DUMMY001 0.0714285714 -0.571428571 0 -0.114906832 -0.071428571
DUMMY002 0.0714285714 -0.571428571 0 -0.071428571 -0.111428571
DUMMY003 0.0714285714 -0.571428571 0 -0.071428571 -0.071428571
DUMMY004 0 0 0 0 0
DUMMY005 0 0 0 0 0
DUMMY006 0 0 0 0 0
DUMMY007 0 0 0 0 0
DUMMY008 0 0 0 0 0
STAY 7.89 1.79 0 2.9304347826 1.5372
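The STAY row of this printout is the solution of the normal equations, the same numbers reported later as parameter estimates. Because the g2 inverse zeroes out the last level of each factor, each estimate is a contrast of cell means relative to the corner cell SCHOOL 2, REGION 4. A sketch of that arithmetic, using the cell means from the PROC MEANS table:

```python
# Cell means of STAY from the PROC MEANS table, keyed by (school, region).
mean = {
    (1, 1): 12.3240000, (1, 2): 10.5985714, (1, 3): 10.5600000, (1, 4): 9.6800000,
    (2, 1): 10.8204348, (2, 2): 9.4272000, (2, 3): 9.0705882, (2, 4): 7.8900000,
}

intercept = mean[2, 4]                        # the corner cell itself
school1 = mean[1, 4] - mean[2, 4]             # SCHOOL 1 vs 2, within REGION 4
region = {r: mean[2, r] - mean[2, 4] for r in (1, 2, 3)}   # REGION r vs 4, within SCHOOL 2
inter = {r: mean[1, r] - mean[1, 4] - mean[2, r] + mean[2, 4]  # SCHOOL*REGION 1 r
         for r in (1, 2, 3)}
print(intercept, school1, region, inter)
```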
Dependent Variable: STAY
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 7 132.06558693 18.86651242 7.15 0.0001
Error 105 277.14479360 2.63947422
Corrected Total 112 409.21038053
R-Square C.V. Root MSE STAY Mean
0.322733 16.83864 1.6246459 9.6483186
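The summary statistics in this table all follow from the sums of squares and degrees of freedom. A sketch of the arithmetic (C.V. is 100 times Root MSE divided by the mean of STAY):

```python
import math

ss_model, df_model = 132.06558693, 7
ss_error, df_error = 277.14479360, 105
ybar = 9.6483186                               # STAY Mean

ms_model = ss_model / df_model
ms_error = ss_error / df_error
f_value = ms_model / ms_error
r_square = ss_model / (ss_model + ss_error)    # corrected total = model + error
root_mse = math.sqrt(ms_error)
cv = 100 * root_mse / ybar                     # coefficient of variation, percent
print(round(f_value, 2), round(r_square, 6), round(root_mse, 7), round(cv, 5))
```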
Source DF Type I SS Mean Square F Value Pr > F
SCHOOL 1 36.08413010 36.08413010 13.67 0.0003
REGION 3 95.36410217 31.78803406 12.04 0.0001
SCHOOL*REGION 3 0.61735466 0.20578489 0.08 0.9718
Source DF Type II SS Mean Square F Value Pr > F
SCHOOL 1 27.89404890 27.89404890 10.57 0.0015
REGION 3 95.36410217 31.78803406 12.04 0.0001
SCHOOL*REGION 3 0.61735466 0.20578489 0.08 0.9718
Source DF Type III SS Mean Square F Value Pr > F
SCHOOL 1 26.05955792 26.05955792 9.87 0.0022
REGION 3 47.01938029 15.67312676 5.94 0.0009
SCHOOL*REGION 3 0.61735466 0.20578489 0.08 0.9718
Source DF Type IV SS Mean Square F Value Pr > F
SCHOOL 1 26.05955792 26.05955792 9.87 0.0022
REGION 3 47.01938029 15.67312676 5.94 0.0009
SCHOOL*REGION 3 0.61735466 0.20578489 0.08 0.9718
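Type I sums of squares are sequential: each is the reduction in error SS when that term is added after the terms above it, so they add up to the Model SS. Each F value is the term's mean square divided by the error mean square. A quick check using the printed numbers:

```python
ms_error = 2.63947422
type1 = {"SCHOOL": (1, 36.08413010), "REGION": (3, 95.36410217),
         "SCHOOL*REGION": (3, 0.61735466)}

total = sum(ss for _, ss in type1.values())
print("sum of Type I SS:", round(total, 8))     # compare with Model SS
f_values = [round(ss / df / ms_error, 2) for df, ss in type1.values()]
print(dict(zip(type1, f_values)))               # F value for each term
```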
T for H0: Pr > |T| Std Error of
Parameter Estimate Parameter=0 Estimate
INTERCEPT 7.890000000 B 18.17 0.0001 0.43420487
SCHOOL 1 1.790000000 B 1.46 0.1480 1.22811685
2 0.000000000 B . . .
REGION 1 2.930434783 B 5.32 0.0001 0.55072100
2 1.537200000 B 2.83 0.0055 0.54232171
3 1.180588235 B 2.29 0.0241 0.51591227
4 0.000000000 B . . .
SCHOOL*REGION 1 1 -0.286434783 B -0.20 0.8455 1.46660342
1 2 -0.618628571 B -0.44 0.6620 1.41099883
1 3 -0.300588235 B -0.19 0.8486 1.57026346
1 4 0.000000000 B . . .
SCHOOL*REGION 2 1 0.000000000 B . . .
2 2 0.000000000 B . . .
2 3 0.000000000 B . . .
2 4 0.000000000 B . . .
NOTE: The X'X matrix has been found to be singular and a generalized inverse
was used to solve the normal equations. Estimates followed by the
letter 'B' are biased, and are not unique estimators of the parameters.
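The standard errors are Root MSE times the square root of the matching diagonal entry of the g2 inverse. With this coding each estimate is a plus/minus combination of cell means, so that diagonal entry is a sum of reciprocal cell counts (0.0714285714 = 1/14 for INTERCEPT, 0.1149068323 = 1/14 + 1/23 for REGION 1). A sketch using the cell counts from the PROC MEANS table:

```python
import math

root_mse = 1.6246459
n = {(1, 1): 5, (1, 2): 7, (1, 3): 3, (1, 4): 2,
     (2, 1): 23, (2, 2): 25, (2, 3): 34, (2, 4): 14}

# Cells whose means enter each estimate; variance = sigma^2 * sum(1/n).
cells_used = {
    "INTERCEPT": [(2, 4)],
    "SCHOOL 1": [(1, 4), (2, 4)],
    "REGION 1": [(2, 1), (2, 4)],
    "SCHOOL*REGION 1 1": [(1, 1), (1, 4), (2, 1), (2, 4)],
}
ses = {name: root_mse * math.sqrt(sum(1 / n[c] for c in cells))
       for name, cells in cells_used.items()}
for name, se in ses.items():
    print(name, round(se, 5))
```

Dividing an estimate by its standard error gives the printed T, for example 2.930434783 / 0.550721 is about 5.32 for REGION 1.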
SCHOOL REGION N Obs N Mean Std Dev Minimum
--------------------------------------------------------------------------------
1 1 5 5 12.3240000 3.3527198 9.7800000
2 7 7 10.5985714 1.1317454 8.2800000
3 3 3 10.5600000 0.7362744 10.1200000
4 2 2 9.6800000 0.6788225 9.2000000
2 1 23 23 10.8204348 2.5061460 8.0300000
2 25 25 9.4272000 1.0978635 7.3900000
3 34 34 9.0705882 1.1911516 7.0800000
4 14 14 7.8900000 0.8332420 6.7000000
--------------------------------------------------------------------------------
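Because the SCHOOL|REGION model fits a separate mean to every cell, its fitted values are the cell means and its Error SS is the pooled within-cell sum of squares, the sum over cells of (N - 1) times the squared standard deviation. A sketch using the N and Std Dev columns above:

```python
# (N, Std Dev) for each SCHOOL, REGION cell, from the PROC MEANS table.
cells = [(5, 3.3527198), (7, 1.1317454), (3, 0.7362744), (2, 0.6788225),
         (23, 2.5061460), (25, 1.0978635), (34, 1.1911516), (14, 0.8332420)]

sse = sum((n - 1) * s ** 2 for n, s in cells)   # pooled within-cell SS
df_error = sum(n - 1 for n, _ in cells)         # 113 observations - 8 cells
print(round(sse, 3), df_error, round(sse / df_error, 4))
```

This matches the Error line of the ANOVA table (277.1448 on 105 df) up to rounding of the printed standard deviations.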
OBS STAY AGE RISK CULTURE CHEST BEDS SCHOOL REGION CENSUS NURSES FACIL
23 9.78 52.3 5.0 17.6 95.9 270 1 1 240 198 57.1
25 9.20 52.2 4.0 17.5 71.1 298 1 4 244 236 57.1
26 8.28 49.5 3.9 12.0 113.1 546 1 2 413 436 57.1
44 10.12 51.7 5.6 14.9 79.1 362 1 3 313 264 54.3
46 10.16 54.2 4.6 8.4 51.5 831 1 4 581 629 74.3
47 19.56 59.9 6.5 17.2 113.7 306 2 1 273 172 51.4
74 10.05 52.0 4.5 36.7 87.5 184 1 1 144 151 68.6
90 11.41 50.4 5.8 23.8 73.0 424 1 3 359 335 45.7
100 10.15 51.9 6.2 16.4 59.2 568 1 3 452 371 62.9
112 17.94 56.2 5.9 26.4 91.8 835 1 1 791 407 62.9
OBS FITTED PRESS HAT EXTST RESID DFFITS COOKD
23 12.3240 -3.18000 0.20000 -1.76835 -2.54400 -0.88418 0.09578
25 9.6800 -0.96000 0.50000 -0.41618 -0.48000 -0.41618 0.02182
26 10.5986 -2.70500 0.14286 -1.55177 -2.31857 -0.63351 0.04950
44 10.5600 -0.66000 0.33333 -0.33029 -0.44000 -0.23355 0.00688
46 9.6800 0.96000 0.50000 0.41618 0.48000 0.41618 0.02182
47 10.8204 9.13682 0.04348 6.48789 8.73957 1.38322 0.17189
74 12.3240 -2.84250 0.20000 -1.57592 -2.27400 -0.78796 0.07653
90 10.5600 1.27500 0.33333 0.63897 0.85000 0.45182 0.02566
100 10.5600 -0.61500 0.33333 -0.30774 -0.41000 -0.21761 0.00597
112 12.3240 7.02000 0.20000 4.15303 5.61600 2.07652 0.46676
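In this model the HAT value of an observation is 1/N for its cell (0.2 = 1/5 for SCHOOL 1, REGION 1 and 0.04348 = 1/23 for SCHOOL 2, REGION 1), and the other columns follow from the usual formulas: PRESS residual e/(1 - h), Cook's distance e^2 h / (p MSE (1 - h)^2) with p = 8 (the rank of X), the externally studentized residual e / (s_(i) sqrt(1 - h)) where s_(i) is the root MSE with the observation deleted, and DFFITS equal to that residual times sqrt(h/(1 - h)). A sketch reproducing the row for observation 47:

```python
import math

sse, n_obs, p = 277.14479360, 113, 8        # Error SS, sample size, rank of X
mse = sse / (n_obs - p)

y, fitted, cell_n = 19.56, 10.8204348, 23   # obs 47: SCHOOL 2, REGION 1
e = y - fitted                              # ordinary residual
h = 1 / cell_n                              # leverage = 1 / cell size in this model

press = e / (1 - h)
cook = e ** 2 * h / (p * mse * (1 - h) ** 2)
s_del = math.sqrt((sse - e ** 2 / (1 - h)) / (n_obs - p - 1))  # root MSE without obs 47
rstudent = e / (s_del * math.sqrt(1 - h))
dffits = rstudent * math.sqrt(h / (1 - h))
print(round(press, 5), round(cook, 5), round(rstudent, 5), round(dffits, 5))
```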
Here I regress STAY on SCHOOL, REGION and FACIL (facilities). I begin by putting in all the possible interaction effects.
options pagesize=60 linesize=80;
data scenic;
  infile 'scenic.dat' firstobs=2;
  input Stay Age Risk Culture Chest Beds
        School Region Census Nurses Facil;
/* full model: every SCHOOL, REGION cell gets its own intercept and FACIL slope */
proc glm data=scenic;
  class school region ;
  model Stay = School | Region | Facil / SS1 SS2 SS3 ;
  output out=scout P=Fitted PRESS=PRESS H=HAT
         RSTUDENT=EXTST R=RESID DFFITS=DFFITS COOKD=COOKD;
run ;
proc print data=scout;
run;
/* reduced model: cell intercepts but one common FACIL slope */
proc glm data=scenic;
  class school region ;
  model Stay = School | Region Facil / SS1 SS2 SS3 ;
run ;
EDITED SAS OUTPUT
Dependent Variable: STAY
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 15 173.90201568 11.59346771 4.78 0.0001
Error 97 235.30836485 2.42585943
Corrected Total 112 409.21038053
R-Square C.V. Root MSE STAY Mean
0.424970 16.14289 1.5575171 9.6483186
Source DF Type I SS Mean Square F Value Pr > F
SCHOOL 1 36.08413010 36.08413010 14.87 0.0002
REGION 3 95.36410217 31.78803406 13.10 0.0001
SCHOOL*REGION 3 0.61735466 0.20578489 0.08 0.9682
FACIL 1 9.52496125 9.52496125 3.93 0.0504
FACIL*SCHOOL 1 1.32686372 1.32686372 0.55 0.4613
FACIL*REGION 3 21.28634656 7.09544885 2.92 0.0377
FACIL*SCHOOL*REGION 3 9.69825722 3.23275241 1.33 0.2683
Source DF Type II SS Mean Square F Value Pr > F
SCHOOL 1 4.73069924 4.73069924 1.95 0.1658
REGION 3 8.16560072 2.72186691 1.12 0.3441
SCHOOL*REGION 3 7.04260265 2.34753422 0.97 0.4113
FACIL 1 9.52496125 9.52496125 3.93 0.0504
FACIL*SCHOOL 1 3.76491803 3.76491803 1.55 0.2158
FACIL*REGION 3 21.28634656 7.09544885 2.92 0.0377
FACIL*SCHOOL*REGION 3 9.69825722 3.23275241 1.33 0.2683
Source DF Type III SS Mean Square F Value Pr > F
SCHOOL 1 2.34679006 2.34679006 0.97 0.3278
REGION 3 2.46002453 0.82000818 0.34 0.7979
SCHOOL*REGION 3 7.04260265 2.34753422 0.97 0.4113
FACIL 1 0.70390965 0.70390965 0.29 0.5913
FACIL*SCHOOL 1 1.50831325 1.50831325 0.62 0.4323
FACIL*REGION 3 1.92051520 0.64017173 0.26 0.8513
FACIL*SCHOOL*REGION 3 9.69825722 3.23275241 1.33 0.2683
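Because Type I SS are sequential, their running totals are the model SS of the nested models along the way. With the numbers above, the first three terms give the SCHOOL|REGION model SS (132.06558693), the first four give the model SS of the fit with no FACIL interactions (141.59054818), and all seven give the model SS of this full fit; the last three together are the extra SS due to the FACIL interactions. A sketch:

```python
from itertools import accumulate

type1 = [("SCHOOL", 36.08413010), ("REGION", 95.36410217),
         ("SCHOOL*REGION", 0.61735466), ("FACIL", 9.52496125),
         ("FACIL*SCHOOL", 1.32686372), ("FACIL*REGION", 21.28634656),
         ("FACIL*SCHOOL*REGION", 9.69825722)]

running = list(accumulate(ss for _, ss in type1))
for (term, _), total in zip(type1, running):
    print(f"{term:22s} {total:.8f}")     # model SS after adding this term

extra_ss = running[-1] - running[3]      # SS for the three FACIL interaction terms
print(f"extra SS for FACIL interactions: {extra_ss:.7f}")
```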
OBS STAY AGE RISK CULTURE CHEST BEDS SCHOOL REGION CENSUS NURSES FACIL
25 9.20 52.2 4.0 17.5 71.1 298 1 4 244 236 57.1
46 10.16 54.2 4.6 8.4 51.5 831 1 4 581 629 74.3
47 19.56 59.9 6.5 17.2 113.7 306 2 1 273 172 51.4
OBS FITTED PRESS HAT EXTST RESID DFFITS COOKD
25 9.2000 . 1.00000 . -0.00000 . .
46 10.1600 . 1.00000 . 0.00000 . .
47 11.8970 8.29701 0.07641 5.96177 7.66301 1.71483 0.13553
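Observations 25 and 46 are the only two hospitals with SCHOOL 1, REGION 4 (N = 2 in the earlier PROC MEANS table). The full model gives every cell its own intercept and its own FACIL slope, so within that cell a straight line is fitted through two points: it passes through both exactly, the residuals are 0, the leverages are 1, and the deletion statistics are undefined (printed as "."). A sketch of the within-cell simple-regression leverage 1/n + (x - xbar)^2 / Sxx, using the two FACIL values from the listing:

```python
def cell_leverages(x):
    """Leverages for a straight-line fit within a single cell."""
    n = len(x)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    return [1 / n + (xi - xbar) ** 2 / sxx for xi in x]

facil_cell_14 = [57.1, 74.3]    # FACIL for obs 25 and 46
print(cell_leverages(facil_cell_14))
```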
COMMENTS
The slopes and intercepts have been decomposed in the same way that the means in a two-way layout are decomposed into main effects and interactions. Normally we might begin by looking for any interaction of FACIL with the other factors, by comparing the full model to a model with no interactions involving FACIL (one common FACIL slope, but still a separate intercept for each SCHOOL, REGION cell). This is what the second proc glm run does. More of the output follows.
Dependent Variable: STAY
Sum of Mean
Source DF Squares Square F Value Pr > F
Model 8 141.59054818 17.69881852 6.88 0.0001
Error 104 267.61983235 2.57326762
Corrected Total 112 409.21038053
R-Square C.V. Root MSE STAY Mean
0.346009 16.62612 1.6041408 9.6483186
Source DF Type I SS Mean Square F Value Pr > F
SCHOOL 1 36.08413010 36.08413010 14.02 0.0003
REGION 3 95.36410217 31.78803406 12.35 0.0001
SCHOOL*REGION 3 0.61735466 0.20578489 0.08 0.9708
FACIL 1 9.52496125 9.52496125 3.70 0.0571
Source DF Type II SS Mean Square F Value Pr > F
SCHOOL 1 8.66242211 8.66242211 3.37 0.0694
REGION 3 82.48995156 27.49665052 10.69 0.0001
SCHOOL*REGION 3 0.48049197 0.16016399 0.06 0.9796
FACIL 1 9.52496125 9.52496125 3.70 0.0571
Source DF Type III SS Mean Square F Value Pr > F
SCHOOL 1 8.45264294 8.45264294 3.28 0.0728
REGION 3 42.65719728 14.21906576 5.53 0.0015
SCHOOL*REGION 3 0.48049197 0.16016399 0.06 0.9796
FACIL 1 9.52496125 9.52496125 3.70 0.0571
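The comparison described in the comments is an extra sum of squares F test. Using the Error lines of the two fits (235.30836485 on 97 df for the full model, 267.61983235 on 104 df for the model without FACIL interactions), a sketch:

```python
sse_full, df_full = 235.30836485, 97
sse_reduced, df_reduced = 267.61983235, 104

q = df_reduced - df_full                 # number of FACIL interaction parameters dropped
f_stat = ((sse_reduced - sse_full) / q) / (sse_full / df_full)
print(q, round(f_stat, 3))
```

Under the hypothesis of no FACIL interactions this statistic has an F distribution on 7 and 97 degrees of freedom. The numerator SS, 32.3114675, is the sum of the last three Type I SS in the full-model output.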