STAT 330 Lecture 27
Reading for Today's Lecture: 11.1, 11.2.
Goals of Today's Lecture:
Today's notes
Two way layout: the ANOVA table
| Source | df | Sum of Squares | Mean Square | F | P |
| Factor 1 | I-1 | | SS/df | | |
| Factor 2 | J-1 | | SS/df | | |
| Interactions | (I-1)(J-1) | | SS/df | | |
| Error | IJ(K-1) | | SS/df | | |
| Total | n-1 | | | | |
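There are three F-statistics, one each for Factor 1, Factor 2 and Interactions; each is the mean square for that source divided by the Error mean square. The P-value for each comes from F tables whose numerator degrees of freedom are those in that source's row of the df column and whose denominator degrees of freedom are the IJ(K-1) in the Error row.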
Example: the variable X is plaster hardness. The factors are SAND content (with levels 0, 15 and 30%) and FIBRE content (with levels 0, 25 and 50%). We have 2 replicates so I=3, J=3 and K=2.
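As a quick check of the degrees of freedom for this example, the following Python sketch evaluates the formulas in the df column with I=3, J=3 and K=2. The dictionary name `df` and the row labels are illustrative choices, not part of the SAS analysis.

```python
# Degrees of freedom for a two way layout with replicates.
I, J, K = 3, 3, 2      # levels of SAND, levels of FIBRE, replicates per cell
n = I * J * K          # total number of observations

df = {
    "Factor 1": I - 1,
    "Factor 2": J - 1,
    "Interactions": (I - 1) * (J - 1),
    "Error": I * J * (K - 1),
    "Total": n - 1,
}

# The first four rows must add up to the Total row.
assert sum(v for k, v in df.items() if k != "Total") == df["Total"]

for source, value in df.items():
    print(f"{source:<13}{value:>3}")
```

The resulting values 2, 2, 4, 9 and 17 are the ones that appear in the DF columns of the SAS output.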
SAS analysis
The data consist of casting hardnesses for 18 samples prepared under 3 levels of sand added and 3 levels of carbon fibre added. See Q 15 in Chapter 11. I use proc anova to test the hypotheses of no effect of either sand content or fibre content after first testing for interactions.
I ran the following SAS code:
    options pagesize=60 linesize=80;
    data plaster;
      infile 'plaster.dat';
      input sand fibre hardness strength;
    proc anova data=plaster;
      class sand fibre;
      model hardness = sand|fibre;
      means sand fibre / tukey cldiff;
    run;
The line labelled model says that I am interested in the effects of sand, fibre and interactions between the two. The line class sand fibre is required so that SAS knows which variables define the levels of the factors.
The output from proc anova begins with a printout of information about the variables SAND and FIBRE: how many levels each has and what the levels are called.
The SAS System 1
14:05 Tuesday, November 14, 1995
Analysis of Variance Procedure
Class Level Information
Class Levels Values
SAND 3 0 15 30
FIBRE 3 0 25 50
Number of observations in data set = 18
Next SAS produces the ANOVA table. You should notice that it first prints out a table with three lines: MODEL, ERROR and TOTAL. The line labelled MODEL is the sum of the three lines "Factor 1", "Factor 2", and "Interactions" in the table I have shown above.
Analysis of Variance Procedure
Dependent Variable: HARDNESS
                                Sum of            Mean
Source              DF         Squares          Square   F Value   Pr > F
Model                8    202.77777778     25.34722222      3.10   0.0557
Error                9     73.50000000      8.16666667
Corrected Total     17    276.27777778
Next it prints some summary statistics, including the Root MSE, which is the square root of the Error mean square, $\sqrt{8.16666667} \approx 2.8577$.
R-Square C.V. Root MSE HARDNESS Mean
0.733963 4.105290 2.8577380 69.611111
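These summary statistics can be reproduced from numbers already printed in the output. The sketch below assumes the usual definitions: R-Square is Model SS divided by Corrected Total SS, Root MSE is the square root of the Error mean square, and C.V. is 100 times Root MSE divided by the mean of HARDNESS. The variable names are illustrative.

```python
import math

# Values copied from the SAS output above.
ss_model = 202.77777778
ss_total = 276.27777778
mse = 8.16666667
mean_hardness = 69.611111

r_square = ss_model / ss_total            # proportion of variation explained
root_mse = math.sqrt(mse)                 # square root of the Error mean square
cv = 100 * root_mse / mean_hardness       # coefficient of variation, in percent

print(f"R-Square {r_square:.6f}  C.V. {cv:.6f}  Root MSE {root_mse:.7f}")
```

Up to rounding, the three values agree with the line SAS printed.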
Finally it breaks the model line down into the three lines in my table
above. You can check that these lines add up to the model line above.
Source        DF       Anova SS    Mean Square   F Value   Pr > F
SAND           2   106.77777778    53.38888889      6.54   0.0176
FIBRE          2    87.11111111    43.55555556      5.33   0.0297
SAND*FIBRE     4     8.88888889     2.22222222      0.27   0.8887

The conclusions are that both sand and fibre have an effect on hardness but that there is little evidence of an interaction between the two factors. (These are based on the column of P values, 0.0176, 0.0297 and 0.8887.)
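The check suggested above, that the three lines add up to the MODEL line, can be done in a few lines of Python, together with a recomputation of the F values as mean square over Error mean square. The sketch also recomputes the two P values for SAND and FIBRE using a closed form that holds only when the numerator degrees of freedom equal 2, namely P(F > f) = (1 + 2f/d)^(-d/2) with d the Error degrees of freedom; it does not apply to the interaction line, which has 4 numerator degrees of freedom. All names in the code are illustrative.

```python
# Values copied from the SAS output.
ss = {"SAND": 106.77777778, "FIBRE": 87.11111111, "SAND*FIBRE": 8.88888889}
df = {"SAND": 2, "FIBRE": 2, "SAND*FIBRE": 4}
ss_model, df_model = 202.77777778, 8
mse, df_error = 8.16666667, 9

# The three lines add up to the MODEL line, in both SS and df.
assert abs(sum(ss.values()) - ss_model) < 1e-6
assert sum(df.values()) == df_model

# F = (SS / df) / MSE for each line.
f_value = {name: (ss[name] / df[name]) / mse for name in ss}

def p_value_num_df_2(f, d):
    """Upper tail of the F(2, d) distribution; valid only for numerator df = 2."""
    return (1 + 2 * f / d) ** (-d / 2)

p_value = {name: p_value_num_df_2(f_value[name], df_error)
           for name in ("SAND", "FIBRE")}

for name in ss:
    print(name, round(f_value[name], 2), round(p_value.get(name, float("nan")), 4))
```

The F values round to 6.54, 5.33 and 0.27, and the two P values round to the 0.0176 and 0.0297 that SAS reports.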
The last part of the output comes from the means line, which asked for Tukey confidence intervals.
Analysis of Variance Procedure
Tukey's Studentized Range (HSD) Test for variable: HARDNESS
NOTE: This test controls the type I experimentwise error rate.
Alpha= 0.05 Confidence= 0.95 df= 9 MSE= 8.166667
Critical Value of Studentized Range= 3.948
Minimum Significant Difference= 4.6066
Comparisons significant at the 0.05 level are indicated by '***'.
              Simultaneous                 Simultaneous
                 Lower       Difference       Upper
   SAND        Confidence     Between       Confidence
Comparison       Limit         Means          Limit
 30 - 15         -2.773         1.833          6.440
 30 - 0           1.227         5.833         10.440   ***
 15 - 30         -6.440        -1.833          2.773
 15 - 0          -0.607         4.000          8.607
 0  - 30        -10.440        -5.833         -1.227   ***
 0  - 15         -8.607        -4.000          0.607
Analysis of Variance Procedure
Tukey's Studentized Range (HSD) Test for variable: HARDNESS
NOTE: This test controls the type I experimentwise error rate.
Alpha= 0.05 Confidence= 0.95 df= 9 MSE= 8.166667
Critical Value of Studentized Range= 3.948
Minimum Significant Difference= 4.6066
Comparisons significant at the 0.05 level are indicated by '***'.
              Simultaneous                 Simultaneous
                 Lower       Difference       Upper
   FIBRE       Confidence     Between       Confidence
Comparison       Limit         Means          Limit
 50 - 25         -4.607         0.000          4.607
 50 - 0           0.060         4.667          9.273   ***
 25 - 50         -4.607         0.000          4.607
 25 - 0           0.060         4.667          9.273   ***
 0  - 50         -9.273        -4.667         -0.060   ***
 0  - 25         -9.273        -4.667         -0.060   ***
The Tukey intervals show a difference between 0% fibre and each of the other two levels: the intervals for 25 - 0 and 50 - 0 lie entirely above zero (although the lower limit, 0.060, is only just above zero), so hardness is lower with no fibre. There is no detectable difference between the 25% and 50% FIBRE levels in terms of effect on hardness; the estimated difference is 0.000. For SAND, the 30% level differs significantly from the 0% level, but the intermediate 15% level is not clearly distinguished from either of the other two.
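The Minimum Significant Difference and the SAND intervals can be reproduced by hand. The sketch below assumes the standard Tukey formula for a balanced design, MSD = q × sqrt(MSE / m), where q is the critical value of the studentized range printed by SAS and m = 6 is the number of observations at each level of the factor (18 observations spread over 3 levels). The variable names are illustrative.

```python
import math

q_crit = 3.948        # critical value of the studentized range, from SAS
mse = 8.166667        # Error mean square, from SAS
m = 18 // 3           # observations at each level of SAND (or of FIBRE)

msd = q_crit * math.sqrt(mse / m)   # minimum significant difference

# Differences between SAND level means, copied from the SAS output.
diffs = {"30 - 15": 1.833, "30 - 0": 5.833, "15 - 0": 4.000}

intervals = {}
for comparison, d in diffs.items():
    lower, upper = d - msd, d + msd
    significant = lower > 0 or upper < 0     # interval excludes zero
    intervals[comparison] = (lower, upper, significant)
    print(f"{comparison:<8} {lower:8.3f} {d:8.3f} {upper:8.3f} {'***' if significant else ''}")
```

Because the printed critical value 3.948 is itself rounded, the computed MSD and limits can differ from the SAS values in the third decimal place. Only the 30 - 0 comparison is flagged, in agreement with the output.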
Thus, the SAS output shows that the ANOVA table is as follows:
| Source | df | Sum of Squares | Mean Square | F | P |
| SAND | 2 | 106.8 | 53.4 | 6.54 | 0.018 |
| FIBRE | 2 | 87.1 | 43.6 | 5.33 | 0.030 |
| SAND*FIBRE | 4 | 8.89 | 2.22 | 0.27 | 0.89 |
| Error | 9 | 73.5 | 8.17 | | |
| Total | 17 | 276.28 | | | |
Conclusions:
NEXT: multiple comparisons and Tukey confidence intervals.
| SAND: 95% CI |
| 0 | 15 | 30 |
Two way layouts without replicates (K=1)
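When K=1 we do not have enough data to estimate the interaction terms $\gamma_{ij}$. So we simplify the model to

$$X_{ij} = \mu + \alpha_i + \beta_j + \epsilon_{ij}.$$

Notice that we have dropped the $\gamma_{ij}$s and the subscript k.

The estimates for the parameters are now

$$\hat\mu = \bar X_{\cdot\cdot}, \qquad \hat\alpha_i = \bar X_{i\cdot} - \bar X_{\cdot\cdot}$$

and so on; these are the same as for K>1. The fitted residuals are

$$\hat\epsilon_{ij} = X_{ij} - \bar X_{i\cdot} - \bar X_{\cdot j} + \bar X_{\cdot\cdot}.$$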
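To make the estimates concrete, here is a minimal Python sketch for a layout with one observation per cell. The 3 by 3 data table is hypothetical, invented purely for illustration; it is not the plaster data. The sketch assumes the additive model with row effects, column effects and no interaction, and the names `alpha_hat`, `beta_hat` and `resid` are illustrative.

```python
# Hypothetical 3 x 3 table, one observation per cell (K = 1).
# Rows are levels of Factor 1, columns are levels of Factor 2.
x = [[10.0, 12.0, 15.0],
     [11.0, 14.0, 16.0],
     [14.0, 15.0, 19.0]]
I, J = len(x), len(x[0])

grand_mean = sum(sum(row) for row in x) / (I * J)
row_means = [sum(row) / J for row in x]
col_means = [sum(x[i][j] for i in range(I)) / I for j in range(J)]

# Estimated main effects: level mean minus grand mean.
alpha_hat = [m - grand_mean for m in row_means]
beta_hat = [m - grand_mean for m in col_means]

# Fitted residuals: observation minus row mean minus column mean plus grand mean.
resid = [[x[i][j] - row_means[i] - col_means[j] + grand_mean
          for j in range(J)] for i in range(I)]

print("grand mean:", grand_mean)
print("alpha_hat:", [round(a, 3) for a in alpha_hat])
print("beta_hat:", [round(b, 3) for b in beta_hat])
```

By construction the estimated effects sum to zero over their index, and the residuals sum to zero along every row and every column.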
The ANOVA table simplifies to
| Source | df | Sum of Squares | Mean Square | F | P |
| Factor 1 | I-1 | | SS/df | | |
| Factor 2 | J-1 | | SS/df | | |
| Error | (I-1)(J-1) | | SS/df | | |
| Total | n-1 | | | | |
This ANOVA table can be used to test the hypotheses of no main effects for Factor 1 and no main effects for Factor 2, that is, the hypotheses $H_0: \alpha_1 = \cdots = \alpha_I = 0$ and $H_0: \beta_1 = \cdots = \beta_J = 0$.
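The following Python sketch shows how the K=1 table might be filled in. It uses a small hypothetical 3 by 3 data set (not from the course) and assumes the standard sums of squares for the additive model; the function name `two_way_no_replicates` is an illustrative choice.

```python
def two_way_no_replicates(x):
    """ANOVA quantities for an I x J table with one observation per cell."""
    I, J = len(x), len(x[0])
    grand = sum(sum(row) for row in x) / (I * J)
    row_means = [sum(row) / J for row in x]
    col_means = [sum(x[i][j] for i in range(I)) / I for j in range(J)]

    ss_a = J * sum((m - grand) ** 2 for m in row_means)      # Factor 1
    ss_b = I * sum((m - grand) ** 2 for m in col_means)      # Factor 2
    ss_e = sum((x[i][j] - row_means[i] - col_means[j] + grand) ** 2
               for i in range(I) for j in range(J))          # Error
    ss_t = sum((v - grand) ** 2 for row in x for v in row)   # Total

    df_a, df_b, df_e = I - 1, J - 1, (I - 1) * (J - 1)
    ms_e = ss_e / df_e
    return {
        "SS": (ss_a, ss_b, ss_e, ss_t),
        "df": (df_a, df_b, df_e, I * J - 1),
        "F": ((ss_a / df_a) / ms_e, (ss_b / df_b) / ms_e),
    }

# Hypothetical data: rows are Factor 1 levels, columns are Factor 2 levels.
table = two_way_no_replicates([[10.0, 12.0, 15.0],
                               [11.0, 14.0, 16.0],
                               [14.0, 15.0, 19.0]])
print(table)
```

Each F statistic would be referred to F tables with the df of its own row as numerator and the Error df, (I-1)(J-1), as denominator.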