STAT 350: Lecture 11 Example
Multiple Regression
In the data set below, the hardness of plaster is measured for each of 9 combinations of sand content and fibre content. Sand content was set at one of 3 levels, as was fibre content, and every combination was tried on two batches of plaster, giving 18 observations.
Here is an excerpt of the data:
Sand Fibre Hardness Strength
0 0 61 34
0 0 63 16
15 0 67 36
15 0 69 19
30 0 65 28
...
The complete data set is here.
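I fit submodels of the following "Full" model:

$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 F_i + \beta_3 S_i^2 + \beta_4 F_i^2 + \beta_5 S_i F_i + \epsilon_i$$

where $Y_i$ is the hardness, $S_i$ the sand content and $F_i$ the fibre content of observation $i$.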
I adopt the idea that the interaction term is probably negligible unless S and F each have some effect, and that a quadratic term will probably not be present unless the corresponding linear term is present. This limits the set of reasonable models. I fit each of them and report the error sum of squares in the following table:
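| Model for $E(Y)$ | Error Sum of Squares | Error df |
| Full: 1, S, F, S², F², SF | 81.264 | 12 |
| 1, S, F, S², F² | 82.389 | 13 |
| 1, S, F, S² | 104.167 | 14 |
| 1, S, S² | 169.500 | 15 |
| 1, S | 174.194 | 16 |
| 1, S, F, F² | 87.083 | 14 |
| 1, F, F² | 189.167 | 15 |
| 1, F | 210.944 | 16 |
| 1, S, F | 108.861 | 15 |

Here 1 denotes the intercept.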
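As a quick consistency check on the table, the sketch below stores each row as a Python dictionary keyed by the terms in the model (1 is the intercept) and confirms two things: each error df equals 18 minus the number of fitted coefficients, with 18 = 9 combinations × 2 batches taken from the design description, and no model has a larger SSE than a smaller model nested inside it. The names `SSE_TABLE`, `error_df` and `nested_pairs_consistent` are hypothetical, chosen only for this illustration.

```python
# Error sums of squares (SSE) and error df from the table of fitted submodels.
# Keys list the terms in each model for E(Y); "1" is the intercept.
# The dictionary name and layout are illustrative choices, not from the lecture.
SSE_TABLE = {
    ("1", "S", "F", "S^2", "F^2", "SF"): (81.264, 12),  # Full model
    ("1", "S", "F", "S^2", "F^2"): (82.389, 13),
    ("1", "S", "F", "S^2"): (104.167, 14),
    ("1", "S", "S^2"): (169.500, 15),
    ("1", "S"): (174.194, 16),
    ("1", "S", "F", "F^2"): (87.083, 14),
    ("1", "F", "F^2"): (189.167, 15),
    ("1", "F"): (210.944, 16),
    ("1", "S", "F"): (108.861, 15),
}

# 9 sand/fibre combinations, each tried on 2 batches of plaster.
N_OBS = 9 * 2


def error_df(terms):
    """Error degrees of freedom: n minus the number of fitted coefficients."""
    return N_OBS - len(terms)


def nested_pairs_consistent(table):
    """Adding terms to a least-squares fit can never increase the SSE, so a model
    nested inside a bigger one must have an SSE at least as large."""
    for small, (sse_small, _) in table.items():
        for big, (sse_big, _) in table.items():
            if set(small) < set(big) and sse_big > sse_small:
                return False
    return True


for terms, (sse, df) in SSE_TABLE.items():
    assert error_df(terms) == df, terms
    print(f"{' + '.join(terms):<28} SSE = {sse:8.3f}  df = {df:2d}  MSE = {sse / df:.3f}")

print("nesting consistent:", nested_pairs_consistent(SSE_TABLE))
```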
I begin by asking whether the 2nd degree polynomial terms, that is,
those involving $S^2$, $F^2$ and $SF$, need be included.
To do so I compare the top line with the model containing only the
intercept, $S$ and $F$. The extra SS is 108.861-81.264 = 27.597
on 3 degrees of freedom, which gives a mean square of (108.861-81.264)/3 =
9.199. The MSE is 81.264/12 = 6.772. This gives an F-statistic
of 9.199/6.772 = 1.358 on 3 numerator and 12 denominator degrees of freedom.
The P-value is 0.30, which is not significant. We would then
delete the quadratic terms and consider the coefficients of S and F.
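The extra-sum-of-squares F test for the three second-degree terms can be sketched in Python using only the standard library. The P-value comes from the upper tail of the F distribution, written as a regularized incomplete beta function and evaluated with a standard continued-fraction expansion (the approach of the `betai` routine in Numerical Recipes). The helper names `_betacf`, `reg_inc_beta` and `f_pvalue` are my own; if SciPy is installed, `scipy.stats.f.sf(f_stat, 3, 12)` should give the same tail probability.

```python
import math


def _betacf(a, b, x, max_iter=300, eps=1e-14):
    """Continued fraction for the incomplete beta function (modified Lentz method)."""
    tiny = 1e-300
    qab, qap, qam = a + b, a + 1.0, a - 1.0
    c = 1.0
    d = 1.0 - qab * x / qap
    d = 1.0 / (d if abs(d) > tiny else tiny)
    h = d
    for m in range(1, max_iter + 1):
        m2 = 2 * m
        # Even step of the continued fraction.
        aa = m * (b - m) * x / ((qam + m2) * (a + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        h *= d * c
        # Odd step of the continued fraction.
        aa = -(a + m) * (qab + m) * x / ((a + m2) * (qap + m2))
        d = 1.0 + aa * d
        d = 1.0 / (d if abs(d) > tiny else tiny)
        c = 1.0 + aa / c
        c = c if abs(c) > tiny else tiny
        delta = d * c
        h *= delta
        if abs(delta - 1.0) < eps:
            break
    return h


def reg_inc_beta(a, b, x):
    """Regularized incomplete beta function I_x(a, b)."""
    if x <= 0.0:
        return 0.0
    if x >= 1.0:
        return 1.0
    log_front = (math.lgamma(a + b) - math.lgamma(a) - math.lgamma(b)
                 + a * math.log(x) + b * math.log1p(-x))
    # Use the expansion directly where it converges fast, else the symmetry relation.
    if x < (a + 1.0) / (a + b + 2.0):
        return math.exp(log_front) * _betacf(a, b, x) / a
    return 1.0 - math.exp(log_front) * _betacf(b, a, 1.0 - x) / b


def f_pvalue(f_stat, df1, df2):
    """Upper-tail probability P(F > f_stat) for an F(df1, df2) variable."""
    return reg_inc_beta(df2 / 2.0, df1 / 2.0, df2 / (df2 + df1 * f_stat))


sse_full, df_full = 81.264, 12          # Full model
sse_reduced, df_reduced = 108.861, 15   # intercept, S and F only

extra_ss = sse_reduced - sse_full
df_extra = df_reduced - df_full
mse_full = sse_full / df_full
f_stat = (extra_ss / df_extra) / mse_full
p_value = f_pvalue(f_stat, df_extra, df_full)
print(f"F = {f_stat:.3f} on ({df_extra}, {df_full}) df, P = {p_value:.2f}")
```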
We have a choice here; one option is to pretend that the last line in
the table is now the "Full" model and form the F-statistics
(210.944-108.861)/(108.861/15) = 14.066 and (174.194-108.861)/(108.861/15)
= 9.002. The first is for testing $H_0: \beta_1 = 0$ (no sand effect)
and the second for $H_0: \beta_2 = 0$ (no fibre effect). Each is on 1
and 15 degrees of freedom. The corresponding P-values are 0.002 and
0.009. These are both highly significant, and we conclude that both
sand content and fibre content have an impact on hardness and that
there is little reason to look for non-linear impacts of the two factors.
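The two 1-degree-of-freedom tests can be sketched the same way, treating the model with the intercept, S and F as the new "Full" model and reading the reduced-model SSEs from the table. The constant names and the helper `one_df_f` are hypothetical; `beta_1` and `beta_2` follow the coefficient labels of the Full model, with $\beta_1$ on sand and $\beta_2$ on fibre.

```python
# Treat the model with 1, S, F (SSE 108.861 on 15 df) as the new "Full" model.
SSE_SF, DF_SF = 108.861, 15
MSE_SF = SSE_SF / DF_SF

# Reduced models, each dropping one linear term from 1, S, F.
SSE_WITHOUT_S = 210.944  # model 1, F: tests beta_1 = 0 (sand)
SSE_WITHOUT_F = 174.194  # model 1, S: tests beta_2 = 0 (fibre)


def one_df_f(sse_reduced):
    """Extra-sum-of-squares F statistic on (1, DF_SF) degrees of freedom."""
    return (sse_reduced - SSE_SF) / 1 / MSE_SF


f_sand = one_df_f(SSE_WITHOUT_S)
f_fibre = one_df_f(SSE_WITHOUT_F)
print(f"H0: beta_1 = 0 (sand):  F = {f_sand:.3f} on (1, {DF_SF}) df")
print(f"H0: beta_2 = 0 (fibre): F = {f_fibre:.3f} on (1, {DF_SF}) df")
```

With SciPy available, `scipy.stats.f.sf(f_sand, 1, 15)` and `scipy.stats.f.sf(f_fibre, 1, 15)` should reproduce the P-values of about 0.002 and 0.009.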
An alternative starting point would be to check first whether the
interaction term can be eliminated, that is, to test the hypothesis
$H_0: \beta_5 = 0$. This hypothesis can be tested either using the F
statistic [(82.389-81.264)/1]/[81.264/12] = 0.166 or using the
t-statistic

$$t = \frac{\hat\beta_5}{s(\hat\beta_5)},$$

which SAS calculates to be -0.41. Note that
$t^2 = (-0.41)^2 = 0.168 \approx 0.166 = F$ to within round-off error.
Algebraically, $t^2 = F$ exactly. Note, too, that the t test can be made
one-sided while the F-test cannot.
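The algebraic identity $t^2 = F$ can be sketched as follows. It relies on the standard least-squares fact that dropping a single column from the design matrix $X$ increases the error sum of squares by $\hat\beta_j^2 / c_{jj}$, where $c_{jj}$ is the diagonal entry of $(X^\top X)^{-1}$ belonging to $\hat\beta_j$. The symbol $c_{55}$ (the entry belonging to $\hat\beta_5$) is my notation, not the lecture's.

```latex
\begin{aligned}
\mathrm{SSE}(\text{no } SF) - \mathrm{SSE}(\text{Full}) &= \frac{\hat\beta_5^2}{c_{55}},
\qquad s(\hat\beta_5) = \sqrt{\mathrm{MSE}\, c_{55}},\\[4pt]
F = \frac{\{\mathrm{SSE}(\text{no } SF) - \mathrm{SSE}(\text{Full})\}/1}{\mathrm{MSE}}
  &= \frac{\hat\beta_5^2}{\mathrm{MSE}\, c_{55}}
   = \left(\frac{\hat\beta_5}{s(\hat\beta_5)}\right)^2 = t^2 .
\end{aligned}
```

Squaring discards the sign of $\hat\beta_5$, which is why only the t statistic supports a one-sided alternative.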