STAT 350: Lecture 11 Example

Multiple Regression

In the data set below, the hardness of plaster is measured for each of 9 combinations of sand content and fibre content. Sand content was set at one of 3 levels, as was fibre content, and all possible combinations were tried on two batches of plaster, giving 18 observations in all.

Here is an excerpt of the data:

Sand  Fibre  Hardness  Strength 
   0    0       61       34
   0    0       63       16
  15    0       67       36
  15    0       69       19
  30    0       65       28
     ...
The complete data set is here.

I fit submodels of the following "Full" model:

$$Y_i = \beta_0 + \beta_1 S_i + \beta_2 F_i + \beta_3 S_i^2 + \beta_4 F_i^2 + \beta_5 S_i F_i + \epsilon_i$$
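As a rough illustration of what the "Full" model asks of the data, the sketch below builds the covariates for a few observations. It assumes the model has an intercept plus the terms S, F, S^2, F^2 and S*F in that order; the function name `full_model_row` and the column order are illustrative choices, not something taken from the SAS code.

```python
def full_model_row(sand, fibre):
    """Covariates for one observation: intercept, S, F, S^2, F^2, S*F.

    The column order is an assumption made for this sketch.
    """
    return [1, sand, fibre, sand ** 2, fibre ** 2, sand * fibre]


# (Sand, Fibre) pairs copied from the data excerpt above.
excerpt = [(0, 0), (0, 0), (15, 0), (15, 0), (30, 0)]
design = [full_model_row(s, f) for s, f in excerpt]

for row in design:
    print(row)
```

Each submodel simply drops some of these columns before fitting.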

I adopt the idea that the interaction term is probably negligible unless each of S and F has some effect, and that quadratic terms will probably not be present unless the corresponding linear terms are present. This limits the set of potentially reasonable models. I fit each of them and report the error sum of squares in the following table.
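One way to read this rule is as a filter on subsets of the five covariates. The sketch below is only an interpretation: the term labels and the helper `is_reasonable` are hypothetical names, and it takes "quadratic term needs its linear term" and "interaction needs both linear terms" literally.

```python
from itertools import combinations

TERMS = ["S", "F", "S^2", "F^2", "S*F"]


def is_reasonable(model):
    """Apply the two rules: squares need their linear term,
    and the interaction needs both linear terms."""
    terms = set(model)
    if "S^2" in terms and "S" not in terms:
        return False
    if "F^2" in terms and "F" not in terms:
        return False
    if "S*F" in terms and not {"S", "F"} <= terms:
        return False
    return True


# Every non-empty subset of the covariates that passes the filter.
candidates = [
    subset
    for size in range(1, len(TERMS) + 1)
    for subset in combinations(TERMS, size)
    if is_reasonable(subset)
]
# The candidates that leave the interaction out.
no_interaction = [m for m in candidates if "S*F" not in m]

print(len(candidates), len(no_interaction))
```

Under this reading, 8 of the candidates omit the interaction; together with the Full model that makes nine fits, which is the number of rows in the table.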

Model for $\mu$                                                   Error Sum of Squares   Error df
Full                                                                           81.264         12
$\beta_0 + \beta_1 S + \beta_2 F + \beta_3 S^2 + \beta_4 F^2$                  82.389         13
$\beta_0 + \beta_1 S + \beta_2 F + \beta_3 S^2$                               104.167         14
$\beta_0 + \beta_1 S + \beta_3 S^2$                                           169.500         15
$\beta_0 + \beta_1 S$                                                         174.194         16
$\beta_0 + \beta_1 S + \beta_2 F + \beta_4 F^2$                                87.083         14
$\beta_0 + \beta_2 F + \beta_4 F^2$                                           189.167         15
$\beta_0 + \beta_2 F$                                                         210.944         16
$\beta_0 + \beta_1 S + \beta_2 F$                                             108.861         15
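The Error df column can be checked by counting: with 9 combinations tried on two batches there are 18 observations, and each fitted coefficient (including the intercept) uses up one degree of freedom. A minimal sketch, with `error_df` as an illustrative name:

```python
N_OBS = 9 * 2  # 9 sand/fibre combinations, each tried on two batches


def error_df(n_covariates, n_obs=N_OBS):
    """Error df = observations minus fitted coefficients (covariates + intercept)."""
    return n_obs - (n_covariates + 1)


# The Full model has 5 covariates; the smallest models in the table have 1.
for k in range(5, 0, -1):
    print(k, "covariates ->", error_df(k), "error df")
```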

I begin by asking whether the 2nd degree polynomial terms, that is, those involving $S^2$, $F^2$ and $SF$, need be included. To do so I compare the top line with the model containing only $\beta_0 + \beta_1 S + \beta_2 F$. The extra SS is 108.861-81.264 = 27.597 on 3 degrees of freedom, which gives a mean square of (108.861-81.264)/3 = 9.199. The MSE is 81.264/12 = 6.772. This gives an F-statistic of 9.199/6.772 = 1.358 on 3 numerator and 12 denominator degrees of freedom, and a P-value of 0.30, which is not significant.

We would then delete the quadratic terms and consider the coefficients of S and F. We have a choice of denominator: keep the MSE from the original Full model, or pretend that the last line in the table is now the "Full" model. Taking the second route, the F-statistics are (210.944-108.861)/(108.861/15) = 14.066 and (174.194-108.861)/(108.861/15) = 9.002. The first is for testing $\beta_1 = 0$ (no linear effect of S) and the second for $\beta_2 = 0$ (no linear effect of F). Each is on 1 and 15 degrees of freedom. The corresponding P-values are 0.002 and 0.009. These are both highly significant, and we conclude that both Sand content and Fibre content have an impact on hardness and that there is little reason to look for non-linear impacts of the two factors.
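The arithmetic in these extra-sum-of-squares tests follows one pattern, sketched below. The helper name `extra_ss_f` and the variable names are illustrative; the numbers are the error sums of squares and error df quoted in the text. P-values are left out because they need an F-distribution routine, which the Python standard library does not provide.

```python
def extra_ss_f(ess_reduced, df_reduced, ess_full, df_full):
    """F-statistic comparing a reduced model with a larger model that contains it."""
    extra_mean_square = (ess_reduced - ess_full) / (df_reduced - df_full)
    mse_full = ess_full / df_full
    return extra_mean_square / mse_full


# (error SS, error df) pairs quoted in the text.
full = (81.264, 12)
linear_only = (108.861, 15)      # last line of the table
reduced_first = (210.944, 16)    # reduced model in the first 1-df test
reduced_second = (174.194, 16)   # reduced model in the second 1-df test

f_quadratic = extra_ss_f(*linear_only, *full)
f_first = extra_ss_f(*reduced_first, *linear_only)
f_second = extra_ss_f(*reduced_second, *linear_only)

print(round(f_quadratic, 3), round(f_first, 3), round(f_second, 3))
```

The three values agree with the 1.358, 14.066 and 9.002 reported above. Passing `full` instead of `linear_only` as the larger model in the last two calls gives the other choice of denominator.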

An alternative starting point would be to check first whether the interaction term could be eliminated, that is, to test the hypothesis that $\beta_5 = 0$, where $\beta_5$ is the coefficient of $SF$. This hypothesis can be tested either using the F-statistic [(82.389-81.264)/1]/[81.264/12] = 0.166 or using the t-statistic, which is $\hat\beta_5/\hat\sigma_{\hat\beta_5}$ (the estimate divided by its estimated standard error) and which SAS calculates to be -0.41. Note that $(-0.41)^2 = 0.168$ agrees with 0.166 to within round-off error. Algebraically, $t^2 = F$. Note, too, that the t-test can be made one-sided while the F-test cannot.
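The relation between the two statistics can be checked numerically. This sketch uses only the numbers quoted in the text; the variable names are illustrative, and the sign of the square root is taken from the reported t-statistic since the F-statistic alone does not determine it.

```python
import math

# Error SS and error df for the Full model and the model without the interaction.
ess_full, df_full = 81.264, 12
ess_no_interaction, df_no_interaction = 82.389, 13

mse_full = ess_full / df_full
f_interaction = (
    (ess_no_interaction - ess_full) / (df_no_interaction - df_full)
) / mse_full

t_reported = -0.41                    # value quoted from the SAS output
t_from_f = -math.sqrt(f_interaction)  # negative root, matching the reported sign

print(round(f_interaction, 3), round(t_from_f, 2), round(t_reported ** 2, 3))
```

The square root of the F-statistic rounds to 0.41 in absolute value, so the small gap between 0.168 and 0.166 comes from SAS reporting t to two decimals.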

Here is SAS CODE and output.





Richard Lockhart
Mon Mar 3 11:15:17 PST 1997