SAS example: Multiple Regression
The data consist of casting hardnesses for 18 samples prepared under 3 levels of sand added and 3 levels of carbon fibre added. See Q 15 in Chapter 11. I use proc glm to regress hardness on sand content and fibre content, now treating both as continuous variables rather than as factors.
I ran the following SAS code:
options pagesize=60 linesize=80;
data plaster;
  infile 'plaster.dat';
  input sand fibre hardness strength;
proc glm data=plaster;
  model hardness = sand fibre;
  output out=plasfit p=yhat r=resid;
proc univariate data=plasfit plot normal;
  var resid;
proc plot;
  plot resid*sand;
  plot resid*fibre;
  plot resid*yhat;
run;
The line labelled model says that I am interested in the effects of sand and fibre; because there is no class statement, glm treats both as continuous variables and does multiple regression.
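For comparison, here is a minimal sketch of how the same data would be analysed as a two way layout, with sand and fibre treated as categorical factors. It is not part of the analysis on this page; it only illustrates what adding a class statement changes, and it assumes the plaster data set has already been read in as above.

```sas
* Sketch only: the two way layout version, for contrast with the regression ;
proc glm data=plaster;
  class sand fibre;              /* treat both variables as categorical factors */
  model hardness = sand fibre;   /* additive two way ANOVA instead of regression */
run;
```

With the class statement each of the 3 levels of a factor gets its own effect, so no linear trend in sand or fibre is imposed.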
The abridged output from proc glm is:
General Linear Models Procedure

Number of observations in data set = 18

Dependent Variable: HARDNESS
                                Sum of          Mean
Source            DF           Squares        Square   F Value   Pr > F
Model              2      167.41666667   83.70833333     11.53   0.0009
Error             15      108.86111111    7.25740741
Corrected Total   17      276.27777778

R-Square        C.V.     Root MSE   HARDNESS Mean
0.605972    3.870011    2.6939576       69.611111

Source            DF         Type I SS   Mean Square   F Value   Pr > F
SAND               1      102.08333333  102.08333333     14.07   0.0019
FIBRE              1       65.33333333   65.33333333      9.00   0.0090

                                 T for H0:                Std Error of
Parameter          Estimate    Parameter=0   Pr > |T|       Estimate
INTERCEPT       64.36111111          50.68     0.0001     1.26994378
SAND             0.19444444           3.75     0.0019     0.05184524
FIBRE            0.09333333           3.00     0.0090     0.03110714
The conclusions are that both sand (P = 0.0019) and fibre (P = 0.0090) have an effect on hardness. The last table gives the estimates and standard errors needed to form confidence intervals for the slopes.
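One way to get such intervals is a short data step that computes estimate plus or minus t times standard error, using the 15 error degrees of freedom. This is only a sketch and was not part of the original run: the data set name slopeci and its variable names are my own choices, and the estimates and standard errors are typed in from the parameter table.

```sas
* Sketch only: 95% confidence intervals for the two slopes ;
data slopeci;
  input parm $ estimate se;
  tcrit = tinv(0.975, 15);         /* t quantile with 15 error df */
  lower = estimate - tcrit*se;     /* lower 95% limit */
  upper = estimate + tcrit*se;     /* upper 95% limit */
  datalines;
SAND  0.19444444 0.05184524
FIBRE 0.09333333 0.03110714
;
proc print data=slopeci;
run;
```

Since the t quantile is about 2.131, the intervals come out at roughly 0.08 to 0.30 for the SAND slope and 0.03 to 0.16 for the FIBRE slope.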
Diagnostic statistics and plots:
Univariate Procedure
Variable=RESID
                 Moments
N                18   Sum Wgts         18
Mean              0   Sum               0
Std Dev    2.530533   Variance   6.403595
Skewness    -0.1431   Kurtosis   -0.29863
USS        108.8611   CSS        108.8611
CV                .   Std Mean   0.596452
T:Mean=0          0   Pr>|T|       1.0000
Num ^= 0         18   Num > 0           7
M(Sign)          -2   Pr>=|M|      0.4807
Sgn Rank        0.5   Pr>=|S|      0.9915
W:Normal   0.976631   Pr<W         0.8888

            Quantiles(Def=5)
100% Max   4.388889   99%        4.388889
 75% Q3    2.055556   95%        4.388889
 50% Med   -0.40278   90%        3.805556
 25% Q1    -1.36111   10%        -3.36111
  0% Min   -5.19444    5%        -5.19444
                       1%        -5.19444
Range      9.583333
Q3-Q1      3.416667
Mode       -0.86111

                Extremes
  Lowest    Obs      Highest    Obs
-5.19444(     5)    2.055556(    16)
-3.36111(     1)    2.305556(     7)
-2.94444(    15)    2.305556(     8)
-2.02778(    13)    3.805556(     6)
-1.36111(     2)    4.388889(    10)
Stem Leaf          #   Boxplot
   4 4             1      |
   2 1338          4   +-----+
   0 57            2   |  +  |
  -0 4996530       7   *-----*
  -2 490           3      |
  -4 2             1      |
     ----+----+----+----+
Normal Probability Plot
[Plot of the ordered residuals (vertical axis, -5 to 5) against normal quantiles (horizontal axis, -2 to +2).]
[Plot of RESID*SAND: residuals (vertical axis, -6 to 6) against SAND (horizontal axis, 0 to 30).]
[Plot of RESID*FIBRE: residuals (vertical axis, -6 to 6) against FIBRE (horizontal axis, 0 to 50).]
[Plot of RESID*YHAT: residuals (vertical axis, -6 to 6) against fitted values YHAT (horizontal axis, 64 to 76).]
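The diagnostic plots seem fine to me, and the Shapiro-Wilk test (W = 0.977, P = 0.89) gives no evidence against normality of the residuals.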
The model can be run with an interaction term:
options pagesize=60 linesize=80;
data plaster;
  infile 'plaster.dat';
  input sand fibre hardness strength;
proc glm data=plaster;
  model hardness = sand|fibre;
run;
which produces
General Linear Models Procedure

Dependent Variable: HARDNESS
                                Sum of          Mean
Source            DF           Squares        Square   F Value   Pr > F
Model              3      168.54166667   56.18055556      7.30   0.0035
Error             14      107.73611111    7.69543651
Corrected Total   17      276.27777778

R-Square        C.V.     Root MSE   HARDNESS Mean
0.610044    3.985089    2.7740650       69.611111

Source            DF         Type I SS   Mean Square   F Value   Pr > F
SAND               1      102.08333333  102.08333333     13.27   0.0027
FIBRE              1       65.33333333   65.33333333      8.49   0.0113
SAND*FIBRE         1        1.12500000    1.12500000      0.15   0.7079

                                 T for H0:                Std Error of
Parameter          Estimate    Parameter=0   Pr > |T|       Estimate
INTERCEPT       63.98611111          39.14     0.0001     1.63463347
SAND             0.21944444           2.60     0.0210     0.08441211
FIBRE            0.10833333           2.14     0.0505     0.05064727
SAND*FIBRE      -0.00100000          -0.38     0.7079     0.00261541
There is no evidence that an interaction term is needed (P = 0.71), so the
original model seems to be reasonable. Notice that the resulting model with
only 3 parameters is more parsimonious than the model for the two-way layout,
which has 5 parameters (or 9 with an interaction term). The model asserts that
hardness actually increases linearly with sand content and also with fibre
content.
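To see what that linear model predicts, the fitted equation from the first run can be evaluated over the design. The following is a sketch under assumptions: the data set name predict is my own, the coefficients are typed in from the parameter table of the model without interaction, and the levels 0, 15, 30 for sand and 0, 25, 50 for fibre are assumed from the axes of the residual plots rather than read from plaster.dat.

```sas
* Sketch only: fitted hardness at each combination of assumed levels ;
data predict;
  do sand = 0, 15, 30;        /* assumed sand levels */
    do fibre = 0, 25, 50;     /* assumed fibre levels */
      yhat = 64.36111111 + 0.19444444*sand + 0.09333333*fibre;
      output;                 /* one row per sand and fibre combination */
    end;
  end;
run;
proc print data=predict;
run;
```

Under these assumptions the fitted values run from 64.36 at sand = 0, fibre = 0 to 74.86 at sand = 30, fibre = 50, and the middle combination (sand = 15, fibre = 25) gives 69.61, the overall mean hardness.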