SAS example: Simple Linear Regression
The data consist of 14 pairs of measurements on the independent variable
Burner Area Liberation Rate (in million BTU per hr-ft^2) and the dependent
variable Nitrogen Oxides (NOx) Emission Rate (in parts per million).
See Q 9 in Chapter 12. I use proc glm to fit a simple linear
regression model to assess the effect of Burner Area on NOx
emissions.
I ran the following SAS code:
options pagesize=60 linesize=80;
data nox;
  infile 'ch12q9.dat';
  input area emission;
proc glm data=nox;
  model emission = area;
  output out=noxfit p=yhat r=resid;
proc univariate data=noxfit plot normal;
  var resid;
proc plot;
  plot resid*area;
  plot resid*yhat;
run;
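The line labelled model says that I am interested in the effect of area (my shorthand name for "Burner Area Liberation Rate") on emission, the NOx emission rate. The output statement saves the fitted values (yhat) and residuals (resid) in the data set noxfit for the diagnostics that follow.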
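In symbols, the model statement corresponds to the usual simple linear regression model. The symbols below (the coefficients, the error term and its variance) are introduced here for illustration and do not appear in the SAS code or output:

```latex
\[
\text{emission}_i = \beta_0 + \beta_1\,\text{area}_i + \epsilon_i,
\qquad i = 1, \dots, 14,
\]
% Usual working assumption: the errors \epsilon_i are independent N(0, \sigma^2).
```

Here the intercept and slope correspond to the INTERCEPT and AREA rows of the parameter table that proc glm prints.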
The output from proc glm is
The SAS System 1
10:00 Monday, November 20, 1995
General Linear Models Procedure
Number of observations in data set = 14
Dependent Variable: EMISSION
                                  Sum of            Mean
Source            DF             Squares          Square   F Value   Pr > F
Model              1        398030.26093    398030.26093    294.74   0.0001
Error             12         16205.45335      1350.45445
Corrected Total   13        414235.71429

R-Square          C.V.       Root MSE    EMISSION Mean
0.960879      10.26905      36.748530        357.85714

Source            DF           Type I SS     Mean Square   F Value   Pr > F
AREA               1        398030.26093    398030.26093    294.74   0.0001

Source            DF         Type III SS     Mean Square   F Value   Pr > F
AREA               1        398030.26093    398030.26093    294.74   0.0001

                                 T for H0:    Pr > |T|    Std Error of
Parameter           Estimate   Parameter=0                  Estimate
INTERCEPT       -45.55190539         -1.79      0.0989     25.46779420
AREA              1.71143233         17.17      0.0001      0.09968772
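The entries in the proc glm output above are tied together by simple arithmetic, which can serve as a sanity check when reading such a table. The worked lines below use only the printed values; the MS and t notation is standard shorthand and is not part of the output itself.

```latex
\begin{align*}
\text{MS}_{\text{Error}} &= \frac{16205.45335}{12} \approx 1350.45, \\
F &= \frac{\text{MS}_{\text{Model}}}{\text{MS}_{\text{Error}}}
   = \frac{398030.26093}{1350.45445} \approx 294.74, \\
R^2 &= \frac{398030.26093}{414235.71429} \approx 0.9609, \\
\text{Root MSE} &= \sqrt{1350.45445} \approx 36.75, \\
\text{C.V.} &= 100 \times \frac{36.748530}{357.85714} \approx 10.27, \\
t_{\text{AREA}} &= \frac{1.71143233}{0.09968772} \approx 17.17,
\qquad t_{\text{AREA}}^2 \approx F.
\end{align*}
```

With a single predictor the squared t statistic for AREA equals the model F statistic, which is why both tests report the same P-value.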
Univariate Procedure
Variable=RESID
Moments

N                 14    Sum Wgts          14
Mean               0    Sum                0
Std Dev     35.30685    Variance    1246.573
Skewness    -0.57524    Kurtosis     0.14238
USS         16205.45    CSS         16205.45
CV                 .    Std Mean    9.436151
T:Mean=0           0    Pr>|T|        1.0000
Num ^= 0          14    Num > 0            7
M(Sign)            0    Pr>=|M|       1.0000
Sgn Rank         2.5    Pr>=|S|       0.9032
W:Normal    0.939768    Pr<W          0.3981

Quantiles(Def=5)

100% Max    47.69382    99%         47.69382
 75% Q3     24.40867    95%         47.69382
 50% Med    5.229961    90%         46.55059
 25% Q1     -27.8778    10%          -29.021
  0% Min    -77.8778     5%         -77.8778
                         1%         -77.8778
Range       125.5716
Q3-Q1       52.28647
Mode        -77.8778

Extremes

  Lowest    Obs     Highest    Obs
-77.8778(    11)   23.26544(     6)
 -29.021(    13)   24.40867(     1)
-28.3771(     2)   30.97898(    14)
-27.8778(    10)   46.55059(    12)
-21.1629(     5)   47.69382(     9)
Stem Leaf # Boxplot
4 78 2 |
2 341 3 +-----+
0 28 2 *--+--*
-0 71 2 | |
-2 9881 4 +-----+
-4 |
-6 8 1 |
----+----+----+----+
Multiply Stem.Leaf by 10**+1
Normal Probability Plot
50+ *++++*
| *+*+*++
| +*+*++
-10+ ++*+*
| *++*+*+*
| +++++
-70+ +++++*
+----+----+----+----+----+----+----+----+----+----+
-2 -1 0 +1 +2
The conclusions are that AREA has a highly significant and strong effect on emissions
(F = 294.74 on 1 and 12 degrees of freedom, P = 0.0001, R-Square = 0.96),
that the intercept of the linear regression might be 0 (P = 0.0989) and that the
estimated slope is 1.71 ± 0.1 (estimate ± standard error). The diagnostic plots
suggest no particularly obvious problems.
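To turn the slope estimate and its standard error into a confidence interval, one option is to ask proc glm for confidence limits directly. The sketch below assumes the data set nox created earlier and a 95% level; the data step name slopeci and its variable names are hypothetical and chosen only for illustration.

```sas
/* Sketch: request 95% confidence limits for the intercept and slope. */
proc glm data=nox;
  model emission = area / clparm alpha=0.05;
run;

/* The same interval for the slope by hand, using the printed     */
/* estimate and standard error with 12 error degrees of freedom.  */
data slopeci;
  est   = 1.71143233;          /* AREA estimate from the output   */
  se    = 0.09968772;          /* its standard error              */
  tcrit = tinv(0.975, 12);     /* upper 2.5% point of t, 12 df    */
  lower = est - tcrit*se;
  upper = est + tcrit*se;
  put tcrit= lower= upper=;    /* written to the SAS log          */
run;
```

With 12 error degrees of freedom the critical value is about 2.18, so the interval runs from roughly 1.49 to 1.93, about twice as wide on each side as the ± 0.1 standard error.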
Plot of RESID*AREA. Legend: A = 1 obs, B = 2 obs, etc.
[Scatterplot: RESID on the vertical axis (-80 to 60) against AREA on the horizontal axis (100 to 400).]
Plot of RESID*YHAT. Legend: A = 1 obs, B = 2 obs, etc.
[Scatterplot: RESID on the vertical axis (-80 to 60) against YHAT on the horizontal axis (100 to 700).]