Example 57.2: Regression Parameter Estimates
In this example, PROC REG computes regression
parameter estimates for the Fitness data.
(See Example 57.1 to create the Fitness data set.)
The parameter estimates are output to a
data set and used as scoring coefficients.
For the first part of this example, PROC SCORE is used to score
the Fitness data, which are the same data used in the
regression.
In the second part of this example, PROC SCORE
is used to score a new data set, Fitness2.
For PROC SCORE, the TYPE= specification is PARMS, and the
names of the score variables are found in the variable
_MODEL_, which gets its values from the model label.
The following code produces Output 57.2.1
through Output 57.2.3:
/* Fit the regression; the OxyHat label becomes the score variable name */
proc reg data=Fitness outest=RegOut;
   OxyHat: model Oxygen=Age Weight RunTime RunPulse RestPulse;
   title 'REGRESSION SCORING EXAMPLE';
run;

proc print data=RegOut;
   title2 'OUTEST= Data Set from PROC REG';
run;

/* Independent variables only: OxyHat holds predicted values */
proc score data=Fitness score=RegOut out=RScoreP type=parms;
   var Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreP;
   title2 'Predicted Scores for Regression';
run;

/* Dependent variable included: OxyHat holds negative residuals */
proc score data=Fitness score=RegOut out=RScoreR type=parms;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreR;
   title2 'Negative Residual Scores for Regression';
run;
Output 57.2.1 shows the PROC REG output.
The column labeled "Parameter Estimate"
lists the parameter estimates.
These estimates are output to the RegOut data set.
Output 57.2.1: Creating an OUTEST= Data Set with PROC REG
REGRESSION SCORING EXAMPLE

The REG Procedure
Model: OXYHAT
Dependent Variable: Oxygen

Analysis of Variance

Source            DF   Sum of Squares   Mean Square   F Value   Pr > F
Model              5        509.62201     101.92440     15.80   0.0021
Error              6         38.70060       6.45010
Corrected Total   11        548.32261

Root MSE          2.53970   R-Square   0.9294
Dependent Mean   48.38942   Adj R-Sq   0.8706
Coeff Var         5.24847

Parameter Estimates

Variable    DF   Parameter Estimate   Standard Error   t Value   Pr > |t|
Intercept    1            151.91550         31.04738      4.89     0.0027
Age          1             -0.63045          0.42503     -1.48     0.1885
Weight       1             -0.10586          0.11869     -0.89     0.4068
RunTime      1             -1.75698          0.93844     -1.87     0.1103
RunPulse     1             -0.22891          0.12169     -1.88     0.1090
RestPulse    1             -0.17910          0.13005     -1.38     0.2176

Output 57.2.2 lists the RegOut data set.
Notice that _TYPE_='PARMS' and that _MODEL_='OXYHAT',
which comes from the label in the MODEL statement in PROC REG.
The dependent variable, Oxygen, has a coefficient of -1.
Output 57.2.2: OUTEST= Data Set from PROC REG Reproduced with PROC PRINT
REGRESSION SCORING EXAMPLE
OUTEST= Data Set from PROC REG

Obs   _MODEL_   _TYPE_   _DEPVAR_    _RMSE_   Intercept
  1   OXYHAT    PARMS    Oxygen     2.53970     151.916

Obs        Age     Weight    RunTime   RunPulse   RestPulse   Oxygen
  1   -0.63045   -0.10586   -1.75698   -0.22891    -0.17910       -1

Output 57.2.3 lists the data sets created by PROC SCORE.
Since the SCORE= data set does not contain observations
with _TYPE_='MEAN' or _TYPE_='STD', the data
in the Fitness data set are not standardized before scoring.
The SCORE= data set contains the variable Intercept,
so this intercept value is used in computing the score.
To produce the RScoreP data set, the VAR statement in PROC SCORE includes
only the independent variables from the model in PROC REG.
As a result, the OxyHat variable contains predicted values.
To produce the RScoreR data set, the VAR statement in PROC
SCORE includes both the dependent variable and the
independent variables from the model in PROC REG.
Because the coefficient of the dependent variable Oxygen in the SCORE= data set is -1,
the actual value is subtracted from the predicted value, and
the OxyHat variable contains negative residuals (PREDICT-ACTUAL).
If the RESIDUAL option is also specified, the OxyHat variable contains
positive residuals (ACTUAL-PREDICT).
If the PREDICT option is specified instead,
the OxyHat variable contains predicted values.
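The RESIDUAL option described above can be tried with a step like the following. This is a minimal sketch that assumes the Fitness and RegOut data sets from the preceding code already exist; the output data set name RScoreRes and the title text are hypothetical and are not part of the original example.

```sas
/* Sketch: same VAR list as the RScoreR step, plus the RESIDUAL option. */
/* RScoreRes is a hypothetical data set name chosen for illustration.   */
proc score data=Fitness score=RegOut out=RScoreRes type=parms residual;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=RScoreRes;
   title2 'Positive Residual Scores for Regression';
run;
```

With RESIDUAL specified, each OxyHat value should have the opposite sign from the corresponding value in the RScoreR data set, because the score is now ACTUAL-PREDICT rather than PREDICT-ACTUAL.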
Output 57.2.3: Predicted and Residual Scores from the OUT= Data Set
Created by PROC SCORE and Reproduced Using PROC PRINT
REGRESSION SCORING EXAMPLE
Predicted Scores for Regression

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse     OXYHAT
  1    44    89.47   44.609     11.37          62        178    42.8771
  2    40    75.07   45.313     10.07          62        185    47.6050
  3    44    85.84   54.297      8.65          45        156    56.1211
  4    42    68.15   59.571      8.17          40        166    58.7044
  5    38    89.02   49.874      9.22          55        178    51.7386
  6    47    77.45   44.811     11.63          58        176    42.9756
  7    40    75.98   45.681     11.95          70        176    44.8329
  8    43    81.19   49.091     10.85          64        162    48.6020
  9    44    81.42   39.442     13.08          63        174    41.4613
 10    38    81.87   60.055      8.63          48        170    56.6171
 11    44    73.03   50.541     10.13          45        168    52.1299
 12    45    87.66   37.388     14.03          56        186    37.0080

REGRESSION SCORING EXAMPLE
Negative Residual Scores for Regression

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse     OXYHAT
  1    44    89.47   44.609     11.37          62        178   -1.73195
  2    40    75.07   45.313     10.07          62        185    2.29197
  3    44    85.84   54.297      8.65          45        156    1.82407
  4    42    68.15   59.571      8.17          40        166   -0.86657
  5    38    89.02   49.874      9.22          55        178    1.86460
  6    47    77.45   44.811     11.63          58        176   -1.83542
  7    40    75.98   45.681     11.95          70        176   -0.84811
  8    43    81.19   49.091     10.85          64        162   -0.48897
  9    44    81.42   39.442     13.08          63        174    2.01935
 10    38    81.87   60.055      8.63          48        170   -3.43787
 11    44    73.03   50.541     10.13          45        168    1.58892
 12    45    87.66   37.388     14.03          56        186   -0.38002

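The scores in Output 57.2.3 are linear combinations of the raw variables and the coefficients in RegOut, and this can be checked by hand in a DATA step. The following is a sketch only: it assumes the RegOut and RScoreP data sets created earlier, and the names CheckScore, ManualPred, ManualNegResid, and the b_ prefixed coefficient names are hypothetical, introduced here for illustration.

```sas
/* Sketch: recompute the scores directly from the RegOut coefficients. */
data CheckScore;
   /* Read the single PARMS observation once; SET variables are retained. */
   /* The coefficients are renamed so they do not overwrite the raw data. */
   if _n_ = 1 then
      set RegOut(keep=Intercept Age Weight RunTime RunPulse RestPulse
                 rename=(Age=b_Age Weight=b_Weight RunTime=b_RunTime
                         RunPulse=b_RunPulse RestPulse=b_RestPulse));
   set RScoreP;

   /* Intercept plus the sum of coefficient times raw value */
   ManualPred = Intercept + b_Age*Age + b_Weight*Weight + b_RunTime*RunTime
              + b_RunPulse*RunPulse + b_RestPulse*RestPulse;

   /* Adding Oxygen with its coefficient of -1 gives PREDICT-ACTUAL */
   ManualNegResid = ManualPred - Oxygen;

   keep Oxygen OxyHat ManualPred ManualNegResid;
run;

proc print data=CheckScore;
run;
```

ManualPred should agree with OxyHat from RScoreP up to rounding, and ManualNegResid should agree with the OxyHat values in RScoreR.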
The second part of this example uses the
parameter estimates to score a new data set.
The following code produces Output 57.2.4
and Output 57.2.5:
/* The FITNESS2 data set contains observations 13-16 from */
/* the FITNESS data set used in EXAMPLE 2 in the PROC REG */
/* chapter.                                               */
data Fitness2;
   input Age Weight Oxygen RunTime RestPulse RunPulse;
   datalines;
45 66.45 44.754 11.12 51 176
47 79.15 47.273 10.60 47 162
54 83.12 51.855 10.33 50 166
49 81.42 49.156  8.95 44 180
;

proc print data=Fitness2;
   title 'REGRESSION SCORING EXAMPLE';
   title2 'New Raw Data Set to be Scored';
run;

proc score data=Fitness2 score=RegOut out=NewPred type=parms
           nostd predict;
   var Oxygen Age Weight RunTime RunPulse RestPulse;
run;

proc print data=NewPred;
   title2 'Predicted Scores for Regression';
   title3 'for Additional Data from FITNESS2';
run;
Output 57.2.4 lists the Fitness2 data set.
Output 57.2.4: Listing of the Fitness2 Data Set
REGRESSION SCORING EXAMPLE
New Raw Data Set to be Scored

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse
  1    45    66.45   44.754     11.12          51        176
  2    47    79.15   47.273     10.60          47        162
  3    54    83.12   51.855     10.33          50        166
  4    49    81.42   49.156      8.95          44        180

PROC SCORE scores the Fitness2 data set
using the parameter estimates in the RegOut data set.
These parameter estimates result from
fitting a regression equation to the Fitness data set.
The NOSTD option is specified, so the raw
data are not standardized before scoring.
(However, the NOSTD option is not necessary here.
The SCORE= data set does not contain observations
with _TYPE_='MEAN' or _TYPE_='STD',
so standardization is not performed.)
The VAR statement contains the dependent variable
and the independent variables used in PROC REG.
In addition, the PREDICT option is specified, so PROC SCORE
treats the coefficient of -1 for Oxygen as 0.
This combination gives predicted values for the new score variable.
The name of the new score variable is OxyHat, from
the value of the _MODEL_ variable in the SCORE= data set.
Output 57.2.5 shows the data set produced by PROC SCORE.
Output 57.2.5: Predicted Scores from the OUT= Data Set Created
by PROC SCORE and Reproduced Using PROC PRINT
REGRESSION SCORING EXAMPLE
Predicted Scores for Regression
for Additional Data from FITNESS2

Obs   Age   Weight   Oxygen   RunTime   RestPulse   RunPulse     OXYHAT
  1    45    66.45   44.754     11.12          51        176    47.5507
  2    47    79.15   47.273     10.60          47        162    49.7802
  3    54    83.12   51.855     10.33          50        166    43.9682
  4    49    81.42   49.156      8.95          44        180    47.5949

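Because the OUT= data set keeps the observed Oxygen values alongside the predicted OxyHat values, prediction errors for the new observations can be derived in a short DATA step. This is a sketch that assumes the NewPred data set created above; the names NewResid and Resid are hypothetical.

```sas
/* Sketch: positive residuals (ACTUAL-PREDICT) for the newly scored data. */
data NewResid;
   set NewPred;
   Resid = Oxygen - OxyHat;
run;

proc print data=NewResid;
   var Oxygen OxyHat Resid;
   title2 'Residuals for Additional Data from FITNESS2';
run;
```

For the first observation in Output 57.2.5, for example, Resid is roughly 44.754 - 47.5507, or about -2.80.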
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.