Example 49.1: Stepwise Regression
Krall, Uthoff, and Harley (1975) analyzed data
from a study on multiple myeloma
in which researchers treated 65 patients with alkylating agents.
Of those patients, 48 died during the study and 17 survived.
In the data set Myeloma, the variable Time
represents the survival time in months from diagnosis. The
variable VStatus consists of two values, 0 and 1,
indicating whether the patient was alive or dead,
respectively, at the end of the study. If the value of
VStatus is 0, the corresponding value of Time is
censored.
The variables
thought to be related to survival are LogBUN
(log(BUN) at diagnosis), HGB (hemoglobin at diagnosis),
Platelet (platelets at diagnosis: 0=abnormal,
1=normal), Age (age at diagnosis in years),
LogWBC (log(WBC) at diagnosis), Frac (fractures
at diagnosis: 0=none, 1=present), LogPBM (log
percentage of plasma cells in bone marrow), Protein
(proteinuria at diagnosis), and SCalc (serum calcium
at diagnosis). Interest lies in identifying important
prognostic factors from these nine explanatory variables.
data Myeloma;
   input Time VStatus LogBUN HGB Platelet Age LogWBC Frac
         LogPBM Protein SCalc;
   label Time='Survival Time'
         VStatus='0=Alive 1=Dead';
   datalines;
1.25 1 2.2175 9.4 1 67 3.6628 1 1.9542 12 10
1.25 1 1.9395 12.0 1 38 3.9868 1 1.9542 20 18
2.00 1 1.5185 9.8 1 81 3.8751 1 2.0000 2 15
2.00 1 1.7482 11.3 0 75 3.8062 1 1.2553 0 12
2.00 1 1.3010 5.1 0 57 3.7243 1 2.0000 3 9
3.00 1 1.5441 6.7 1 46 4.4757 0 1.9345 12 10
5.00 1 2.2355 10.1 1 50 4.9542 1 1.6628 4 9
5.00 1 1.6812 6.5 1 74 3.7324 0 1.7324 5 9
6.00 1 1.3617 9.0 1 77 3.5441 0 1.4624 1 8
6.00 1 2.1139 10.2 0 70 3.5441 1 1.3617 1 8
6.00 1 1.1139 9.7 1 60 3.5185 1 1.3979 0 10
6.00 1 1.4150 10.4 1 67 3.9294 1 1.6902 0 8
7.00 1 1.9777 9.5 1 48 3.3617 1 1.5682 5 10
7.00 1 1.0414 5.1 0 61 3.7324 1 2.0000 1 10
7.00 1 1.1761 11.4 1 53 3.7243 1 1.5185 1 13
9.00 1 1.7243 8.2 1 55 3.7993 1 1.7404 0 12
11.00 1 1.1139 14.0 1 61 3.8808 1 1.2788 0 10
11.00 1 1.2304 12.0 1 43 3.7709 1 1.1761 1 9
11.00 1 1.3010 13.2 1 65 3.7993 1 1.8195 1 10
11.00 1 1.5682 7.5 1 70 3.8865 0 1.6721 0 12
11.00 1 1.0792 9.6 1 51 3.5051 1 1.9031 0 9
13.00 1 0.7782 5.5 0 60 3.5798 1 1.3979 2 10
14.00 1 1.3979 14.6 1 66 3.7243 1 1.2553 2 10
15.00 1 1.6021 10.6 1 70 3.6902 1 1.4314 0 11
16.00 1 1.3424 9.0 1 48 3.9345 1 2.0000 0 10
16.00 1 1.3222 8.8 1 62 3.6990 1 0.6990 17 10
17.00 1 1.2304 10.0 1 53 3.8808 1 1.4472 4 9
17.00 1 1.5911 11.2 1 68 3.4314 0 1.6128 1 10
18.00 1 1.4472 7.5 1 65 3.5682 0 0.9031 7 8
19.00 1 1.0792 14.4 1 51 3.9191 1 2.0000 6 15
19.00 1 1.2553 7.5 0 60 3.7924 1 1.9294 5 9
24.00 1 1.3010 14.6 1 56 4.0899 1 0.4771 0 9
25.00 1 1.0000 12.4 1 67 3.8195 1 1.6435 0 10
26.00 1 1.2304 11.2 1 49 3.6021 1 2.0000 27 11
32.00 1 1.3222 10.6 1 46 3.6990 1 1.6335 1 9
35.00 1 1.1139 7.0 0 48 3.6532 1 1.1761 4 10
37.00 1 1.6021 11.0 1 63 3.9542 0 1.2041 7 9
41.00 1 1.0000 10.2 1 69 3.4771 1 1.4771 6 10
41.00 1 1.1461 5.0 1 70 3.5185 1 1.3424 0 9
51.00 1 1.5682 7.7 0 74 3.4150 1 1.0414 4 13
52.00 1 1.0000 10.1 1 60 3.8573 1 1.6532 4 10
54.00 1 1.2553 9.0 1 49 3.7243 1 1.6990 2 10
58.00 1 1.2041 12.1 1 42 3.6990 1 1.5798 22 10
66.00 1 1.4472 6.6 1 59 3.7853 1 1.8195 0 9
67.00 1 1.3222 12.8 1 52 3.6435 1 1.0414 1 10
88.00 1 1.1761 10.6 1 47 3.5563 0 1.7559 21 9
89.00 1 1.3222 14.0 1 63 3.6532 1 1.6232 1 9
92.00 1 1.4314 11.0 1 58 4.0755 1 1.4150 4 11
4.00 0 1.9542 10.2 1 59 4.0453 0 0.7782 12 10
4.00 0 1.9243 10.0 1 49 3.9590 0 1.6232 0 13
7.00 0 1.1139 12.4 1 48 3.7993 1 1.8573 0 10
7.00 0 1.5315 10.2 1 81 3.5911 0 1.8808 0 11
8.00 0 1.0792 9.9 1 57 3.8325 1 1.6532 0 8
12.00 0 1.1461 11.6 1 46 3.6435 0 1.1461 0 7
11.00 0 1.6128 14.0 1 60 3.7324 1 1.8451 3 9
12.00 0 1.3979 8.8 1 66 3.8388 1 1.3617 0 9
13.00 0 1.6628 4.9 0 71 3.6435 0 1.7924 0 9
16.00 0 1.1461 13.0 1 55 3.8573 0 0.9031 0 9
19.00 0 1.3222 13.0 1 59 3.7709 1 2.0000 1 10
19.00 0 1.3222 10.8 1 69 3.8808 1 1.5185 0 10
28.00 0 1.2304 7.3 1 82 3.7482 1 1.6721 0 9
41.00 0 1.7559 12.8 1 72 3.7243 1 1.4472 1 9
53.00 0 1.1139 12.0 1 66 3.6128 1 2.0000 1 11
57.00 0 1.2553 12.5 1 66 3.9685 0 1.9542 0 11
77.00 0 1.0792 14.0 1 60 3.6812 0 0.9542 0 12
;
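As a quick check that the data were read correctly, the censoring indicator and the survival times can be tabulated. This is only a sketch and is not part of the original example; it assumes the Myeloma data set created above and uses the Base SAS procedures FREQ and MEANS. If the data were entered as listed, VStatus should show 17 values of 0 and 48 values of 1, matching the counts quoted earlier.

```sas
/* Tabulate the censoring indicator: 0=alive (censored), 1=dead (event) */
proc freq data=Myeloma;
   tables VStatus;
run;

/* Summarize survival time separately for censored and uncensored patients */
proc means data=Myeloma n min median max;
   class VStatus;
   var Time;
run;
```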
The stepwise selection process consists of a series
of alternating
step-up and step-down phases. The former adds
variables to the model, while the latter removes
variables from the model.
Stepwise regression analysis is requested by
specifying the SELECTION=STEPWISE
option in the MODEL statement.
The option SLENTRY=0.25
specifies that a variable has to be
significant at the 0.25 level before it can be entered into the model,
while the option SLSTAY=0.15
specifies that
a variable in the model has to be significant at the 0.15 level for it to
remain in the model.
The DETAILS option
requests detailed
results for the variable selection process.
proc phreg data=Myeloma;
   model Time*VStatus(0)=LogBUN HGB Platelet Age LogWBC
                         Frac LogPBM Protein SCalc
         / selection=stepwise slentry=0.25
           slstay=0.15 details;
run;
Results of the stepwise regression analysis are displayed in
Output 49.1.1 through Output 49.1.7.
Output 49.1.1: Individual Score Test Results for all Variables
Model Information

Data Set              WORK.MYELOMA
Dependent Variable    Time            Survival Time
Censoring Variable    VStatus         0=Alive 1=Dead
Censoring Value(s)    0
Ties Handling         BRESLOW

Summary of the Number of Event and Censored Values

Total    Event    Censored    Percent Censored
   65       48          17               26.15

Analysis of Variables Not in the Model

Variable    Score Chi-Square    Pr > ChiSq
LogBUN                8.5164        0.0035
HGB                   5.0664        0.0244
Platelet              3.1816        0.0745
Age                   0.0183        0.8924
LogWBC                0.5658        0.4519
Frac                  0.9151        0.3388
LogPBM                0.5846        0.4445
Protein               0.1466        0.7018
SCalc                 1.1109        0.2919

Residual Chi-Square Test

Chi-Square    DF    Pr > ChiSq
   18.4550     9        0.0302
Output 49.1.2: First Model in the Stepwise Selection Process
Step 1. Variable LogBUN is entered. The model contains the following explanatory variables:

LogBUN

Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Without Covariates    With Covariates
-2 LOG L                309.716            301.959
AIC                     309.716            303.959
SBC                     309.716            305.830

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio        7.7572     1        0.0053
Score                   8.5164     1        0.0035
Wald                    8.3392     1        0.0039

Analysis of Maximum Likelihood Estimates

Variable    DF    Parameter Estimate    Standard Error    Chi-Square    Pr > ChiSq    Hazard Ratio
LogBUN       1               1.74595           0.60460        8.3392        0.0039           5.731
Individual score tests
are used to determine which of the nine
explanatory variables is first selected into the model. In
this case, the score test for each variable is
the global score test
for the model containing that variable as
the only explanatory variable.
The chi-squared statistic is
compared to a chi-squared distribution with one degree of freedom.
Output 49.1.1 displays the chi-squared statistics and
the corresponding p-values.
The variable LogBUN has the largest
chi-squared value (8.5164), and it is significant (p=0.0035)
at the SLENTRY=0.25 level.
The variable LogBUN is
thus entered into the model. Output 49.1.2 displays the
model results.
Since the Wald chi-squared statistic is significant (p=0.0039)
at the SLSTAY=0.15 level, LogBUN stays in
the model.
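The quantities reported for LogBUN in Step 1 can be cross-checked with a short DATA step. This is an illustrative sketch rather than part of the original example: the input numbers are copied from Output 49.1.1 and Output 49.1.2, the variable names are arbitrary, and it assumes the usual relationships for a single parameter, namely that the Wald chi-square is the squared ratio of the estimate to its standard error and that the hazard ratio is the exponentiated estimate. The results written to the log should agree with the displayed values up to rounding of the inputs.

```sas
data _null_;
   /* Values copied from Output 49.1.1 and Output 49.1.2 */
   score    = 8.5164;    /* score chi-square for LogBUN    */
   estimate = 1.74595;   /* parameter estimate for LogBUN  */
   se       = 0.60460;   /* standard error of the estimate */

   pScore = 1 - probchi(score, 1);   /* upper-tail p-value, 1 DF           */
   wald   = (estimate / se)**2;      /* Wald chi-square for one parameter  */
   pWald  = 1 - probchi(wald, 1);    /* compared with SLSTAY=0.15          */
   hr     = exp(estimate);           /* hazard ratio per unit of LogBUN    */

   put pScore= 6.4 wald= 8.4 pWald= 6.4 hr= 6.3;
run;
```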
Output 49.1.3: Score Tests Adjusted for the Variable LogBUN
Analysis of Variables Not in the Model

Variable    Score Chi-Square    Pr > ChiSq
HGB                   4.3468        0.0371
Platelet              2.0183        0.1554
Age                   0.7159        0.3975
LogWBC                0.0704        0.7908
Frac                  1.0354        0.3089
LogPBM                1.0334        0.3094
Protein               0.5214        0.4703
SCalc                 1.4150        0.2342

Residual Chi-Square Test

Chi-Square    DF    Pr > ChiSq
    9.3164     8        0.3163
Output 49.1.4: Second Model in the Stepwise Selection Process
Step 2. Variable HGB is entered. The model contains the following explanatory variables:

LogBUN HGB

Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Without Covariates    With Covariates
-2 LOG L                309.716            297.767
AIC                     309.716            301.767
SBC                     309.716            305.509

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       11.9493     2        0.0025
Score                  12.7252     2        0.0017
Wald                   12.1900     2        0.0023

Analysis of Maximum Likelihood Estimates

Variable    DF    Parameter Estimate    Standard Error    Chi-Square    Pr > ChiSq    Hazard Ratio
LogBUN       1               1.67440           0.61209        7.4833        0.0062           5.336
HGB          1              -0.11899           0.05751        4.2811        0.0385           0.888
The next step consists of selecting another variable
to add to the model. Output 49.1.3 displays the chi-squared
statistics and p-values of individual score tests
(adjusted
for LogBUN) for
the remaining eight variables.
The score chi-squared statistic for
a given variable is the value of the likelihood score test
for the significance of that variable in the presence
of LogBUN. The variable HGB is selected because it has the largest
chi-squared value (4.3468), and it is significant (p=0.0371)
at the SLENTRY=0.25 level. Output 49.1.4 displays the
fitted model containing both LogBUN and HGB.
Because the Wald chi-squared statistics for both LogBUN (p=0.0062) and
HGB (p=0.0385) are significant at the SLSTAY=0.15 level,
neither variable is removed from the model.
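The fit statistics for the two-variable model can be reproduced by hand. The sketch below assumes AIC = -2 LOG L + 2k and SBC = -2 LOG L + k log(d), where k is the number of covariates in the model and d = 48 is the number of events. These formulas are not stated in this example; they are an assumption that happens to reproduce the values displayed in Output 49.1.4. The variable names are illustrative.

```sas
data _null_;
   /* Values copied from Output 49.1.4 */
   m2logL0 = 309.716;   /* -2 LOG L without covariates       */
   m2logL  = 297.767;   /* -2 LOG L with LogBUN and HGB      */
   k       = 2;         /* number of covariates in the model */
   nEvent  = 48;        /* number of uncensored observations */

   LR  = m2logL0 - m2logL;            /* likelihood ratio chi-square, k DF */
   AIC = m2logL + 2*k;                /* assumed AIC penalty of 2 per term */
   SBC = m2logL + k*log(nEvent);      /* assumed SBC penalty of log(d)     */
   pLR = 1 - probchi(LR, k);          /* p-value of the global LR test     */

   put LR= 8.3 AIC= 8.3 SBC= 8.3 pLR= 6.4;
run;
```

The same arithmetic with k=1 or k=3 and the corresponding -2 LOG L values can be used to check Output 49.1.2 and Output 49.1.5.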
Output 49.1.5: Third Model in the Stepwise Regression
Step 3. Variable SCalc is entered. The model contains the following explanatory variables:

LogBUN HGB SCalc

Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Without Covariates    With Covariates
-2 LOG L                309.716            296.078
AIC                     309.716            302.078
SBC                     309.716            307.692

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       13.6377     3        0.0034
Score                  15.3053     3        0.0016
Wald                   14.4542     3        0.0023

Analysis of Maximum Likelihood Estimates

Variable    DF    Parameter Estimate    Standard Error    Chi-Square    Pr > ChiSq    Hazard Ratio
LogBUN       1               1.63593           0.62359        6.8822        0.0087           5.134
HGB          1              -0.12643           0.05868        4.6419        0.0312           0.881
SCalc        1               0.13286           0.09868        1.8127        0.1782           1.142
Output 49.1.5 shows Step 3 of the selection process, in which
the variable SCalc is added, resulting in the model with
LogBUN, HGB, and SCalc as the explanatory variables. SCalc
enters because its score chi-squared statistic, adjusted for
LogBUN and HGB, is significant (p=0.1770) at the SLENTRY=0.25
level (see Output 49.1.7). However, SCalc has the smallest Wald
chi-squared statistic in the fitted model,
and it is not significant (p=0.1782) at the SLSTAY=0.15
level. The variable SCalc is therefore removed from the model in
a step-down phase in Step 4 (Output 49.1.6).
Because SCalc would again be the next variable to enter,
its removal brings the stepwise selection process to a stop,
which avoids repeatedly entering and removing the same variable.
The procedure also displays a summary table of the
steps in the stepwise selection process, as shown
in Output 49.1.7.
The stepwise selection process results in a model with two
explanatory variables, LogBUN and HGB.
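Once the selection has settled on LogBUN and HGB, the chosen model can be refit directly without the selection options. The following is a minimal sketch, not part of the original example; it assumes the Myeloma data set created earlier and adds the RISKLIMITS option of the MODEL statement to request confidence limits for the hazard ratios.

```sas
/* Refit the model chosen by stepwise selection */
proc phreg data=Myeloma;
   model Time*VStatus(0)=LogBUN HGB
         / risklimits;   /* confidence limits for the hazard ratios */
run;
```

The parameter estimates from this fit should match those shown for the final step of the selection, since the same two covariates are in the model.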
Output 49.1.6: Final Model in the Stepwise Regression
Step 4. Variable SCalc is removed. The model contains the following explanatory variables:

LogBUN HGB

Convergence Status

Convergence criterion (GCONV=1E-8) satisfied.

Model Fit Statistics

Criterion    Without Covariates    With Covariates
-2 LOG L                309.716            297.767
AIC                     309.716            301.767
SBC                     309.716            305.509

Testing Global Null Hypothesis: BETA=0

Test                Chi-Square    DF    Pr > ChiSq
Likelihood Ratio       11.9493     2        0.0025
Score                  12.7252     2        0.0017
Wald                   12.1900     2        0.0023

Analysis of Maximum Likelihood Estimates

Variable    DF    Parameter Estimate    Standard Error    Chi-Square    Pr > ChiSq    Hazard Ratio
LogBUN       1               1.67440           0.61209        7.4833        0.0062           5.336
HGB          1              -0.11899           0.05751        4.2811        0.0385           0.888

NOTE: Model building terminates because the variable to be entered is the variable that was removed in the last step.
Output 49.1.7: Model Selection Summary
Summary of Stepwise Selection

Step    Variable Entered    Variable Removed    Number In    Score Chi-Square    Wald Chi-Square    Pr > ChiSq
   1    LogBUN                                          1              8.5164                  .        0.0035
   2    HGB                                             2              4.3468                  .        0.0371
   3    SCalc                                           3              1.8225                  .        0.1770
   4                        SCalc                       2                   .             1.8127        0.1782
Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.