The REG Procedure

Predicted and Residual Values

The display of the predicted values and residuals is controlled by the P, R, CLM, and CLI options in the MODEL statement. The P option causes PROC REG to display the observation number, the ID value (if an ID statement is used), the actual value, the predicted value, and the residual. The R, CLI, and CLM options also produce the items under the P option. Thus, P is unnecessary if you use one of the other options.

The R option requests more detail, especially about the residuals. The standard errors of the mean predicted value and the residual are displayed. The studentized residual, which is the residual divided by its standard error, is both displayed and plotted. A measure of influence, Cook's D, is displayed. Cook's D measures the change to the estimates that results from deleting each observation (Cook 1977, 1979). This statistic is very similar to DFFITS.
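As a sketch of the two quantities just described, the usual textbook definitions can be written as follows. The notation is an assumption introduced here for illustration and is not taken from this section: r_i is the residual, h_i the leverage of observation i, s^2 the mean squared error, and p the number of parameters in the model.

```latex
\begin{align*}
\mathrm{STUDENT}_i &= \frac{r_i}{\mathrm{STDERR}(r_i)},
  \qquad \mathrm{STDERR}(r_i) = \sqrt{s^2\,(1 - h_i)} \\
\mathrm{COOKD}_i &= \frac{1}{p}\,\mathrm{STUDENT}_i^{2}\,
  \frac{\mathrm{STDERR}(\hat{y}_i)^{2}}{\mathrm{STDERR}(r_i)^{2}},
  \qquad \mathrm{STDERR}(\hat{y}_i) = \sqrt{s^2\,h_i}
\end{align*}
```

These forms can be checked against the 1970 observation in this section's example output, where p = 3. The studentized residual is 3.9895 / 2.178, or about 1.83. Using the displayed value 1.831, Cook's D is (1/3)(1.831)^2 (1.7289 / 2.178)^2, or about 0.704, which agrees with the displayed value.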

The CLM option requests that PROC REG display the 100(1-\alpha)% lower and upper confidence limits for the mean predicted values. These limits account only for the variation due to estimating the parameters. If you want 100(1-\alpha)% limits for an individual predicted value (a prediction interval), use the CLI option, which adds in the variability of the error term. You can specify the \alpha level with the ALPHA= option in the PROC REG or MODEL statement.
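The difference between the two sets of limits can be sketched with the standard formulas below. The symbols are assumptions introduced here for illustration: s^2 is the mean squared error, n - p is the error degrees of freedom, and t is the corresponding quantile of the t distribution.

```latex
\begin{align*}
\text{CLM:}\quad & \hat{y}_i \pm t_{1-\alpha/2,\;n-p}\;\mathrm{STDERR}(\hat{y}_i) \\
\text{CLI:}\quad & \hat{y}_i \pm t_{1-\alpha/2,\;n-p}\;
  \sqrt{s^{2} + \mathrm{STDERR}(\hat{y}_i)^{2}}
\end{align*}
```

For the 1790 observation in this section's example output, the error degrees of freedom are 16, so t is about 2.120, and s^2 = 7.73410. The CLM limits are 5.0384 ± 2.120(1.7289), or roughly (1.37, 8.70). The CLI limits are 5.0384 ± 2.120 sqrt(7.73410 + 1.7289^2), or roughly (-1.90, 11.98). Both agree with the displayed limits to the precision shown.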

You can use these statistics in PLOT and PAINT statements. This is useful in performing a variety of regression diagnostics. For definitions of the statistics produced by these options, see Chapter 3, "Introduction to Regression Procedures."
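As a minimal sketch of using one of these statistics in a PLOT statement, the following step plots studentized residuals against predicted values. It is an illustration rather than part of the original example, and it assumes that the USPop2 data set built in this section's example already exists. STUDENT. and PREDICTED. are PLOT statement keywords for the corresponding statistics.

```sas
/* Sketch only: assumes the USPop2 data set from this section's example */
proc reg data=USPop2;
   id Year;
   model Population=Year YearSq / r;
   /* studentized residuals (vertical axis) against predicted values */
   plot student.*predicted.;
run;
quit;
```

Plotting against the predicted value is one common choice; any other variable in the data set or keyword statistic could be placed on the horizontal axis in the same way.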

The following example uses the US population data found in the section "Polynomial Regression".

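   /* Years to forecast; Population is missing for these rows */
   data USPop2;
      input Year @@;
      YearSq=Year*Year;
      datalines;
   1980 1990 2000
   ;

   /* Append the forecast years to the original USPopulation data */
   data USPop2;
      set USPopulation USPop2;
   run;

   /* R, CLI, and CLM each also display the items produced by P */
   proc reg data=USPop2;
      id Year;
      model Population=Year YearSq / r cli clm;
   run;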

 
The REG Procedure
Model: MODEL1
Dependent Variable: Population

Analysis of Variance

                            Sum of         Mean
Source             DF      Squares       Square    F Value    Pr > F
Model               2        71799        35900    4641.72    <.0001
Error              16    123.74557      7.73410
Corrected Total    18        71923

Root MSE           2.78102    R-Square    0.9983
Dependent Mean    69.76747    Adj R-Sq    0.9981
Coeff Var          3.98613

Parameter Estimates

                   Parameter      Standard
Variable     DF    Estimate         Error    t Value    Pr > |t|
Intercept     1        20450     843.47533      24.25      <.0001
Year          1    -22.78061       0.89785     -25.37      <.0001
YearSq        1      0.00635    0.00023877      26.58      <.0001

Figure 55.30: Regression Using the R, CLI, and CLM Options

 
The REG Procedure
Model: MODEL1
Dependent Variable: Population

Output Statistics

              Dep Var  Predicted     Std Error                                                        Std Error   Student
Obs  Year  Population      Value  Mean Predict      95% CL Mean          95% CL Predict     Residual   Residual  Residual    -2-1 0 1 2     Cook's D
  1  1790      3.9290     5.0384        1.7289     1.3734     8.7035    -1.9034    11.9803   -1.1094      2.178    -0.509  |     *|      |     0.054
  2  1800      5.3080     5.0389        1.3909     2.0904     7.9874    -1.5528    11.6306    0.2691      2.408     0.112  |      |      |     0.001
  3  1810      7.2390     6.3085        1.1304     3.9122     8.7047    -0.0554    12.6724    0.9305      2.541     0.366  |      |      |     0.009
  4  1820      9.6380     8.8472        0.9571     6.8182    10.8761     2.6123    15.0820    0.7908      2.611     0.303  |      |      |     0.004
  5  1830     12.8660    12.6550        0.8721    10.8062    14.5037     6.4764    18.8335    0.2110      2.641    0.0799  |      |      |     0.000
  6  1840     17.0690    17.7319        0.8578    15.9133    19.5504    11.5623    23.9015   -0.6629      2.645    -0.251  |      |      |     0.002
  7  1850     23.1910    24.0779        0.8835    22.2049    25.9509    17.8920    30.2638   -0.8869      2.637    -0.336  |      |      |     0.004
  8  1860     31.4430    31.6931        0.9202    29.7424    33.6437    25.4832    37.9029   -0.2501      2.624   -0.0953  |      |      |     0.000
  9  1870     39.8180    40.5773        0.9487    38.5661    42.5885    34.3482    46.8065   -0.7593      2.614    -0.290  |      |      |     0.004
 10  1880     50.1550    50.7307        0.9592    48.6972    52.7642    44.4944    56.9671   -0.5757      2.610    -0.221  |      |      |     0.002
 11  1890     62.9470    62.1532        0.9487    60.1420    64.1644    55.9241    68.3823    0.7938      2.614     0.304  |      |      |     0.004
 12  1900     75.9940    74.8448        0.9202    72.8942    76.7955    68.6350    81.0547    1.1492      2.624     0.438  |      |      |     0.008
 13  1910     91.9720    88.8056        0.8835    86.9326    90.6785    82.6197    94.9915    3.1664      2.637     1.201  |      |**    |     0.054
 14  1920    105.7100   104.0354        0.8578   102.2169   105.8540    97.8658   110.2051    1.6746      2.645     0.633  |      |*     |     0.014
 15  1930    122.7750   120.5344        0.8721   118.6857   122.3831   114.3558   126.7130    2.2406      2.641     0.848  |      |*     |     0.026
 16  1940    131.6690   138.3025        0.9571   136.2735   140.3315   132.0676   144.5374   -6.6335      2.611    -2.540  | *****|      |     0.289
 17  1950    151.3250   157.3397        1.1304   154.9434   159.7360   150.9758   163.7036   -6.0147      2.541    -2.367  |  ****|      |     0.370
 18  1960    179.3230   177.6460        1.3909   174.6975   180.5945   171.0543   184.2377    1.6770      2.408     0.696  |      |*     |     0.054
 19  1970    203.2110   199.2215        1.7289   195.5564   202.8865   192.2796   206.1633    3.9895      2.178     1.831  |      |***   |     0.704
 20  1980           .   222.0660        2.1348   217.5404   226.5916   214.6338   229.4983         .          .         .                          .
 21  1990           .   246.1797        2.6019   240.6639   251.6955   238.1062   254.2532         .          .         .                          .
 22  2000           .   271.5625        3.1257   264.9363   278.1887   262.6932   280.4317         .          .         .                          .

Sum of Residuals                   -5.8175E-11
Sum of Squared Residuals             123.74557
Predicted Residual SS (PRESS)        188.54924

Figure 55.31: Regression Using the R, CLI, and CLM Options

After producing the usual Analysis of Variance and Parameter Estimates tables (Figure 55.30), the procedure displays the results of requesting the options for predicted and residual values (Figure 55.31). The requested information is shown for each observation, and the ID variable is used to identify each observation. For observations with a missing dependent variable, the predicted value, the standard error of the predicted value, and the confidence limits for the predicted value are still available.

The plot of studentized residuals and the Cook's D statistics are displayed as a result of requesting the R option. In the plot of studentized residuals, a large number of observations with absolute values greater than two indicates an inadequate model. In this example, only the observations for 1940 and 1950 have studentized residuals greater than two in absolute value. A version of the studentized residual plot can be created on a high-resolution graphics device; see Example 55.7 for a similar example.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.