Predicted and Residual Values

After the model has been fit, predicted and residual values are usually calculated and output. The predicted values are calculated from the estimated regression equation; the residuals are calculated as actual minus predicted. Some procedures can calculate standard errors of residuals, predicted mean values, and individual predicted values.

Consider the ith observation, where x_i is the ith row of the regressor matrix X, b is the vector of parameter estimates, and s² is the mean squared error.

Let

$h_i = x_i (X'X)^{-1} x_i' \quad \text{(the leverage)}$

Then

$\hat{y}_i = x_i b \quad \text{(the predicted mean value)}$

$\mathrm{STDERR}(\hat{y}_i) = \sqrt{h_i s^2} \quad \text{(the standard error of the predicted mean)}$

The standard error of the individual (future) predicted value y_i is

$\mathrm{STDERR}(y_i) = \sqrt{(1 + h_i) s^2}$
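The leverage and the two standard errors above can be sketched in a few lines of Python. This is a minimal, dependency-free illustration rather than the method any particular procedure uses: the function names (`solve`, `leverages`, `prediction_stderrs`), the five-row design matrix, and the value s² = 1.2 are all assumptions made for the example. In practice a linear algebra library would normally replace the small Gaussian elimination routine.

```python
import math

def solve(A, b):
    """Solve A z = b by Gaussian elimination with partial pivoting."""
    k = len(b)
    M = [list(A[i]) + [b[i]] for i in range(k)]  # augmented copy of A
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(M[r][col]))
        if abs(M[pivot][col]) < 1e-12:
            raise ValueError("X'X is singular; leverage is undefined")
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, k):
            f = M[r][col] / M[col][col]
            for c in range(col, k + 1):
                M[r][c] -= f * M[col][c]
    z = [0.0] * k
    for i in reversed(range(k)):
        tail = sum(M[i][j] * z[j] for j in range(i + 1, k))
        z[i] = (M[i][k] - tail) / M[i][i]
    return z

def leverages(X):
    """Return h_i = x_i (X'X)^{-1} x_i' for every row x_i of X."""
    n, k = len(X), len(X[0])
    # Cross-product matrix X'X (k by k).
    xtx = [[sum(X[r][i] * X[r][j] for r in range(n)) for j in range(k)]
           for i in range(k)]
    h = []
    for row in X:
        z = solve(xtx, row)  # z = (X'X)^{-1} x_i'
        h.append(sum(a * b for a, b in zip(row, z)))
    return h

def prediction_stderrs(h, s2):
    """Standard errors of the predicted mean and of an individual prediction."""
    se_mean = [math.sqrt(h_i * s2) for h_i in h]
    se_indiv = [math.sqrt((1.0 + h_i) * s2) for h_i in h]
    return se_mean, se_indiv

# Hypothetical design matrix: an intercept column plus one regressor.
X = [[1.0, 1.0], [1.0, 2.0], [1.0, 3.0], [1.0, 4.0], [1.0, 5.0]]
h = leverages(X)
se_mean, se_indiv = prediction_stderrs(h, s2=1.2)  # s2 is assumed here
```

For this design the leverages sum to the number of parameters (two), and the standard error of an individual prediction always exceeds that of the predicted mean because of the extra s² term.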

The residual is defined as

$\mathrm{RESID}_i = y_i - x_i b \quad \text{(the residual)}$

$\mathrm{STDERR}(\mathrm{RESID}_i) = \sqrt{(1 - h_i) s^2} \quad \text{(the standard error of the residual)}$

The ratio of the residual to its standard error, called the studentized residual, is sometimes shown as

$\mathrm{STUDENT}_i = \frac{\mathrm{RESID}_i}{\mathrm{STDERR}(\mathrm{RESID}_i)}$
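As a small illustration of the residual formulas, the following sketch computes the studentized residual for each observation. The function name and all input values are hypothetical: the responses, fitted values, leverages, and s² are taken from an assumed straight-line fit to five points and are supplied directly rather than estimated here.

```python
import math

def studentized_residuals(y, y_hat, h, s2):
    """Return RESID_i / STDERR(RESID_i) for each observation."""
    out = []
    for y_i, y_hat_i, h_i in zip(y, y_hat, h):
        resid = y_i - y_hat_i                   # actual minus predicted
        se_resid = math.sqrt((1.0 - h_i) * s2)  # standard error of the residual
        out.append(resid / se_resid)
    return out

# Hypothetical values from an assumed straight-line fit.
y = [1.0, 3.0, 2.0, 5.0, 4.0]
y_hat = [1.4, 2.2, 3.0, 3.8, 4.6]
h = [0.6, 0.3, 0.2, 0.3, 0.6]
student = studentized_residuals(y, y_hat, h, s2=1.2)
```

Note that an observation with leverage equal to 1 has a residual standard error of zero, so the ratio is undefined in that case; the sketch does not guard against it.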

There are two kinds of confidence intervals for predicted values. One type of confidence interval is an interval for the mean value of the response. The other type, sometimes called a prediction or forecasting interval, is an interval for the actual value of a response, which is the mean value plus error.

For example, you can construct for the ith observation a confidence interval that contains the true mean value of the response with probability $1 - \alpha$. The upper and lower limits of the confidence interval for the mean value are

$\mathrm{LowerM} = x_i b - t_{\alpha/2} \sqrt{h_i s^2}$

$\mathrm{UpperM} = x_i b + t_{\alpha/2} \sqrt{h_i s^2}$

where $t_{\alpha/2}$ is the tabulated t statistic with degrees of freedom equal to the degrees of freedom for the mean squared error.

The limits for the confidence interval for an actual individual response are

$\mathrm{LowerI} = x_i b - t_{\alpha/2} \sqrt{(1+h_i) s^2}$

$\mathrm{UpperI} = x_i b + t_{\alpha/2} \sqrt{(1+h_i) s^2}$
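The two kinds of limits can be compared side by side with a short sketch. The function name and inputs are assumptions for illustration: the predicted value 3.0, leverage 0.2, and s² = 1.2 are hypothetical, and the critical value 3.182 is the tabulated two-sided 5% t value for 3 degrees of freedom, passed in by the caller rather than computed.

```python
import math

def interval_limits(y_hat_i, h_i, s2, t_crit):
    """Return ((LowerM, UpperM), (LowerI, UpperI)) for one observation."""
    half_mean = t_crit * math.sqrt(h_i * s2)           # half-width for the mean
    half_indiv = t_crit * math.sqrt((1.0 + h_i) * s2)  # half-width for an individual
    mean_limits = (y_hat_i - half_mean, y_hat_i + half_mean)
    indiv_limits = (y_hat_i - half_indiv, y_hat_i + half_indiv)
    return mean_limits, indiv_limits

# Hypothetical inputs; t_crit = 3.182 assumes alpha = 0.05 and 3 error df.
mean_limits, indiv_limits = interval_limits(3.0, 0.2, 1.2, 3.182)
```

Both intervals are centered on the same predicted value, but the interval for an individual response is always wider, since it adds the error variance s² to the variance of the predicted mean.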

Influential observations are those that, according to various criteria, appear to have a large influence on the parameter estimates. One measure of influence, Cook's D, measures the change to the estimates that results from deleting each observation:

$\mathrm{COOKD}_i = \frac{1}{k} \, \mathrm{STUDENT}_i^2 \left( \frac{\mathrm{STDERR}(\hat{y}_i)}{\mathrm{STDERR}(\mathrm{RESID}_i)} \right)^2$

where k is the number of parameters in the model (including the intercept). For more information, refer to Cook (1977, 1979).
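A brief sketch of the Cook's D calculation follows, built from the standard errors defined earlier. The function name and the input values are hypothetical (residuals and leverages from an assumed straight-line fit with k = 2 and s² = 1.2). Because the squared ratio of standard errors reduces to h_i / (1 - h_i), the statistic combines the size of the studentized residual with the leverage of the observation.

```python
import math

def cooks_d(resid, h, s2, k):
    """Cook's D per observation from residuals, leverages, and s2."""
    d = []
    for r_i, h_i in zip(resid, h):
        se_pred = math.sqrt(h_i * s2)           # STDERR of the predicted mean
        se_resid = math.sqrt((1.0 - h_i) * s2)  # STDERR of the residual
        student = r_i / se_resid                # studentized residual
        d.append(student ** 2 * (se_pred / se_resid) ** 2 / k)
    return d

# Hypothetical residuals and leverages from an assumed two-parameter fit.
resid = [-0.4, 0.8, -1.0, 1.2, -0.6]
h = [0.6, 0.3, 0.2, 0.3, 0.6]
d = cooks_d(resid, h, s2=1.2, k=2)
```

In this example the last observation has only a moderate residual but the largest Cook's D, because its leverage is high.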

The predicted residual for observation i is defined as the residual for the ith observation that results from dropping the ith observation from the parameter estimates. The sum of squares of predicted residual errors is called the PRESS statistic:

$\mathrm{PRESID}_i = \frac{\mathrm{RESID}_i}{1-h_i}$

$\mathrm{PRESS} = \sum_{i=1}^n \mathrm{PRESID}_i^2$
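To see that the shortcut formula agrees with actually dropping each observation and refitting, the following sketch compares the two for a straight-line model. Everything here is an assumption for illustration: the function names, the five data points, and the restriction to one regressor plus intercept, for which the leverage has the closed form h_i = 1/n + (x_i - x̄)² / Σ(x_j - x̄)².

```python
def fit_line(x, y):
    """Least squares intercept and slope for y = a + b*x."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return ybar - slope * xbar, slope

def presid_shortcut(x, y):
    """Predicted residuals as RESID_i / (1 - h_i), using one full fit."""
    n = len(x)
    a, b = fit_line(x, y)
    xbar = sum(x) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    out = []
    for xi, yi in zip(x, y):
        h_i = 1.0 / n + (xi - xbar) ** 2 / sxx  # leverage, straight-line case
        out.append((yi - (a + b * xi)) / (1.0 - h_i))
    return out

def presid_refit(x, y):
    """Predicted residuals by refitting with each observation left out."""
    out = []
    for i in range(len(x)):
        a, b = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        out.append(y[i] - (a + b * x[i]))
    return out

# Hypothetical data.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.0, 3.0, 2.0, 5.0, 4.0]
shortcut = presid_shortcut(x, y)
refit = presid_refit(x, y)
press = sum(p * p for p in shortcut)
```

The two lists of predicted residuals match to rounding error, so PRESS can be obtained from a single fit without refitting n times.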
