Robust Regression Examples
SAS/IML has three subroutines that can be used for outlier detection and robust regression. The Least Median of Squares (LMS) and Least Trimmed Squares (LTS) subroutines perform robust regression (sometimes called resistant regression). These subroutines are able to detect outliers and perform a least-squares regression on the remaining observations. The Minimum Volume Ellipsoid Estimation (MVE) subroutine computes the minimum volume ellipsoid estimator, a robust location vector and covariance matrix that can be used for constructing confidence regions and for detecting multivariate outliers and leverage points. Moreover, the MVE subroutine provides a table of robust distances and classical Mahalanobis distances. The LMS, LTS, and MVE subroutines, together with other robust estimation theory and methods, were developed by Rousseeuw (1984) and Rousseeuw and Leroy (1987). Some statistical applications of MVE are described in Rousseeuw and Van Zomeren (1990).
Whereas robust regression methods such as L1 regression or Huber M-estimation only reduce the influence of outliers (compared with least-squares, or L2, regression), resistant regression methods such as LMS and LTS can completely disregard influential outliers (sometimes called leverage points) when fitting the model. The algorithms used in the LMS and LTS subroutines are based on the PROGRESS program by Rousseeuw and Leroy (1987). Rousseeuw and Hubert (1996) prepared a new version of PROGRESS to facilitate its inclusion in SAS software, incorporating several recent developments. Among other things, the new version of PROGRESS yields the exact LMS for simple regression, and it uses a new definition of the robust coefficient of determination (R²). Therefore, the output may differ slightly from that given in Rousseeuw and Leroy (1987) or obtained from software based on the older version of PROGRESS. The MVE algorithm is based on the algorithm used in the MINVOL program by Rousseeuw (1984).
The three SAS/IML subroutines are designed for the following tasks:
- LMS: minimizing the hth ordered squared residual
- LTS: minimizing the sum of the h smallest squared residuals
- MVE: minimizing the volume of an ellipsoid containing h points
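The following is a minimal sketch, using invented data, of how these subroutines can be called from PROC IML. The opt vector is passed with all entries missing so that default options apply; the exact layout of opt is described in the reference entry for each subroutine.

   proc iml;
      /* illustrative data: a linear trend in which the last
         response value is a gross outlier                    */
      x = {1, 2, 3, 4, 5, 6, 7, 8};
      y = {1.1, 1.9, 3.2, 3.9, 5.1, 5.9, 7.1, 20};

      optn = j(8,1,.);       /* missing entries request defaults */

      call lms(sc, coef, wgt, optn, y, x);  /* LMS regression of y on x */
      call lts(sc, coef, wgt, optn, y, x);  /* LTS regression of y on x */
      call mve(sc, xmve, dist, optn, x||y); /* robust location and scatter
                                               of the bivariate data     */
   quit;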
For each parameter vector $b = (b_1, \ldots, b_n)$, the residual of observation i is $r_i = y_i - x_i b$. You then denote the ordered, squared residuals as

\[ r^2_{(1:N)} \le r^2_{(2:N)} \le \cdots \le r^2_{(N:N)} \]

The objective functions for the LMS and LTS optimization problems are defined as

\[ F_{\mathrm{LMS}} = r^2_{(h:N)} \longrightarrow \min \]

\[ F_{\mathrm{LTS}} = \sqrt{\frac{1}{h} \sum_{i=1}^{h} r^2_{(i:N)}} \longrightarrow \min \]

where h is defined in the range $N/2 + 1 \le h \le (3N + n + 1)/4$; the default value $h = [(N + n + 1)/2]$ yields the maximum possible breakdown value.
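As a concrete illustration, the following hypothetical IML module (the name objfun is invented for this sketch) evaluates both objective functions for a candidate parameter vector b:

   proc iml;
      /* hypothetical helper: evaluate the LMS and LTS objective
         functions for parameter vector b and trimming constant h.
         x is the design matrix (including the intercept column
         if one is used).                                          */
      start objfun(y, x, b, h);
         r  = y - x*b;            /* residuals                         */
         r2 = r # r;              /* squared residuals                 */
         z  = r2;
         r2[rank(z)] = z;         /* order squared residuals ascending */
         fLMS = r2[h];                     /* hth ordered squared residual */
         fLTS = sqrt( sum(r2[1:h]) / h );  /* root of scaled trimmed sum   */
         return( fLMS || fLTS );
      finish;
   quit;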
The objective function for the MVE optimization problem is based on the hth quantile $d_{(h:N)}$ of the Mahalanobis-type distances $d = (d_1, \ldots, d_N)$,

\[ F_{\mathrm{MVE}} = \sqrt{d^2_{(h:N)} \det(\mathbf{C})} \longrightarrow \min \]

subject to $d_{(h:N)} = \sqrt{\chi^2_{n,0.5}}$, where C is the scatter matrix estimate and the Mahalanobis-type distances are computed as

\[ d_i = \sqrt{(x_i - T) \, \mathbf{C}^{-1} (x_i - T)^T} \]

where T is the location estimate.
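Given a location estimate T and a scatter estimate C, these distances are simple to compute; the MVE subroutine itself returns them in its table of robust and classical distances. The following hypothetical IML module (the name mahdist is invented for this sketch) shows the computation:

   proc iml;
      /* hypothetical helper: Mahalanobis-type distances of the rows
         of x with respect to location T (a row vector) and scatter C */
      start mahdist(x, T, C);
         xc = x - repeat(T, nrow(x), 1);     /* center each observation */
         d2 = vecdiag( xc * inv(C) * xc` );  /* squared distances       */
         return( sqrt(d2) );
      finish;
   quit;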
Because of the nonsmooth form of these objective functions, the estimates cannot be obtained with traditional optimization algorithms. For LMS and LTS, the algorithm, as in the PROGRESS program, selects a number of subsets of n observations out of the N given observations, evaluates the objective function on each subset, and saves the subset with the lowest objective function value. As long as the problem size permits you to evaluate all such subsets, the result is a global optimum. If computer time does not permit you to evaluate all the different subsets, a random collection of subsets is evaluated; in that case, you may not obtain the global optimum.
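For illustration, a bare-bones version of this resampling scheme for LMS might look like the following sketch. The module name lmsSearch and its argument list are invented here, and the actual PROGRESS algorithm includes refinements (such as intercept adjustment and treatment of degenerate subsets) that are omitted:

   proc iml;
      /* hypothetical sketch of the elemental-subset search for LMS:
         draw random subsets of n observations, fit each exactly,
         and keep the fit with the smallest hth ordered squared
         residual                                                   */
      start lmsSearch(y, x, h, nRep);
         N = nrow(x);  n = ncol(x);
         best = 1e300;  bBest = j(n, 1, .);
         do k = 1 to nRep;
            perm = rank(ranuni(j(N, 1, 0))); /* random permutation of 1..N */
            sub  = perm[1:n];                /* elemental subset           */
            xs = x[sub, ];  ys = y[sub];
            if det(xs) ^= 0 then do;         /* exact fit through n points */
               b  = solve(xs, ys);
               r2 = (y - x*b) ## 2;
               z  = r2;  r2[rank(z)] = z;    /* order squared residuals    */
               if r2[h] < best then do;      /* keep the best fit so far   */
                  best = r2[h];  bBest = b;
               end;
            end;
         end;
         return( bBest );
      finish;
   quit;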
Note that the LMS, LTS, and MVE subroutines are executed only when the number N of observations exceeds twice the number n of explanatory variables xj (including the intercept), that is, when N > 2n.