The NLIN Procedure

Example 45.2: Iteratively Reweighted Least Squares

The NLIN procedure is well suited to methods that make the weight a function of the parameters in each iteration, because the _WEIGHT_ variable can be computed with program statements.

The NOHALVE option is used because the SSE definition is modified at each iteration, so the step-shortening criterion is circumvented.
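
One way to write the criterion that changes from iteration to iteration is sketched below. The notation is assumed here rather than taken from the example: \theta^{(k)} denotes the parameter estimates entering iteration k, f(x_i, \theta) the model, and w_i(\theta^{(k)}) the value assigned to _WEIGHT_ for observation i.

```latex
SSE^{(k)}(\theta) = \sum_{i} w_i(\theta^{(k)}) \, (y_i - f(x_i, \theta))^2
```

Because the weights depend on \theta^{(k)}, SSE^{(k)} and SSE^{(k+1)} are different functions of \theta, so comparing their values is not a reliable basis for shortening a step.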

Iteratively reweighted least squares (IRLS) can produce estimates for many of the robust regression criteria suggested in the literature. These methods act like automatic outlier rejectors since large residual values lead to very small weights. Holland and Welsch (1977) outline several of these robust methods. For example, the biweight criterion suggested by Beaton and Tukey (1974) tries to minimize

S_{\rm biweight} = \sum \rho(r)

where

\rho(r) = (B^2 / 6)(1 - (1 - (r / B)^2)^3)  {\rm if} | r | \leq B

or

\rho(r) = (B^2 / 6)  {\rm otherwise}

where r is |residual| / \sigma, \sigma is a measure of the scale of the error, and B is a tuning constant.

The weighting function for the biweight is

w_i = (1 - ( r_i / B)^2)^2  {\rm if} | r_i | \leq B

or

w_i = 0  {\rm if} | r_i | > B

The biweight estimator depends on both a measure of scale (like the standard deviation) and a tuning constant; results vary if these values are changed.
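
The dependence on the scale value can be checked numerically. The following is a minimal Python sketch, offered as an illustration outside SAS and not as part of the example itself. The function name `biweight_weight` is hypothetical; the residual -8.61325 is the 1940 value listed in Output 45.2.2, and B = 4.685 is the tuning constant used in the PROC NLIN step.

```python
def biweight_weight(residual, sigma, b):
    """Return the Beaton/Tukey biweight for one residual.

    sigma is the scale value and b is the tuning constant.
    """
    r = abs(residual) / sigma              # scaled absolute residual
    if r <= b:
        return (1.0 - (r / b) ** 2) ** 2
    return 0.0                             # beyond the cutoff the weight is zero


# Same residual and tuning constant, three different scale values.
for sigma in (1.0, 2.0, 4.0):
    print(sigma, round(biweight_weight(-8.61325, sigma, 4.685), 5))
```

With sigma = 2 the function returns the weight listed for 1940 in Output 45.2.2 (0.02403 after rounding). With sigma = 1 the scaled residual exceeds B and the weight drops to zero.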

The data are the population of the United States (in millions), recorded at ten-year intervals starting in 1790 and ending in 1990.

   title 'U.S. Population Growth';
   data uspop;
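      /* The 6.3 informat divides values that have no decimal point */
      /* by 1000, so 3929 is read as 3.929 (millions).              */
      input pop :6.3 @@;
      retain year 1780;
      year=year+10;              /* first observation is 1790 */
      yearsq=year*year;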
      datalines;
   3929 5308 7239 9638 12866 17069 23191 31443 39818 50155
   62947 75994 91972 105710 122775 131669 151325 179323 203211
   226542 248710
   ;

   title 'Beaton/Tukey Biweight Robust Regression using IRLS';
   proc nlin data=uspop nohalve;
      parms b0=20450.43 b1=-22.7806 b2=.0063456;
      model pop=b0+b1*year+b2*year*year;
      /* Recompute the biweight from the current residual at each iteration */
      resid=pop-model.pop;
      sigma=2;                   /* fixed scale value */
      b=4.685;                   /* tuning constant   */
      r=abs(resid / sigma);
      if r<=b then _weight_=(1-(r / b)**2)**2;
      else _weight_=0;
      output out=c r=rbi;
   run;

   /* Recompute the final weights from the saved residuals for printing */
   data c;
      set c;
      sigma=2;
      b=4.685;
      r=abs(rbi / sigma);
      if r<=b then _weight_=(1-(r / b)**2)**2;
      else _weight_=0;
   run;

   proc print data=c;
   run;
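
The reweighting that the program statements perform inside PROC NLIN can be mimicked outside SAS. The following Python sketch is an illustration under stated assumptions, not a description of the procedure's internal algorithm: it fits a straight line using closed-form weighted least squares instead of the quadratic model, it uses hypothetical data (an exact line with one gross outlier), and the function names are invented for the sketch. The values sigma = 2 and B = 4.685 are the ones used above.

```python
def biweight(resid, sigma=2.0, b=4.685):
    """Biweight for one residual; sigma and b are held fixed (assumed)."""
    r = abs(resid) / sigma
    return (1.0 - (r / b) ** 2) ** 2 if r <= b else 0.0


def weighted_line_fit(x, y, w):
    """Weighted least squares fit of y = b0 + b1*x (closed form)."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    b1 = sxy / sxx
    return ybar - b1 * xbar, b1


def irls_line(x, y, max_iter=50, tol=1e-10):
    """Alternate between reweighting and refitting until the estimates settle."""
    w = [1.0] * len(x)                     # start from ordinary least squares
    b0, b1 = weighted_line_fit(x, y, w)
    for _ in range(max_iter):
        resid = [yi - (b0 + b1 * xi) for xi, yi in zip(x, y)]
        w = [biweight(e) for e in resid]   # weights depend on the current fit
        new_b0, new_b1 = weighted_line_fit(x, y, w)
        converged = abs(new_b0 - b0) + abs(new_b1 - b1) < tol
        b0, b1 = new_b0, new_b1
        if converged:
            break
    return b0, b1, w                       # w comes from the last reweighting


# Hypothetical data: the exact line y = 2 + 3x with one gross outlier at x = 5.
x = list(range(10))
y = [2.0 + 3.0 * xi for xi in x]
y[5] += 20.0
b0, b1, w = irls_line(x, y)
print(b0, b1, w[5])
```

In this sketch the outlier receives weight zero after the first reweighting, so the later fits recover the underlying line. A more careful version would guard against every weight becoming zero, which would make the weighted fit undefined.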

Output 45.2.1: Nonlinear Least Squares Analysis

Beaton/Tukey Biweight Robust Regression using IRLS

The NLIN Procedure

Source               DF   Sum of Squares   Mean Square   F Value   Approx Pr > F
Regression            3           232436       77478.8   49454.5          <.0001
Residual             18          20.6670        1.1482
Uncorrected Total    21           232457
Corrected Total      20           113585

Parameter    Estimate   Approx Std Error   Approximate 95% Confidence Limits
b0            20828.7              259.4        20283.8      21373.6
b1           -23.2004             0.2746       -23.7773     -22.6235
b2            0.00646           0.000073        0.00631      0.00661

Output 45.2.2: Listing of Computed Weights from PROC NLIN

Beaton/Tukey Biweight Robust Regression using IRLS

Obs pop year yearsq RBI sigma b r _weight_
1 3.929 1790 3204100 -0.93711 2 4.685 0.46855 0.98010
2 5.308 1800 3240000 0.46091 2 4.685 0.23045 0.99517
3 7.239 1810 3276100 1.11853 2 4.685 0.55926 0.97170
4 9.638 1820 3312400 0.95176 2 4.685 0.47588 0.97947
5 12.866 1830 3348900 0.32159 2 4.685 0.16080 0.99765
6 17.069 1840 3385600 -0.62597 2 4.685 0.31298 0.99109
7 23.191 1850 3422500 -0.94692 2 4.685 0.47346 0.97968
8 31.443 1860 3459600 -0.43027 2 4.685 0.21514 0.99579
9 39.818 1870 3496900 -1.08302 2 4.685 0.54151 0.97346
10 50.155 1880 3534400 -1.06615 2 4.685 0.53308 0.97427
11 62.947 1890 3572100 0.11332 2 4.685 0.05666 0.99971
12 75.994 1900 3610000 0.25539 2 4.685 0.12770 0.99851
13 91.972 1910 3648100 2.03607 2 4.685 1.01804 0.90779
14 105.710 1920 3686400 0.28436 2 4.685 0.14218 0.99816
15 122.775 1930 3724900 0.56725 2 4.685 0.28363 0.99268
16 131.669 1940 3763600 -8.61325 2 4.685 4.30662 0.02403
17 151.325 1950 3802500 -8.32415 2 4.685 4.16207 0.04443
18 179.323 1960 3841600 -0.98543 2 4.685 0.49272 0.97800
19 203.211 1970 3880900 0.95088 2 4.685 0.47544 0.97951
20 226.542 1980 3920400 1.03780 2 4.685 0.51890 0.97562
21 248.710 1990 3960100 -1.33067 2 4.685 0.66533 0.96007


Output 45.2.2 displays the computed weights. The observations for 1940 and 1950 are highly discounted because of their large residuals.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.