
STAT 350: Lecture 31

Heteroscedastic Errors

If plots and/or tests show that the error variances $\sigma_i^2 = \mathrm{Var}(\epsilon_i)$ depend on $i$, there are several standard approaches to fixing the problem, depending on the nature of the dependence.

Weighted Least Squares

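If

$$\sigma_i^2 = \frac{\sigma^2}{w_i}$$

with known weights $w_i$, and

$$Y_i = x_i^T\beta + \epsilon_i$$

and the errors are independent with normal distributions then the likelihood is

$$\prod_{i=1}^n \frac{\sqrt{w_i}}{\sigma\sqrt{2\pi}} \exp\left\{-\frac{w_i\,(Y_i - x_i^T\beta)^2}{2\sigma^2}\right\}$$

To choose $\beta$ to maximize this likelihood we minimize the quantity

$$\sum_{i=1}^n w_i\,(Y_i - x_i^T\beta)^2$$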

The process is called weighted least squares.
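
To see why maximizing the likelihood over $\beta$ reduces to a least squares problem, it may help to write out the log-likelihood. The following is a sketch that assumes the variance form $\sigma_i^2 = \sigma^2/w_i$ with known weights $w_i$; the symbol $\ell$ for the log-likelihood is introduced here and is not part of the original notation.

```latex
\ell(\beta,\sigma)
  = \sum_{i=1}^n \left[ \tfrac12 \log w_i - \log\sigma - \tfrac12 \log(2\pi) \right]
    - \frac{1}{2\sigma^2} \sum_{i=1}^n w_i\,(Y_i - x_i^T\beta)^2
```

Only the last sum involves $\beta$, and it enters with a negative sign, so for any fixed $\sigma$ the likelihood is largest where the weighted sum of squares is smallest.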

Algebraically it is easy to see how to do the minimization. Rewrite the quantity to be minimized as

$$\sum_{i=1}^n \left( w_i^{1/2} Y_i - w_i^{1/2} x_i^T\beta \right)^2$$

This is just an ordinary least squares problem with the response variable being

$$Y_i^* = w_i^{1/2} Y_i$$

and the covariates being

$$x_i^* = w_i^{1/2} x_i$$
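
The rescaling argument can be checked numerically. The sketch below is a pure-Python illustration for a straight-line model with an intercept; the data, the weights and all function names are made up for the example. It fits the line by ordinary least squares on the rescaled variables (with no separate intercept, since the column of ones is rescaled too) and compares the result with the standard closed-form weighted least squares formulas.

```python
import math


def solve_2x2(a11, a12, a22, b1, b2):
    """Solve the symmetric system [[a11, a12], [a12, a22]] (u, v)' = (b1, b2)'."""
    det = a11 * a22 - a12 * a12
    return (a22 * b1 - a12 * b2) / det, (a11 * b2 - a12 * b1) / det


def wls_by_rescaling(x, y, w):
    """OLS (no separate intercept) after multiplying everything by sqrt(w_i)."""
    r = [math.sqrt(wi) for wi in w]
    y_star = [ri * yi for ri, yi in zip(r, y)]
    c0 = r                                    # rescaled column of ones
    c1 = [ri * xi for ri, xi in zip(r, x)]    # rescaled covariate
    # Ordinary normal equations for the rescaled problem.
    a11 = sum(u * u for u in c0)
    a12 = sum(u * v for u, v in zip(c0, c1))
    a22 = sum(v * v for v in c1)
    b1 = sum(u * t for u, t in zip(c0, y_star))
    b2 = sum(v * t for v, t in zip(c1, y_star))
    return solve_2x2(a11, a12, a22, b1, b2)   # (intercept, slope)


def wls_closed_form(x, y, w):
    """Weighted least squares line via weighted means and cross products."""
    sw = sum(w)
    xbar = sum(wi * xi for wi, xi in zip(w, x)) / sw
    ybar = sum(wi * yi for wi, yi in zip(w, y)) / sw
    sxy = sum(wi * (xi - xbar) * (yi - ybar) for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * (xi - xbar) ** 2 for wi, xi in zip(w, x))
    slope = sxy / sxx
    return ybar - slope * xbar, slope         # (intercept, slope)


# Made-up illustrative data; not taken from the SENIC data set.
x = [1.0, 2.0, 3.0, 4.0, 5.0]
y = [1.1, 1.9, 3.2, 3.8, 5.3]
w = [10.0, 20.0, 5.0, 40.0, 25.0]

b_rescaled = wls_by_rescaling(x, y, w)
b_direct = wls_closed_form(x, y, w)
print("rescaled OLS :", b_rescaled)
print("closed form  :", b_direct)
```

The two printed pairs should agree up to rounding error. Multiplying every weight by the same constant leaves the fitted coefficients unchanged, which is why only the relative sizes of the weights matter for the estimate.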

The calculation can be written in matrix form. If $W^{1/2}$ is a diagonal matrix with $w_i^{1/2}$ in the $i$th diagonal position then put $Y^* = W^{1/2}Y$ and $X^* = W^{1/2}X$. Then

$$Y = X\beta + \epsilon$$

becomes

$$Y^* = X^*\beta + \epsilon^*$$

If $\epsilon$ has mean 0, independent entries and $\mathrm{Var}(\epsilon_i) = \sigma^2/w_i$ then $\epsilon^*$ has mean 0, independent entries $\epsilon_i^* = w_i^{1/2}\epsilon_i$ and $\mathrm{Var}(\epsilon_i^*) = \sigma^2$, so that ordinary multiple regression theory applies. The estimate of $\beta$ is

$$\hat\beta_w = (X^{*T}X^*)^{-1}X^{*T}Y^* = (X^TWX)^{-1}X^TWY$$

where now $W$ is a diagonal matrix with $w_i$ on the diagonal. This estimate is unbiased and has variance-covariance matrix

$$\sigma^2 (X^TWX)^{-1}$$
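
The unbiasedness and variance claims can be checked directly. The following is a sketch, assuming that the design matrix $X$ is fixed with full column rank, that $Y = X\beta + \epsilon$ with $E(\epsilon) = 0$ and $\mathrm{Var}(\epsilon) = \sigma^2 W^{-1}$ (the matrix form of independent errors with variances $\sigma^2/w_i$), and writing $\hat\beta_w$ for the weighted estimate $(X^TWX)^{-1}X^TWY$.

```latex
\begin{aligned}
\hat\beta_w &= (X^TWX)^{-1}X^TW(X\beta + \epsilon)
             = \beta + (X^TWX)^{-1}X^TW\epsilon ,
\qquad\text{so}\qquad E(\hat\beta_w) = \beta , \\[1ex]
\mathrm{Var}(\hat\beta_w)
            &= (X^TWX)^{-1}X^TW \,(\sigma^2 W^{-1})\, WX (X^TWX)^{-1}
             = \sigma^2 (X^TWX)^{-1} .
\end{aligned}
```

The middle factor simplifies because $W W^{-1} W = W$, which leaves $X^TWX$ to cancel against one of the two inverses.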

Example

It is possible to do weighted least squares in SAS fairly easily. As an example we consider the SENIC data set, taking the variance of RISK to be proportional to 1/CENSUS. (Motivation: RISK is an estimated proportion, and the variance of a binomial proportion is inversely proportional to the sample size.) This makes the weight just CENSUS.
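
The motivation can be made explicit. As a sketch, suppose (an idealization, not a claim about how the SENIC risks were actually computed) that the risk for hospital $i$ is a binomial proportion $\hat p_i$ based on $n_i$ patients with a common probability $p$; the symbols $\hat p_i$, $n_i$ and $p$ are introduced here for illustration only.

```latex
\mathrm{Var}(\hat p_i) = \frac{p(1-p)}{n_i} = \frac{\sigma^2}{w_i}
\qquad\text{with}\qquad \sigma^2 = p(1-p), \quad w_i = n_i .
```

Taking CENSUS as a stand-in for $n_i$ then gives $w_i$ = CENSUS.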

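/* Weighted least squares: each hospital is weighted by its census. */
proc reg data=scenic;
  model Risk = Culture Stay Nratio Chest Facil;
  weight Census;  /* weight is taken proportional to 1/Var(Risk) */
run;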

EDITED OUTPUT FOR WEIGHTED CASE

Dependent Variable: RISK                                               
                        Analysis of Variance
                        Sum of         Mean
  Source       DF      Squares       Square      F Value       Prob>F
  Model         5  12876.94280   2575.38856       17.819       0.0001
  Error       107  15464.46721    144.52773
  C Total     112  28341.41001
      Root MSE      12.02197     R-square       0.4544
      Dep Mean       4.76215     Adj R-sq       0.4289
      C.V.         252.44833
                              Parameter Estimates
                    Parameter      Standard    T for H0:               
   Variable  DF      Estimate         Error   Parameter=0    Prob > |T|
   INTERCEP   1      0.468108    0.62393433         0.750        0.4547
   CULTURE    1      0.030005    0.00891714         3.365        0.0011
   STAY       1      0.237420    0.04444810         5.342        0.0001
   NRATIO     1      0.623850    0.34803271         1.793        0.0759
   CHEST      1      0.003547    0.00444160         0.799        0.4263
   FACIL      1      0.008854    0.00603368         1.467        0.1452

EDITED OUTPUT FOR UNWEIGHTED CASE

Dependent Variable: RISK                                               
                           Analysis of Variance
                              Sum of         Mean
  Source       DF      Squares       Square      F Value       Prob>F
  Model         5    108.32717     21.66543       24.913       0.0001
  Error       107     93.05266      0.86965
  C Total     112    201.37982
      Root MSE       0.93255     R-square       0.5379
      Dep Mean       4.35487     Adj R-sq       0.5163
      C.V.          21.41399
                        Parameter Estimates
                 Parameter      Standard    T for H0:               
  Variable  DF      Estimate         Error   Parameter=0    Prob > |T|
  INTERCEP   1     -0.768043    0.61022741        -1.259        0.2109
  CULTURE    1      0.043189    0.00984976         4.385        0.0001
  STAY       1      0.233926    0.05741114         4.075        0.0001
  NRATIO     1      0.672403    0.29931440         2.246        0.0267
  CHEST      1      0.009179    0.00540681         1.698        0.0925
  FACIL      1      0.018439    0.00629673         2.928        0.0042
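
When reading either table it helps to recall that the "T for H0" column is the parameter estimate divided by its standard error, and that Root MSE is the square root of the error mean square. The sketch below re-derives both from the weighted output; the numbers are copied from that table and the variable names are illustrative only.

```python
import math

# (estimate, standard error, reported t) copied from the weighted output.
weighted = {
    "INTERCEP": (0.468108, 0.62393433, 0.750),
    "CULTURE": (0.030005, 0.00891714, 3.365),
    "STAY": (0.237420, 0.04444810, 5.342),
    "NRATIO": (0.623850, 0.34803271, 1.793),
    "CHEST": (0.003547, 0.00444160, 0.799),
    "FACIL": (0.008854, 0.00603368, 1.467),
}
error_mean_square = 144.52773   # "Mean Square" entry on the Error line
reported_root_mse = 12.02197    # "Root MSE" entry

# t statistic = estimate / standard error
t_ratios = {name: est / se for name, (est, se, _) in weighted.items()}
root_mse = math.sqrt(error_mean_square)

for name, (est, se, t_reported) in weighted.items():
    print(f"{name:9s} t = {t_ratios[name]:7.3f} (reported {t_reported:.3f})")
print(f"Root MSE = {root_mse:.5f} (reported {reported_root_mse})")
```

Small discrepancies in the last digit are possible because the printed estimates and standard errors are themselves rounded.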

Transformation

Sometimes the response variable has a distribution which makes it likely that the errors will be far from normal and will not be homoscedastic. Typical examples:

The traditional analysis method is to try a transformation:

BIGGEST PROBLEM: If the model was linear before transformation then it will generally not be linear after transformation.
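
To see why linearity is lost, consider a sketch with a single covariate $x$ and a smooth transformation $g$ (both introduced here for illustration). Suppose the mean is linear before transformation, and use a first-order (delta method) approximation, which ignores a correction term that depends on the error variance.

```latex
E(Y) = \beta_0 + \beta_1 x
\quad\Longrightarrow\quad
E\{g(Y)\} \approx g(\beta_0 + \beta_1 x)
```

The right-hand side is a linear function of $x$ only if $g$ itself is linear or $\beta_1 = 0$. With $g = \log$, for example, the approximate mean $\log(\beta_0 + \beta_1 x)$ is curved in $x$.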





Richard Lockhart
Wed Mar 19 22:34:23 PST 1997