Distribution Analyses

Robust Measures of Scale

The sample standard deviation is a commonly used estimator of the population scale. However, it is sensitive to outliers and may not remain bounded when a single data point is replaced by an arbitrary number. With robust scale estimators, the estimates remain bounded even when a portion of the data points are replaced by arbitrary numbers.
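The sensitivity described above can be seen in a small experiment. The following is a minimal sketch using only the Python standard library; the helper name `stdev_with_replacement` and the sample values are illustrative assumptions, not part of the software described here.

```python
import statistics

def stdev_with_replacement(data, replacement):
    """Sample standard deviation after replacing the last point (hypothetical helper)."""
    contaminated = list(data[:-1]) + [replacement]
    return statistics.stdev(contaminated)

clean = [1.0, 2.0, 3.0, 4.0, 5.0]

# As the replaced value grows, the standard deviation grows without bound.
for bad_value in (5.0, 1e3, 1e6):
    print(bad_value, stdev_with_replacement(clean, bad_value))
```

Each larger replacement value yields a larger standard deviation, which is the unbounded behavior that robust estimators are designed to avoid.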

A simple robust scale estimator is the interquartile range, which is the difference between the upper and lower quartiles. For a normal population, the standard deviation \sigma can be estimated by dividing the interquartile range by 1.34898.
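The constant 1.34898 is the interquartile range of a standard normal distribution (twice the 0.75 quantile, 0.67449). As a rough sketch of the calculation, the function below uses `statistics.quantiles` with the "inclusive" method; this quartile definition is an assumption chosen for illustration and may differ from the one the software uses. The name `iqr_sigma` is hypothetical.

```python
import statistics

def iqr_sigma(data):
    """Estimate sigma for a normal population as IQR / 1.34898 (hypothetical helper)."""
    # quantiles(..., n=4) returns the three quartile cut points Q1, Q2, Q3.
    # The "inclusive" method is an assumed quartile definition.
    q1, _, q3 = statistics.quantiles(data, n=4, method="inclusive")
    return (q3 - q1) / 1.34898

sample = [1, 2, 3, 4, 5, 6, 7, 8, 9]
print(iqr_sigma(sample))
```

For this sample the quartiles under the assumed definition are 3 and 7, so the interquartile range is 4 before scaling.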

Gini's mean difference is also a robust estimator of the standard deviation \sigma. It is computed as

G = \frac{1}{{n \choose 2}} \sum_{i\lt j} | y_{i} - y_{j} |

If the observations are from a normal distribution, then \sqrt{\pi}\,G/2 is an unbiased estimator of the standard deviation \sigma.
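The definition of G translates directly into an average over all pairs of observations. The sketch below is one straightforward way to compute it and the corresponding normal-theory estimate of \sigma; the function names are hypothetical, and the quadratic-time loop over pairs is chosen for clarity rather than speed.

```python
import math
from itertools import combinations

def gini_mean_difference(y):
    """Average absolute difference over all pairs i < j (hypothetical helper)."""
    diffs = [abs(a - b) for a, b in combinations(y, 2)]
    return sum(diffs) / len(diffs)  # len(diffs) equals n choose 2

def gini_sigma(y):
    """Estimate of sigma under normality: sqrt(pi) * G / 2."""
    return math.sqrt(math.pi) * gini_mean_difference(y) / 2

print(gini_mean_difference([1, 2, 4]))  # pairwise distances are 1, 3, 2
print(gini_sigma([1, 2, 4]))
```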

A very robust scale estimator is the median absolute deviation (MAD) about the median (Hampel 1974).

{MAD} = \,med_{i} ( | y_{i} - \,med_{j}(y_{j}) | )
where the inner median, med_{j}(y_{j}), is the median of the n observations and the outer median, med_{i}, is the median of the n absolute values of the deviations about the median.

For a normal distribution, 1.4826 MAD can be used to estimate the standard deviation \sigma.
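The two nested medians can be written out in a few lines. This is a minimal sketch with hypothetical function names, using the ordinary median from the Python standard library.

```python
import statistics

def mad(y):
    """Median absolute deviation about the median (hypothetical helper)."""
    center = statistics.median(y)                       # inner median
    return statistics.median(abs(v - center) for v in y)  # outer median

def mad_sigma(y):
    """Estimate of sigma under normality: 1.4826 * MAD."""
    return 1.4826 * mad(y)

data = [1, 2, 3, 4, 100]   # one gross outlier
print(mad(data), mad_sigma(data))
print(statistics.stdev(data))  # much larger, for comparison
```

With the outlier 100 in the sample, the median is 3 and the absolute deviations are 2, 1, 0, 1, and 97, so the MAD is 1 and the outlier has no effect on it.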

The MAD statistic has low efficiency for normal distributions, and because it measures deviations symmetrically about the median, it may not be appropriate for asymmetric distributions. Rousseeuw and Croux (1993) proposed two new statistics as alternatives to the MAD statistic, S_{n} and Q_{n}.

S_{n} = 1.1926 \,med_{i} ( \,med_{j} (| y_{i}-y_{j}|) )
where the outer median, med_{i}, is the median of the n medians of \{ | y_{i}-y_{j}|; j=1,2, \ldots, n\}.

To reduce small-sample bias, c_{sn}S_{n} is used to estimate the standard deviation \sigma, where c_{sn} is a correction factor (Croux and Rousseeuw 1992).
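The nested-median form of S_{n} can be sketched as follows. This illustration makes several assumptions: it uses the ordinary median at both levels (implementations may treat even-sized medians differently, a detail not specified here), it takes quadratic time, and it omits the small-sample correction factor c_{sn}, whose values are not given in this section. The function name is hypothetical.

```python
import statistics

def sn_uncorrected(y):
    """1.1926 * med_i( med_j |y_i - y_j| ), without the c_sn factor (hypothetical helper)."""
    # For each y_i, take the median of its distances to all n observations
    # (including itself, as in the definition with j = 1, ..., n).
    inner = [statistics.median(abs(yi - yj) for yj in y) for yi in y]
    # The outer median is taken over those n inner medians.
    return 1.1926 * statistics.median(inner)

data = [1, 2, 3, 4, 100]
print(sn_uncorrected(data))
```

For this sample the inner medians are 2, 1, 1, 2, and 97, so the outer median is 2 and the uncorrected statistic is 2 × 1.1926.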

The second statistic is computed as

Q_{n} = 2.2219 \{| y_{i}-y_{j}|; i\lt j \}_{(k)}
where k = {h \choose 2}, h = [n/2]+1, and [n/2] is the integer part of n/2. That is, Q_{n} is 2.2219 times the kth order statistic of the {n \choose 2} distances between data points.

The bias-corrected statistic c_{qn}Q_{n} is used to estimate the standard deviation \sigma, where c_{qn} is the correction factor.
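The order-statistic definition of Q_{n} can be sketched by sorting all pairwise distances and picking the kth smallest. This is a simple quadratic-time illustration with a hypothetical function name; it omits the correction factor c_{qn}, whose values are not given in this section, and assumes at least two observations.

```python
from itertools import combinations
from math import comb

def qn_uncorrected(y):
    """2.2219 times the kth smallest pairwise distance, without c_qn (hypothetical helper)."""
    n = len(y)
    h = n // 2 + 1          # [n/2] + 1, with [n/2] the integer part of n/2
    k = comb(h, 2)          # k = h choose 2
    dists = sorted(abs(a - b) for a, b in combinations(y, 2))
    return 2.2219 * dists[k - 1]   # kth order statistic (1-based k)

data = [1, 2, 3, 4, 100]
print(qn_uncorrected(data))
```

Here n = 5, so h = 3 and k = 3. The sorted distances begin 1, 1, 1, 2, 2, 3, so the third order statistic is 1 and the distances involving the outlier do not enter the result.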

A Robust Measures of Scale table includes the interquartile range, Gini's mean difference, MAD, S_{n}, and Q_{n}, with their corresponding estimates of \sigma, as shown in Figure 38.14.


Figure 38.14: Robust Measures of Scale and Tests for Normality


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.