Robust Measures of Scale

The sample standard deviation is a commonly used estimator of the population scale. However, it is sensitive to outliers and may not remain bounded when a single data point is replaced by an arbitrary number. With robust scale estimators, the estimates remain bounded even when a portion of the data points are replaced by arbitrary numbers.

A simple robust scale estimator is the interquartile range, which is the difference between the upper and lower quartiles. For a normal population, the standard deviation $\sigma$ can be estimated by dividing the interquartile range by 1.34898.
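As a rough illustration of the interquartile-range estimate, the following Python sketch divides the IQR by 1.34898 and compares the result with the sample standard deviation when one observation is replaced by an outlier. The function name `sigma_from_iqr` is hypothetical. The quartiles come from Python's `statistics.quantiles` with its default method, which is an assumption: other quantile definitions can give slightly different quartiles for small samples.

```python
import statistics

def sigma_from_iqr(data):
    """Estimate sigma as IQR / 1.34898 (assumes a normal population)."""
    # quantiles(..., n=4) returns the three quartile cut points [Q1, Q2, Q3].
    q1, _, q3 = statistics.quantiles(data, n=4)
    return (q3 - q1) / 1.34898

clean = [1, 2, 3, 4, 5, 6, 7, 8]
contaminated = [1, 2, 3, 4, 5, 6, 7, 1000]  # one point replaced by an outlier

print(sigma_from_iqr(clean))
print(sigma_from_iqr(contaminated))    # unchanged: the quartiles did not move
print(statistics.stdev(contaminated))  # the standard deviation is inflated
```

In this small example the outlier leaves both quartiles where they were, so the IQR-based estimate is the same for both samples.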

Gini's mean difference is also a robust estimator of the standard deviation $\sigma$. It is computed as

$G = \frac{1}{{n \choose 2}} \sum_{i<j} | y_{i} - y_{j} |$

If the observations are from a normal distribution, then $\sqrt{\pi}\,G/2$ is an unbiased estimator of the standard deviation $\sigma$.
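The formula for Gini's mean difference can be sketched directly as an average over all pairs. The function names `gini_mean_difference` and `sigma_from_gini` are hypothetical, and the brute-force pairwise loop is an assumption chosen for readability rather than speed.

```python
import math
from itertools import combinations

def gini_mean_difference(data):
    """Average absolute difference over all pairs i < j (needs n >= 2)."""
    pairs = list(combinations(data, 2))
    return sum(abs(a - b) for a, b in pairs) / len(pairs)

def sigma_from_gini(data):
    """sqrt(pi) * G / 2, the estimate of sigma under normality."""
    return math.sqrt(math.pi) * gini_mean_difference(data) / 2

sample = [1, 2, 4]
print(gini_mean_difference(sample))  # pairs differ by 1, 3, 2, so G = 6/3 = 2.0
print(sigma_from_gini(sample))
```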

A very robust scale estimator is the median absolute deviation (MAD) about the median (Hampel 1974):

$\mathrm{MAD} = \mathrm{med}_{i} ( | y_{i} - \mathrm{med}_{j}(y_{j}) | )$

where the inner median, $\mathrm{med}_{j}(y_{j})$, is the median of the $n$ observations and the outer median, $\mathrm{med}_{i}$, is the median of the $n$ absolute values of the deviations about the median.

For a normal distribution, $1.4826\,\mathrm{MAD}$ can be used to estimate the standard deviation $\sigma$.
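The two nested medians in the MAD definition translate into two calls to a median function. The sketch below is a minimal illustration with hypothetical function names (`mad`, `sigma_from_mad`), using the ordinary median from Python's `statistics` module.

```python
import statistics

def mad(data):
    """Median absolute deviation about the median."""
    center = statistics.median(data)               # inner median
    deviations = [abs(y - center) for y in data]   # n absolute deviations
    return statistics.median(deviations)           # outer median

def sigma_from_mad(data):
    """1.4826 * MAD, the estimate of sigma under normality."""
    return 1.4826 * mad(data)

sample = [1, 2, 3, 4, 100]  # the value 100 is a gross outlier
print(mad(sample))          # median is 3; deviations are 2, 1, 0, 1, 97
print(sigma_from_mad(sample))
```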

The MAD statistic has low efficiency for normal distributions, and because it treats deviations above and below the median alike, it may not be appropriate for asymmetric distributions. Rousseeuw and Croux (1993) proposed two new statistics as alternatives to the MAD statistic, $S_{n}$ and $Q_{n}$.

$S_{n} = 1.1926 \, \mathrm{med}_{i} ( \mathrm{med}_{j} ( | y_{i} - y_{j} | ) )$

where the outer median, $\mathrm{med}_{i}$, is the median of the $n$ medians of

$\{ | y_{i} - y_{j} | ; \; j = 1, 2, \ldots, n \}.$

To reduce small-sample bias, $c_{sn} S_{n}$ is used to estimate the standard deviation $\sigma$, where $c_{sn}$ is a correction factor (Croux and Rousseeuw 1992).
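A brute-force sketch of the uncorrected $S_{n}$ statistic follows. Several choices here are assumptions not stated in the text above: the function name `s_n` is hypothetical, the inner median is taken as a high median and the outer as a low median (the two coincide with the ordinary median when $n$ is odd), and the small-sample correction factor is left out because its values are not given here.

```python
import statistics

def s_n(data):
    """Uncorrected S_n: 1.1926 * med_i( med_j |y_i - y_j| )."""
    # For each y_i, take the median of its distances to all n points
    # (the set includes j = i, which contributes a distance of 0).
    inner = [
        statistics.median_high([abs(yi - yj) for yj in data])
        for yi in data
    ]
    # The outer median is taken over the n inner medians.
    return 1.1926 * statistics.median_low(inner)

sample = [1, 2, 3, 4, 100]
print(s_n(sample))  # inner medians are 2, 1, 1, 2, 97; their median is 2
```

This version does work proportional to $n^2$, so it is suited only to small samples.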

The second statistic is computed as

$Q_{n} = 2.2219 \, \{ | y_{i} - y_{j} | ; \; i < j \}_{(k)}$

where $k = {h \choose 2}$, $h = [n/2] + 1$, and $[n/2]$ is the integer part of $n/2$. That is, $Q_{n}$ is 2.2219 times the $k$th order statistic of the ${n \choose 2}$ distances between data points.

The bias-corrected statistic $c_{qn} Q_{n}$ is used to estimate the standard deviation $\sigma$, where $c_{qn}$ is the correction factor.
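The definition of $Q_{n}$ can be sketched by sorting all pairwise distances and picking the $k$th smallest. The function name `q_n` is hypothetical, the brute-force sort over all pairs is an assumption made for clarity, and the correction factor is left out because its values are not given here.

```python
import math
from itertools import combinations

def q_n(data):
    """Uncorrected Q_n: 2.2219 times the kth smallest pairwise distance."""
    n = len(data)
    h = n // 2 + 1       # [n/2] + 1, using the integer part of n/2
    k = math.comb(h, 2)  # k = h choose 2
    distances = sorted(abs(a - b) for a, b in combinations(data, 2))
    return 2.2219 * distances[k - 1]  # k is 1-based; list indices are 0-based

sample = [1, 2, 3, 4, 100]
# n = 5 gives h = 3 and k = 3; the sorted distances start 1, 1, 1, 2, 2, 3, ...
print(q_n(sample))
```

Because it stores all ${n \choose 2}$ distances, this version is practical only for small samples.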

A Robust Measures of Scale table includes the interquartile range, Gini's mean difference, MAD, $S_{n}$, and $Q_{n}$, with their corresponding estimates of $\sigma$, as shown in Figure 38.14.

Figure 38.14: Robust Measures of Scale and Tests for Normality
