Tests for Normality

You can use the NORMALTEST option in the PROC CAPABILITY statement to request several tests of the hypothesis that the analysis variable values are a random sample from a normal distribution. These tests, which are summarized in the table labeled Tests for Normality, include the following:

Shapiro-Wilk test
Kolmogorov-Smirnov test
Anderson-Darling test
Cramér-von Mises test

Tests for normality are particularly important in process capability analysis because the commonly used capability indices are difficult to interpret unless the data are at least approximately normally distributed. Furthermore, the confidence limits for capability indices displayed in the table labeled Process Capability Indices require the assumption of normality. Consequently, the tests of normality are always computed when you specify the SPEC statement, and a note is added to the table when the hypothesis of normality is rejected. You can specify the particular test and the significance level with the CHECKINDICES option.

Shapiro-Wilk Test

If the sample size is 2000 or less,^* the procedure computes the Shapiro-Wilk statistic W (also denoted as W_n to emphasize its dependence on the sample size n). The statistic W_n is the ratio of the best estimator of the variance (based on the square of a linear combination of the order statistics) to the usual corrected sum of squares estimator of the variance. When n is greater than three, the coefficients to compute the linear combination of the order statistics are approximated by the method of Royston (1992). The statistic W_n is always greater than zero and less than or equal to one $(0 < W_n \leq 1)$.

Small values of W lead to rejection of the null hypothesis. The method for computing the p-value (the probability of obtaining a W statistic less than or equal to the observed value) depends on n. For n = 3, the probability distribution of W is known and is used to determine the p-value. For n ≥ 4, a normalizing transformation is computed:

$Z_{n} = \begin{cases} \left( -\log\left( \gamma - \log(1 - W_n) \right) - \mu \right) / \sigma & \text{if } 4 \leq n \leq 11 \\ \left( \log(1 - W_n) - \mu \right) / \sigma & \text{if } 12 \leq n \leq 2000 \end{cases}$

The values of $\sigma$, $\gamma$, and $\mu$ are functions of n obtained from simulation results. Large values of Z_n indicate departure from normality, and since the statistic Z_n has an approximately standard normal distribution, this distribution is used to determine the p-values for n ≥ 4.
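The transformation above can be sketched in Python as follows. This is a minimal illustration and not the procedure's implementation: the function name `shapiro_wilk_pvalue` is hypothetical, and the values of μ, σ, and γ are assumed to be supplied by the caller, because their dependence on n is not reproduced here. The p-value is taken as the upper-tail standard normal probability of Z_n, on the basis that large values of Z_n indicate departure from normality.

```python
import math
from statistics import NormalDist


def shapiro_wilk_pvalue(w, n, mu, sigma, gamma=None):
    """Return (z, p) for a Shapiro-Wilk statistic w with 4 <= n <= 2000.

    mu, sigma, and gamma are assumed to be supplied by the caller;
    this sketch does not compute them from n.
    """
    if not 0.0 < w < 1.0:
        raise ValueError("w must lie strictly between 0 and 1")
    if 4 <= n <= 11:
        if gamma is None:
            raise ValueError("gamma is required when 4 <= n <= 11")
        z = (-math.log(gamma - math.log(1.0 - w)) - mu) / sigma
    elif 12 <= n <= 2000:
        z = (math.log(1.0 - w) - mu) / sigma
    else:
        raise ValueError("n must satisfy 4 <= n <= 2000")
    # Large z indicates departure from normality, so use the upper tail.
    p = 1.0 - NormalDist().cdf(z)
    return z, p
```

With μ and σ held fixed, a smaller W gives a larger Z_n and therefore a smaller p-value, which matches the statement that small values of W lead to rejection.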

EDF Tests for Normality

The Kolmogorov-Smirnov, Anderson-Darling, and Cramér-von Mises tests for normality are based on the empirical distribution function (EDF) and are often referred to as EDF tests. EDF tests for a variety of non-normal distributions are available in the HISTOGRAM statement; see the "EDF Goodness-of-Fit Tests" section for details. For a thorough discussion of these tests, refer to D'Agostino and Stephens (1986).

The empirical distribution function is defined for a set of n independent observations X_1, ..., X_n with a common distribution function F(x). Under the null hypothesis, F(x) is the normal distribution. Denote the observations ordered from smallest to largest as X_(1), ..., X_(n). The empirical distribution function, F_n(x), is defined as

$F_{n}(x) = \begin{cases} 0, & x < X_{(1)} \\ \frac{i}{n}, & X_{(i)} \leq x < X_{(i+1)}, \quad i = 1, \ldots, n-1 \\ 1, & X_{(n)} \leq x \end{cases}$

Note that F_n(x) is a step function that takes a step of height 1/n at each observation. This function estimates the distribution function F(x). At any value x, F_n(x) is the proportion of observations less than or equal to x, while F(x) is the probability of an observation less than or equal to x. EDF statistics measure the discrepancy between F_n(x) and F(x).
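The definition of F_n(x) can be illustrated with a short Python sketch. The helper name `make_edf` and the sample values are hypothetical and chosen only for illustration.

```python
from bisect import bisect_right


def make_edf(observations):
    """Return F_n, the empirical distribution function of the observations."""
    ordered = sorted(observations)
    n = len(ordered)
    if n == 0:
        raise ValueError("at least one observation is required")

    def edf(x):
        # bisect_right counts the observations less than or equal to x.
        return bisect_right(ordered, x) / n

    return edf


f_n = make_edf([2.1, 3.4, 1.7, 2.9])
print(f_n(1.0), f_n(2.1), f_n(3.0), f_n(5.0))  # prints: 0.0 0.5 0.75 1.0
```

Each of the four observations contributes a step of height 1/4, so the function is 0 below the smallest observation and 1 at or above the largest.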

The EDF tests make use of the probability integral transformation U = F(X). If F(x) is the distribution function of X, the random variable U is uniformly distributed between 0 and 1. Given n ordered observations X_(1), ..., X_(n), the values U_(i) = F(X_(i)) are computed. These values are used to compute the EDF test statistics, as described in the next three sections. The CAPABILITY procedure computes the associated p-values by interpolating internal tables of probability levels similar to those given by D'Agostino and Stephens (1986).
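As a rough illustration of the transformation, the following Python sketch computes the values U_(i) for a normal F. It assumes that F is the normal distribution with mean and standard deviation estimated from the sample (using the n - 1 divisor); the function name `normal_pit` is hypothetical, and the sketch does not compute p-values.

```python
from statistics import NormalDist, mean, stdev


def normal_pit(observations):
    """Return U_(1) <= ... <= U_(n), with U_(i) = F(X_(i)) for a normal F.

    Assumption: F is the normal distribution whose mean and standard
    deviation are estimated from the sample itself.
    """
    fitted = NormalDist(mean(observations), stdev(observations))
    # Sorting first yields the values in the order of the order statistics.
    return [fitted.cdf(x) for x in sorted(observations)]
```

Because F is nondecreasing, the returned values are already in ascending order, and each lies strictly between 0 and 1 for finite data.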

Kolmogorov-Smirnov Test

The Kolmogorov-Smirnov statistic (D) is defined as

$D = \sup_{x} \left| F_{n}(x) - F(x) \right|$

The Kolmogorov-Smirnov statistic belongs to the supremum class of EDF statistics. This class of statistics is based on the largest vertical difference between F(x) and F_n(x).

The Kolmogorov-Smirnov statistic is computed as the maximum of D⁺ and D⁻, where D⁺ is the largest vertical distance between the EDF and the distribution function when the EDF is greater than the distribution function, and D⁻ is the largest vertical distance when the EDF is less than the distribution function.

$\begin{aligned} D^{+} &= \max_{i} \left( \frac{i}{n} - U_{(i)} \right) \\ D^{-} &= \max_{i} \left( U_{(i)} - \frac{i-1}{n} \right) \\ D &= \max(D^{+}, D^{-}) \end{aligned}$

PROC CAPABILITY uses a modified Kolmogorov D statistic to test the data against a normal distribution with mean and variance equal to the sample mean and variance.
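The formulas for D⁺, D⁻, and D can be sketched in Python as follows. This is an illustration under stated assumptions and not the procedure's code: it computes the unmodified statistic only, fits the normal distribution with the sample mean and the sample standard deviation (n - 1 divisor), and does not produce a p-value. The function name `ks_statistic` is hypothetical.

```python
from statistics import NormalDist, mean, stdev


def ks_statistic(observations):
    """Return (D_plus, D_minus, D) against a normal fitted to the sample."""
    ordered = sorted(observations)
    n = len(ordered)
    fitted = NormalDist(mean(ordered), stdev(ordered))
    u = [fitted.cdf(x) for x in ordered]
    # The formulas index i from 1 to n; enumerate starts at 0,
    # so i / n in the text becomes (i + 1) / n and (i - 1) / n becomes i / n.
    d_plus = max((i + 1) / n - u_i for i, u_i in enumerate(u))
    d_minus = max(u_i - i / n for i, u_i in enumerate(u))
    return d_plus, d_minus, max(d_plus, d_minus)
```

For a sample that is symmetric about its mean, such as 1, 2, 3, the two one-sided distances coincide up to rounding, so D equals both.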

Anderson-Darling Test

The Anderson-Darling statistic and the Cramér-von Mises statistic belong to the quadratic class of EDF statistics. This class of statistics is based on the squared difference (F_n(x) - F(x))². Quadratic statistics have the following general form:

$Q = n \int_{-\infty}^{+\infty} (F_n(x)-F(x))^2 \psi(x) dF(x)$