Scores for Linear Rank and One-Way ANOVA Tests

The NPAR1WAY Procedure

For each score type that you specify, PROC NPAR1WAY computes a one-way ANOVA statistic and, for two-sample data, a linear rank statistic. The following score types are used primarily to test for differences in location: Wilcoxon, median, Van der Waerden, and Savage. The following score types are used to test for scale differences: Siegel-Tukey, Ansari-Bradley, Klotz, and Mood. This section gives formulas for the score types. For further information on the formulas and the applicability of each score, refer to Randles and Wolfe (1979), Gibbons and Chakraborti (1992), Conover (1980), and Hollander and Wolfe (1973).

In addition to the score types described in this section, you can specify the SCORES=DATA option to use the input data observations as scores. This enables you to produce a very wide variety of tests. You can construct any scores using the DATA step, and then PROC NPAR1WAY computes the corresponding linear rank and one-way ANOVA tests. You can also analyze the raw data with the SCORES=DATA option; for two-sample data, this permutation test is known as Pitman's test.
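As a rough illustration of what using the raw data as scores means for two-sample data, the following Python sketch enumerates every way of splitting the pooled observations into two samples and compares the sum for the first sample with the observed sum. The function name `pitman_one_sided_p`, the one-sided alternative, and the sample data are assumptions made for this example; this is not the algorithm that PROC NPAR1WAY uses internally.

```python
from itertools import combinations


def pitman_one_sided_p(sample1, sample2):
    """One-sided exact permutation p-value with the raw data as scores.

    Hypothetical helper: the statistic is the sum of the observations
    in sample1, and the p-value is the share of all regroupings whose
    sum is at least as large as the observed sum.
    """
    pooled = list(sample1) + list(sample2)
    observed = sum(sample1)
    count = 0
    total = 0
    # Every subset of size len(sample1) is an equally likely regrouping
    # under the null hypothesis of no difference between the samples.
    for subset in combinations(pooled, len(sample1)):
        total += 1
        if sum(subset) >= observed:
            count += 1
    return count / total


# Integer data keep the comparisons exact in this small example.
p_value = pitman_one_sided_p([10, 11, 12], [1, 2, 3])
```

Here only 1 of the 20 possible regroupings reaches the observed sum of 33, so `p_value` is 0.05. Full enumeration is practical only for small samples.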

Wilcoxon Scores

Wilcoxon scores are the ranks of the observations.

$a(R_j) = R_j$

Using Wilcoxon scores in the linear rank statistic for two-sample data produces the rank sum statistic of the Mann-Whitney-Wilcoxon test. Using Wilcoxon scores in the one-way ANOVA statistic produces the Kruskal-Wallis test. Wilcoxon scores are locally most powerful for location shifts of a logistic distribution.
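The following Python sketch shows how Wilcoxon scores and a two-sample rank sum could be computed by hand. The helper names `wilcoxon_scores` and `rank_sum` are hypothetical, the choice of which sample to sum is an assumption, and the sketch assumes the data contain no ties, a case this section does not cover.

```python
def wilcoxon_scores(values):
    """Return the rank (1..n) of each observation; assumes no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        ranks[index] = rank
    return ranks


def rank_sum(values, groups, target):
    """Sum the Wilcoxon scores of the observations in group `target`."""
    scores = wilcoxon_scores(values)
    return sum(s for s, g in zip(scores, groups) if g == target)


values = [3.1, 1.2, 5.6, 2.4, 4.8]
groups = ["A", "B", "A", "B", "A"]
scores = wilcoxon_scores(values)        # [3, 1, 5, 2, 4]
sum_a = rank_sum(values, groups, "A")   # 3 + 5 + 4 = 12
```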

When computing the asymptotic Wilcoxon two-sample test, PROC NPAR1WAY uses a continuity correction by default, as described in the "Simple Linear Rank Tests for Two-Sample Data" section. If you specify CORRECT=NO in the PROC NPAR1WAY statement, the procedure does not use a continuity correction.

Median Scores

Median scores equal 1 for observations greater than the median, and 0 otherwise.

$a(R_j) = \begin{cases} 1 & \text{if } R_j > \frac{n+1}{2} \\ 0 & \text{if } R_j \leq \frac{n+1}{2} \end{cases}$

Using median scores in the linear rank statistic for two-sample data produces the two-sample median test. The one-way ANOVA statistic with median scores is equivalent to the Brown-Mood test. Median scores are particularly powerful for distributions that are symmetric and heavy-tailed.
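A minimal Python sketch of the median score formula follows. The function name `median_scores` is hypothetical, and the input is assumed to be the untied ranks 1 through n in any order.

```python
def median_scores(ranks):
    """Score 1 for ranks above (n + 1) / 2, and 0 otherwise."""
    n = len(ranks)
    midpoint = (n + 1) / 2
    return [1 if r > midpoint else 0 for r in ranks]


# With n = 5 the midpoint is 3, so only ranks 4 and 5 score 1.
scores = median_scores([3, 1, 5, 2, 4])   # [0, 0, 1, 0, 1]
```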

Van der Waerden Scores

Van der Waerden scores are the quantiles of a standard normal distribution. These scores are also known as quantile normal scores.

$a(R_j) = \Phi^{-1} ( \frac{R_j}{n + 1} )$

where $\Phi$ is the cumulative distribution function of a standard normal distribution. These scores are powerful for normal distributions.
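The Van der Waerden formula can be sketched in Python with the standard library's `statistics.NormalDist`, whose `inv_cdf` method plays the role of $\Phi^{-1}$. The function name `van_der_waerden_scores` is hypothetical, and the ranks are assumed to be untied.

```python
from statistics import NormalDist


def van_der_waerden_scores(ranks):
    """Map each rank R_j to the standard normal quantile at R_j / (n + 1)."""
    n = len(ranks)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(r / (n + 1)) for r in ranks]


# For n = 3 the quantile levels are 0.25, 0.50, and 0.75, so the
# scores are symmetric about 0 (about -0.674, 0, and 0.674).
scores = van_der_waerden_scores([1, 2, 3])
```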

Savage Scores

Savage scores are expected values of order statistics from the exponential distribution, with 1 subtracted to center the scores around 0.

$a(R_j) = \sum_{i=1}^{R_j} \frac{1}{n - i + 1} - 1$

Savage scores are powerful for comparing scale differences in exponential distributions or location shifts in extreme value distributions (Hajek 1969, p. 83).
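The Savage score formula translates directly into a partial sum, as in the following Python sketch. The function name `savage_scores` is hypothetical, and untied ranks are assumed.

```python
def savage_scores(ranks):
    """Sum 1 / (n - i + 1) for i = 1..R_j, then subtract 1."""
    n = len(ranks)
    return [
        sum(1.0 / (n - i + 1) for i in range(1, r + 1)) - 1.0
        for r in ranks
    ]


# For n = 3 the scores are -2/3, -1/6, and 5/6, which sum to 0,
# illustrating the centering that the subtraction of 1 provides.
scores = savage_scores([1, 2, 3])
```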

Siegel-Tukey Scores

Siegel-Tukey scores are computed as

$\begin{array}{llll} a(1) = 1, & a(n) = 2, & a(n-1) = 3, & a(2) = 4, \\ a(3) = 5, & a(n-2) = 6, & a(n-3) = 7, & a(4) = 8, \; \ldots \end{array}$

where the score values continue to increase in this pattern towards the middle ranks until all observations have been assigned a score.
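One way to generate this pattern is to take one rank from the low end, then alternate between the two ends, taking two ranks at a time, until the ends meet. The Python sketch below follows that reading of the pattern; the function name `siegel_tukey_by_rank` is hypothetical, and untied ranks are assumed.

```python
def siegel_tukey_by_rank(n):
    """Return a list whose element r - 1 is the Siegel-Tukey score of rank r."""
    scores = [0] * n
    low, high = 1, n        # lowest and highest ranks not yet scored
    next_score = 1
    take_low = True         # start at the low end
    count = 1               # one rank first, then two per turn
    while low <= high:
        for _ in range(count):
            if low > high:
                break
            if take_low:
                scores[low - 1] = next_score
                low += 1
            else:
                scores[high - 1] = next_score
                high -= 1
            next_score += 1
        take_low = not take_low
        count = 2
    return scores


# n = 8 reproduces a(1)=1, a(8)=2, a(7)=3, a(2)=4, a(3)=5, a(6)=6, a(5)=7, a(4)=8.
table = siegel_tukey_by_rank(8)   # [1, 4, 5, 8, 7, 6, 3, 2]
```

To score a particular observation, look up its rank in the table, for example `table[r - 1]` for rank `r`.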

Ansari-Bradley Scores

Ansari-Bradley scores are similar to Siegel-Tukey scores, but Ansari-Bradley assigns the same scores to corresponding extreme ranks. (Siegel-Tukey scores are just a permutation of the ranks 1, 2, ..., n.)

$\begin{array}{ll} a(1) = 1, & a(n) = 1, \\ a(2) = 2, & a(n-1) = 2, \; \ldots \end{array}$

Equivalently, Ansari-Bradley scores are defined as

$a(R_j) = \frac{n+1}2 - | R_j - \frac{n+1}2|$
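The closed-form definition can be checked against the pairing of extreme ranks with a short Python sketch. The function name `ansari_bradley_scores` is hypothetical, and untied ranks are assumed.

```python
def ansari_bradley_scores(ranks):
    """Compute (n + 1) / 2 - |R_j - (n + 1) / 2| for each rank."""
    n = len(ranks)
    center = (n + 1) / 2
    return [center - abs(r - center) for r in ranks]


ranks = [1, 2, 3, 4, 5, 6]
scores = ansari_bradley_scores(ranks)   # [1.0, 2.0, 3.0, 3.0, 2.0, 1.0]

# The same values arise from counting inward from whichever end is nearer.
paired = [min(r, len(ranks) + 1 - r) for r in ranks]
```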

Klotz Scores

Klotz scores are the squares of the Van der Waerden (or quantile normal) scores.

$a(R_j) = [ \Phi^{-1} ( \frac{R_j}{n + 1} ) ] ^ 2$

where $\Phi$ is the cumulative distribution function of a standard normal distribution.
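Because Klotz scores square the normal quantiles, ranks that are equally far from the middle receive the same score. The Python sketch below illustrates this with the standard library's `statistics.NormalDist`; the function name `klotz_scores` is hypothetical, and untied ranks are assumed.

```python
from statistics import NormalDist


def klotz_scores(ranks):
    """Square the standard normal quantile at R_j / (n + 1)."""
    n = len(ranks)
    std_normal = NormalDist()  # mean 0, standard deviation 1
    return [std_normal.inv_cdf(r / (n + 1)) ** 2 for r in ranks]


# For n = 3 the first and last scores agree (about 0.455) and the
# middle score is 0.
scores = klotz_scores([1, 2, 3])
```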

Mood Scores

Mood scores are computed as the square of the difference between each rank and the average rank.

$a(R_j) = [ R_j - \frac{n+1}2 ] ^ 2$
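A minimal Python sketch of the Mood score formula follows. The function name `mood_scores` is hypothetical, and untied ranks are assumed.

```python
def mood_scores(ranks):
    """Square the distance of each rank from the average rank (n + 1) / 2."""
    n = len(ranks)
    center = (n + 1) / 2
    return [(r - center) ** 2 for r in ranks]


# With n = 5 the average rank is 3, so the extreme ranks score highest.
scores = mood_scores([1, 2, 3, 4, 5])   # [4.0, 1.0, 0.0, 1.0, 4.0]
```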
