Simple Linear Rank Tests for Two-Sample Data

Statistics of the form

$S = \sum_{j=1}^n c_j a(R_j)$

are called simple linear rank statistics, where

R_j is the rank of observation j
a(R_j) is the score based on that rank
c_j is an indicator variable denoting the class to which the jth observation belongs
n is the total number of observations

For two-sample data (where the observations are classified into two levels), PROC NPAR1WAY calculates simple linear rank statistics for the scores that you specify. The "Scores for Linear Rank and One-Way ANOVA Tests" section describes the available scores, which you can use to test for differences in location and differences in scale.

To compute S, PROC NPAR1WAY sums the scores of the observations in the smaller of the two samples. If both samples have the same number of observations, PROC NPAR1WAY sums those scores for the sample that appears first in the input data set.
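As a rough illustration of the summation rule above, the following Python sketch computes S for a tiny two-sample data set. It is not PROC NPAR1WAY's implementation. It assumes Wilcoxon scores, a(R_j) = R_j, and data with no tied values. The function names `ranks_no_ties` and `linear_rank_statistic` and the sample data are hypothetical.

```python
def ranks_no_ties(values):
    """Return 1-based ranks; assumes all values are distinct (illustrative only)."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0] * len(values)
    for position, index in enumerate(order, start=1):
        ranks[index] = position
    return ranks


def linear_rank_statistic(values, classes, score=lambda r: r):
    """Sum the scores of the smaller sample (first-appearing level on a size tie)."""
    levels = list(dict.fromkeys(classes))  # class levels in order of first appearance
    if len(levels) != 2:
        raise ValueError("two-sample data requires exactly two class levels")
    # min() returns the first minimal element, so equal sizes pick the first level
    target = min(levels, key=classes.count)
    ranks = ranks_no_ties(values)
    c = [1 if level == target else 0 for level in classes]  # indicator c_j
    s = sum(c_j * score(r_j) for c_j, r_j in zip(c, ranks))
    return s, target


# Hypothetical data: class "A" has 2 observations, class "B" has 3.
values = [1.2, 3.4, 2.2, 5.1, 4.0]
classes = ["A", "B", "A", "B", "B"]
s, target = linear_rank_statistic(values, classes)
print(s, target)
```

Here class "A" is the smaller sample, and its observations hold ranks 1 and 2, so the sketch sums those two scores.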

For each score that you specify, PROC NPAR1WAY computes an asymptotic test of the null hypothesis of no difference between the two classification levels. Exact tests are also available for these two-sample linear rank statistics. PROC NPAR1WAY computes exact tests for each score type that you specify in the EXACT statement. See the "Exact Tests" section for details on exact tests.

To compute an asymptotic test for a linear rank sum statistic, PROC NPAR1WAY uses a standardized test statistic z, which has an asymptotic standard normal distribution under the null hypothesis. The standardized test statistic is computed as

$z = \frac{S - E_0(S)}{\sqrt{\mathrm{Var}_0(S)}}$

where E₀(S) is the expected value of S under the null hypothesis, and Var₀(S) is the variance under the null hypothesis. As shown in Randles and Wolfe (1979),

$E_0(S) = \frac{n_1}{n} \sum_{j=1}^n a(R_j)$

where n₁ is the number of observations in the first (smaller) class level, and

$\mathrm{Var}_0(S) = \frac{n_1 n_2}{n (n-1)} \sum_{j=1}^n (a(R_j) - \bar{a})^2$

where n₂ = n - n₁ is the number of observations in the other class level, and

$\bar{a} = (1/n) \sum_{j=1}^n a(R_j)$
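The null-hypothesis moments and the standardized statistic can be checked by hand on a small example. The sketch below follows the three formulas above term by term. It is only an illustration, not the procedure's code. The function names and the sample scores are hypothetical, and no continuity correction is applied.

```python
import math


def null_moments(scores, n1):
    """Return (E0(S), Var0(S)) for the scores a(R_j) and smaller-sample size n1."""
    n = len(scores)
    n2 = n - n1
    a_bar = sum(scores) / n
    e0 = (n1 / n) * sum(scores)
    var0 = (n1 * n2) / (n * (n - 1)) * sum((a - a_bar) ** 2 for a in scores)
    return e0, var0


def standardized_z(scores, indicator):
    """Compute z = (S - E0(S)) / sqrt(Var0(S)) without a continuity correction."""
    n1 = sum(indicator)
    s = sum(c_j * a_j for c_j, a_j in zip(indicator, scores))
    e0, var0 = null_moments(scores, n1)
    return (s - e0) / math.sqrt(var0)


# Hypothetical Wilcoxon scores (ranks) and class indicator for n = 5, n1 = 2.
scores = [1, 3, 2, 5, 4]
indicator = [1, 0, 1, 0, 0]
e0, var0 = null_moments(scores, n1=2)
z = standardized_z(scores, indicator)
print(e0, var0, z)
```

For these values, S = 3, E₀(S) = (2/5)(15) = 6, and Var₀(S) = (2·3)/(5·4) · 10 = 3, so z = -3/√3, which is approximately -1.732.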

PROC NPAR1WAY computes one-sided and two-sided asymptotic p-values for each two-sample linear rank test. When the test statistic z is greater than its null hypothesis expected value of zero, PROC NPAR1WAY computes the right-sided p-value, which is the probability of a larger value of the statistic occurring under the null hypothesis. When the test statistic is less than or equal to zero, PROC NPAR1WAY computes the left-sided p-value, which is the probability of a smaller value of the statistic occurring under the null hypothesis. The one-sided p-value P₁ can be expressed as

$P_1 = \begin{cases} \mathrm{Prob}(Z > z) & \text{if } z > 0 \\ \mathrm{Prob}(Z < z) & \text{if } z \leq 0 \end{cases}$

where Z has a standard normal distribution. The two-sided p-value P₂ is computed as

$P_2 = \mathrm{Prob}(|Z| > |z|)$
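The p-value definitions above translate directly into code. The following sketch uses only the Python standard library, obtaining the standard normal upper tail from `math.erfc`. The function names are hypothetical, and the code illustrates the definitions rather than the procedure's own numerical method.

```python
import math


def normal_upper_tail(x):
    """Prob(Z > x) for a standard normal Z."""
    return 0.5 * math.erfc(x / math.sqrt(2.0))


def asymptotic_p_values(z):
    """Return (P1, P2): the one-sided and two-sided asymptotic p-values."""
    if z > 0:
        p1 = normal_upper_tail(z)   # right-sided: Prob(Z > z)
    else:
        p1 = normal_upper_tail(-z)  # left-sided: Prob(Z < z), by symmetry
    p2 = 2.0 * normal_upper_tail(abs(z))  # Prob(|Z| > |z|)
    return p1, p2


p1, p2 = asymptotic_p_values(-1.7320508)
print(p1, p2)
```

Because the standard normal distribution is symmetric, the two-sided p-value in this sketch is always twice the one-sided p-value.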

For Wilcoxon scores and Siegel-Tukey scores, PROC NPAR1WAY incorporates a continuity correction when computing the standardized test statistic z, unless you specify CORRECT=NO. The correction moves the numerator S - E₀(S) toward zero by 0.5: PROC NPAR1WAY subtracts 0.5 if the numerator is greater than zero and adds 0.5 if it is less than zero. Some sources recommend a continuity correction for nonparametric tests that use a continuous distribution to approximate a discrete distribution; refer to Sheskin (1997). If you specify CORRECT=NO, PROC NPAR1WAY does not use a continuity correction for any test.
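A minimal sketch of the continuity correction follows. The function `corrected_z` and its `correct` flag are hypothetical. The flag merely mirrors the idea of CORRECT=NO and is not a SAS interface. The text above does not say what happens when the numerator is exactly zero, so the sketch leaves it unchanged, which is an assumption.

```python
import math


def corrected_z(s, e0, var0, correct=True):
    """Standardized statistic with an optional 0.5 continuity correction."""
    numerator = s - e0
    if correct:
        if numerator > 0:
            numerator -= 0.5  # shrink a positive numerator toward zero
        elif numerator < 0:
            numerator += 0.5  # shrink a negative numerator toward zero
        # numerator == 0 is left unchanged (assumption; not stated in the text)
    return numerator / math.sqrt(var0)


# Hypothetical values: S = 3, E0(S) = 6, Var0(S) = 3.
z_corrected = corrected_z(3, 6, 3)
z_uncorrected = corrected_z(3, 6, 3, correct=False)
print(z_corrected, z_uncorrected)
```

With these values the numerator changes from -3 to -2.5, so the corrected statistic is smaller in absolute value and the resulting p-values are slightly larger.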
