The RANK Procedure

PROC RANK Statement

PROC RANK <option(s)>;

To do this	Use this option
Specify the input data set	DATA=
Create an output data set	OUT=
Specify the ranking method
	Compute fractional ranks	FRACTION or NPLUS1
	Partition observations into groups	GROUPS=
	Compute normal scores	NORMAL=
	Compute percentages	PERCENT
	Compute Savage scores	SAVAGE
Reverse the order of the rankings	DESCENDING
Specify how to rank tied values	TIES=

Note:	You can specify only one ranking method in a single PROC RANK step.

Options

DATA=SAS-data-set

specifies the input SAS data set.

Main discussion:	Input Data Sets
Restriction:	You cannot use PROC RANK with an engine that supports concurrent access if another user is updating the data set at the same time.

DESCENDING

reverses the direction of the ranks. With DESCENDING, the largest value receives a rank of 1, the next largest value receives a rank of 2, and so on. Otherwise, values are ranked from smallest to largest.

Featured in:	Ranking Values of Multiple Variables and Ranking Values within BY Groups
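As a minimal sketch of DESCENDING, the step below ranks five scores from largest to smallest. The data set names (work.scores, work.ranked) and variable names (score, score_rank) are hypothetical, and the VAR and RANKS statements are PROC RANK statements that this section does not describe.

```sas
/* Hypothetical sample data: five scores, two of them tied. */
data work.scores;
   input name $ score;
   datalines;
Ann 10
Bob 20
Cal 20
Dee 30
Eve 40
;
run;

/* DESCENDING: the largest score receives rank 1. */
proc rank data=work.scores out=work.ranked descending;
   var score;          /* variable to rank */
   ranks score_rank;   /* new variable that holds the ranks */
run;

proc print data=work.ranked noobs;
run;
```

With the default TIES=MEAN, Eve should receive rank 1, Dee rank 2, Bob and Cal rank 3.5 each, and Ann rank 5.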

FRACTION

computes fractional ranks by dividing each rank by the number of observations having nonmissing values of the ranking variable.

Alias:	F
Interaction:	TIES=HIGH is the default with the FRACTION option. With TIES=HIGH, fractional ranks can be considered values of a right-continuous empirical cumulative distribution function.
See also:	NPLUS1 option
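The following sketch illustrates FRACTION on a small sample that includes one missing value, to show that the denominator counts only nonmissing values. All data set and variable names are hypothetical.

```sas
/* Hypothetical sample: four nonmissing scores and one missing score. */
data work.scores;
   input score @@;
   datalines;
10 20 20 . 40
;
run;

/* FRACTION divides each rank by n = 4, the count of nonmissing scores. */
/* TIES=HIGH is the default here, so the tied 20s both use rank 3.      */
proc rank data=work.scores out=work.frac fraction;
   var score;
   ranks score_frac;
run;

proc print data=work.frac noobs;
run;
```

Under these assumptions the fractional ranks should be 0.25, 0.75, 0.75, and 1.00, and the observation with the missing score should keep a missing rank.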

GROUPS=number-of-groups

assigns group values ranging from 0 to number-of-groups minus 1. Common specifications are GROUPS=100 for percentiles, GROUPS=10 for deciles, and GROUPS=4 for quartiles. For example, GROUPS=4 partitions the original values into four groups, with the smallest values receiving, by default, a quartile value of 0 and the largest values receiving a quartile value of 3.

The formula for calculating group values is

$$\mathrm{FLOOR}\left(\frac{\mathit{rank} \times k}{n + 1}\right)$$

where FLOOR is the FLOOR function, rank is the value's order rank, k is the value of GROUPS=, and n is the number of observations having nonmissing values of the ranking variable.

If the number of observations is evenly divisible by the number of groups, each group has the same number of observations, provided there are no tied values at the boundaries of the groups. Grouping observations by a variable that has many tied values can result in unbalanced groups because PROC RANK always assigns observations with the same value to the same group.

Tip: Use DESCENDING to reverse the order of the group values.
Featured in: Partitioning Observations into Groups Based on Ranks
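A minimal sketch of GROUPS=4 follows, using eight distinct hypothetical values so that no ties fall on a group boundary. The names work.sales, amount, and quartile are illustrative only.

```sas
/* Hypothetical sample: eight distinct amounts, already in ascending order. */
data work.sales;
   input amount @@;
   datalines;
105 118 131 150 162 177 190 204
;
run;

/* GROUPS=4 assigns quartile values 0 through 3. */
proc rank data=work.sales out=work.quartiles groups=4;
   var amount;
   ranks quartile;
run;

proc print data=work.quartiles noobs;
run;
```

Applying FLOOR(rank*k/(n+1)) with k=4 and n=8 gives 0, 0, 1, 1, 2, 2, 3, 3 for ranks 1 through 8, so each quartile should hold two observations. Adding DESCENDING to the PROC RANK statement would instead give the largest amounts a quartile value of 0.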

NORMAL=BLOM | TUKEY | VW

computes normal scores from the ranks. The resulting variables appear normally distributed. The formulas are

BLOM	$y_i = \Phi^{-1}\left(\frac{r_i - 3/8}{n + 1/4}\right)$
TUKEY	$y_i = \Phi^{-1}\left(\frac{r_i - 1/3}{n + 1/3}\right)$
VW	$y_i = \Phi^{-1}\left(\frac{r_i}{n + 1}\right)$

where $\Phi^{-1}$ is the inverse cumulative normal (PROBIT) function, $r_i$ is the rank of the ith observation, and $n$ is the number of nonmissing observations for the ranking variable.

VW stands for van der Waerden. With NORMAL=VW, you can use the scores for a nonparametric location test. All three normal scores are approximations to the exact expected order statistics for the normal distribution, also called normal scores. The BLOM version appears to fit slightly better than the others (Blom 1958; Tukey 1962).
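To make the BLOM formula concrete, the sketch below computes Blom scores with PROC RANK and then recomputes them by hand with the PROBIT function. The data set and variable names are hypothetical, and the sample size is hard-coded as an assumption about this five-value sample.

```sas
/* Hypothetical sample: five distinct values, no missing values. */
data work.scores;
   input score @@;
   datalines;
12 15 19 23 31
;
run;

/* Step 1: ordinary ranks, kept for the hand calculation. */
proc rank data=work.scores out=work.ordinary;
   var score;
   ranks r;
run;

/* Step 2: Blom normal scores from the same variable. */
proc rank data=work.ordinary out=work.normal normal=blom;
   var score;
   ranks blom_score;
run;

/* Step 3: recompute the Blom formula for comparison. */
data work.check;
   set work.normal;
   n = 5;                                    /* nonmissing count, assumed */
   by_hand = probit((r - 3/8) / (n + 1/4));
run;

proc print data=work.check noobs;
run;
```

The blom_score and by_hand columns should agree. Because a separate PROC RANK step is needed for each ranking method, the ordinary ranks and the normal scores are produced in two steps.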

NPLUS1

computes fractional ranks by dividing each rank by the denominator n+1, where n is the number of observations having nonmissing values of the ranking variable.

Aliases:	FN1, N1
Interaction:	TIES=HIGH is the default with the NPLUS1 option.
See also:	FRACTION option
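As a short sketch, NPLUS1 can be applied to a five-value sample as follows. The data set and variable names are hypothetical.

```sas
/* Hypothetical sample: five scores, two of them tied. */
data work.scores;
   input score @@;
   datalines;
10 20 20 30 40
;
run;

/* NPLUS1 divides each rank by n + 1 = 6. TIES=HIGH is the default. */
proc rank data=work.scores out=work.np1 nplus1;
   var score;
   ranks score_np1;
run;

proc print data=work.np1 noobs;
run;
```

With ranks 1, 3, 3, 4, and 5 under TIES=HIGH, the results should be 1/6, 3/6, 3/6, 4/6, and 5/6. Unlike FRACTION, the largest value does not reach 1.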

OUT=SAS-data-set

names the output data set. If SAS-data-set does not exist, PROC RANK creates it. If you omit OUT=, the data set is named using the DATAn naming convention.
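The sketch below names both the input and the output data set explicitly. The data set and variable names are hypothetical; the RANKS statement, which this section does not describe, stores the ranks in a new variable so that the original values remain alongside them in the output.

```sas
/* Hypothetical input data set. */
data work.heights;
   input id height;
   datalines;
1 172
2 165
3 181
4 158
;
run;

/* DATA= names the input; OUT= names the data set that PROC RANK creates. */
proc rank data=work.heights out=work.heights_ranked;
   var height;
   ranks height_rank;
run;

proc print data=work.heights_ranked noobs;
run;
```

Naming OUT= explicitly avoids having to look up which DATAn name SAS assigned.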

PERCENT

divides each rank by the number of observations having nonmissing values of the variable and multiplies the result by 100 to get a percentage.

Alias:	P
Interaction:	TIES=HIGH is the default with the PERCENT option.
Tip:	You can use PERCENT to calculate cumulative percentages, but use GROUPS=100 to compute percentiles.
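The difference between PERCENT and GROUPS=100 can be sketched with two steps on the same hypothetical five-value sample. All names are illustrative.

```sas
/* Hypothetical sample: five distinct values. */
data work.scores;
   input score @@;
   datalines;
10 20 30 40 50
;
run;

/* PERCENT: rank / n * 100, a cumulative percentage. */
proc rank data=work.scores out=work.pct percent;
   var score;
   ranks cum_pct;
run;

/* GROUPS=100: percentile group from FLOOR(rank * 100 / (n + 1)). */
proc rank data=work.pct out=work.both groups=100;
   var score;
   ranks pctile;
run;

proc print data=work.both noobs;
run;
```

For this sample, cum_pct should be 20, 40, 60, 80, and 100, while pctile should be 16, 33, 50, 66, and 83 by the GROUPS= formula. Two steps are used because only one ranking method is allowed per PROC RANK step.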

SAVAGE

computes Savage (or exponential) scores from the ranks by the following formula (Lehmann 1975):

$$y_i = \left[\sum_{j = n - r_i + 1}^{n} \frac{1}{j}\right] - 1$$
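A minimal usage sketch of the SAVAGE option follows. The data set and variable names are hypothetical, and no expected values are shown.

```sas
/* Hypothetical sample: survival times in days. */
data work.times;
   input days @@;
   datalines;
14 29 41 63 88
;
run;

/* SAVAGE replaces ordinary ranks with Savage (exponential) scores. */
proc rank data=work.times out=work.savage savage;
   var days;
   ranks savage_score;
run;

proc print data=work.savage noobs;
run;
```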

TIES=HIGH | LOW | MEAN

specifies the rank for tied values.

HIGH: assigns the largest of the corresponding ranks.
LOW: assigns the smallest of the corresponding ranks.
MEAN: assigns the mean of the corresponding ranks.

Default:	MEAN (unless the FRACTION, NPLUS1, or PERCENT option is in effect, in which case the default is HIGH)
Featured in:	Ranking Values of Multiple Variables and Ranking Values within BY Groups
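To compare the three TIES= settings side by side, the sketch below chains three PROC RANK steps so that each adds one rank variable to the same observations. Data set and variable names are hypothetical.

```sas
/* Hypothetical sample: the two 20s tie for ranks 2 and 3. */
data work.scores;
   input score @@;
   datalines;
10 20 20 30 40
;
run;

proc rank data=work.scores out=work.step1 ties=mean;
   var score;
   ranks rank_mean;
run;

proc rank data=work.step1 out=work.step2 ties=high;
   var score;
   ranks rank_high;
run;

proc rank data=work.step2 out=work.step3 ties=low;
   var score;
   ranks rank_low;
run;

proc print data=work.step3 noobs;
run;
```

The tied observations should show rank_mean = 2.5, rank_high = 3, and rank_low = 2. The untied values receive ranks 1, 4, and 5 under all three settings.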
