PROC CAPABILITY and General Statements

Percentile Computations

The CAPABILITY procedure automatically computes the 1st, 5th, 10th, 25th, 50th, 75th, 90th, 95th, and 99th percentiles (quantiles), as well as the minimum and maximum of each analysis variable. To compute percentiles other than these default percentiles, use the PCTLPTS= and PCTLPRE= options in the OUTPUT statement.

You can specify one of five definitions for computing the percentiles with the PCTLDEF= option. Let n be the number of nonmissing values for a variable, and let x_1, x_2, ..., x_n represent the ordered values of the variable. Let the tth percentile be y, set p = t/100, and let


\(
\begin{array}{rcll}
np &=& j+g & \text{when PCTLDEF=1, 2, 3, or 5} \\
(n + 1) p &=& j+g & \text{when PCTLDEF=4}
\end{array}
\)
where j is the integer part and g is the fractional part of np (or of (n + 1)p when PCTLDEF=4). Then the PCTLDEF= option defines the tth percentile, y, as described in the following table:

PCTLDEF=  Description                                      Formula
1         weighted average at x_{np}                       y = (1-g) x_j + g x_{j+1}, where x_0 is taken to be x_1
2         observation numbered closest to np               y = x_i if g \ne 1/2; y = x_j if g = 1/2 and j is even; y = x_{j+1} if g = 1/2 and j is odd; where i is the integer part of np + 1/2
3         empirical distribution function                  y = x_j if g = 0; y = x_{j+1} if g > 0
4         weighted average aimed at x_{(n+1)p}             y = (1-g) x_j + g x_{j+1}, where x_{n+1} is taken to be x_n
5         empirical distribution function with averaging   y = (1/2)(x_j + x_{j+1}) if g = 0; y = x_{j+1} if g > 0
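
The five definitions can be compared side by side with a short sketch. The following Python function is illustrative only and is not the procedure's implementation: the names percentile and at, the use of exact rational arithmetic to decide whether g = 0 or g = 1/2, and the clamping of any out-of-range index to x_1 or x_n are assumptions made for this example.

```python
from fractions import Fraction
import math


def percentile(values, t, pctldef=5):
    """Sketch of the t-th percentile under one of the five PCTLDEF= definitions."""
    x = sorted(values)                      # ordered values x_1 <= ... <= x_n
    n = len(x)
    if n == 0:
        raise ValueError("need at least one nonmissing value")

    p = Fraction(str(t)) / 100              # exact p = t/100 (assumption: avoids float fuzz)
    target = (n + 1) * p if pctldef == 4 else n * p
    j = math.floor(target)                  # integer part
    g = target - j                          # fractional part, exact

    def at(k):
        # 1-based lookup; indexes below 1 or above n are clamped (assumption)
        return x[min(max(k, 1), n) - 1]

    if pctldef in (1, 4):
        y = (1 - g) * at(j) + g * at(j + 1)
    elif pctldef == 2:
        half = Fraction(1, 2)
        if g != half:
            y = at(math.floor(n * p + half))    # i = integer part of np + 1/2
        else:
            y = at(j) if j % 2 == 0 else at(j + 1)
    elif pctldef == 3:
        y = at(j) if g == 0 else at(j + 1)
    elif pctldef == 5:
        y = (at(j) + at(j + 1)) / 2 if g == 0 else at(j + 1)
    else:
        raise ValueError("pctldef must be 1, 2, 3, 4, or 5")
    return float(y)


data = list(range(1, 11))                   # the values 1, 2, ..., 10
print([percentile(data, 25, d) for d in (1, 2, 3, 4, 5)])
```

For these ten values and t = 25, np = 2.5, so j = 2 and g = 0.5 (and (n + 1)p = 2.75 for PCTLDEF=4). The five definitions give 2.5, 2, 3, 2.75, and 3, respectively.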

Weighted Percentiles

When you use a WEIGHT statement, the percentiles are computed differently. The 100pth weighted percentile y is computed from the empirical distribution function with averaging
\(
y = \begin{cases}
\frac{1}{2} ( x_i + x_{i+1} ) & \text{if } \sum_{j=1}^{i} w_j = pW \\
x_{i+1} & \text{if } \sum_{j=1}^{i} w_j \lt pW \lt \sum_{j=1}^{i+1} w_j
\end{cases}
\)
where w_i is the weight associated with x_i, and where W = \sum_{i=1}^n w_i is the sum of the weights.

Note that the PCTLDEF= option is not applicable when a WEIGHT statement is used. However, in this case, if all the weights are identical, the weighted percentiles are the same as the percentiles that would be computed without a WEIGHT statement and with PCTLDEF=5.
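
The weighted rule can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the procedure's code: the function name weighted_percentile is hypothetical, weights are assumed to be positive, 0 < t < 100 is assumed, and exact rational arithmetic is used so that the test for a cumulative weight equal to pW is reliable.

```python
from fractions import Fraction


def weighted_percentile(values, weights, t):
    """Sketch of the 100p-th weighted percentile (empirical distribution with averaging)."""
    # Sort observations by value, carrying each weight along as an exact fraction.
    pairs = sorted(zip(values, (Fraction(str(w)) for w in weights)))
    x = [v for v, _ in pairs]
    w = [wt for _, wt in pairs]
    n = len(x)

    target = Fraction(str(t)) / 100 * sum(w)    # pW
    cum = Fraction(0)
    for i in range(n):
        cum += w[i]                             # cumulative weight through this observation
        if cum == target:
            # cumulative weight equals pW: average this value with the next one
            nxt = x[i + 1] if i + 1 < n else x[i]
            return (x[i] + nxt) / 2
        if cum > target:
            # pW falls strictly inside this observation's weight interval
            return x[i]
    return x[-1]


data = list(range(1, 11))
print(weighted_percentile(data, [1] * 10, 25))      # equal weights
print(weighted_percentile([1, 2, 3], [1, 1, 2], 50))
```

With equal weights on the values 1, 2, ..., 10, the sketch returns 3 for t = 25 and 5.5 for t = 50, which are the values that the PCTLDEF=5 formula gives for the same data.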

Confidence Limits for Percentiles

You can use the CIPCTLNORMAL option to request confidence limits for percentiles which assume the data are normally distributed. These limits are described in Section 4.4.1 of Hahn and Meeker (1991). When 0.0 < p < 0.5, the two-sided 100(1-\alpha)% confidence limits for the 100p-th percentile are
\(
\begin{array}{rcl}
\text{lower limit} &=& \bar{X} - g'(\alpha/2; 1-p, n) s \\
\text{upper limit} &=& \bar{X} - g'(1 - \alpha/2; p, n) s
\end{array}
\)
where n is the sample size. When 0.5 \leq p \lt 1.0, the two-sided 100(1-\alpha)% confidence limits for the 100p-th percentile are
\(
\begin{array}{rcl}
\text{lower limit} &=& \bar{X} + g'(\alpha/2; 1-p, n) s \\
\text{upper limit} &=& \bar{X} + g'(1 - \alpha/2; p, n) s
\end{array}
\)
One-sided 100(1-\alpha)% confidence bounds are computed by replacing \alpha/2 by \alpha in the appropriate equation above. The factor g'(\gamma; p, n) is related to the noncentral t distribution and is described in Owen and Hua (1977) and Odeh and Owen (1980).

You can use the CIPCTLDF option to request confidence limits for percentiles which are distribution free (in particular, it is not necessary to assume that the data are normally distributed). These limits are described in Section 5.2 of Hahn and Meeker (1991). The two-sided 100(1-\alpha)% confidence limits for the 100p-th percentile are

\(
\begin{array}{rcl}
\text{lower limit} &=& X_{(l)} \\
\text{upper limit} &=& X_{(u)}
\end{array}
\)
where X_{(j)} is the jth order statistic when the data values are arranged in increasing order:
\(
X_{(1)} \leq X_{(2)} \leq \ldots \leq X_{(n)}
\)
The lower rank l and upper rank u are integers that are symmetric (or nearly symmetric) around [np]+1, where [np] is the integer part of np and n is the sample size. Furthermore, l and u are chosen so that X_{(l)} and X_{(u)} are as close to X_{[n+1]p} as possible while satisfying the coverage probability requirement
\(
Q(u-1;n,p) - Q(l-1;n,p) \geq 1 - \alpha
\)
where Q(k;n,p) is the cumulative binomial probability
\(
Q(k;n,p) = \sum_{i=0}^{k} \binom{n}{i} p^i (1-p)^{n-i}
\)
In some cases, the coverage requirement cannot be met, particularly when n is small and p is near 0 or 1. To relax the requirement of symmetry, you can specify CIPCTLDF( TYPE = ASYMMETRIC ). This option requests symmetric limits when the coverage requirement can be met, and asymmetric limits otherwise.

If you specify CIPCTLDF( TYPE = LOWER ), a one-sided 100(1-\alpha)% lower confidence bound is computed as X_{(l)}, where l is the largest integer that satisfies the inequality

\(
1 - Q(l-1;n,p) \geq 1 - \alpha
\)
with 0 \lt l \leq n. Likewise, if you specify CIPCTLDF( TYPE = UPPER ), a one-sided 100(1-\alpha)% upper confidence bound is computed as X_{(u)}, where u is the smallest integer that satisfies the inequality
\(
Q(u-1;n,p) \geq 1 - \alpha
\)
where 0 \lt u \leq n.
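
The binomial coverage calculation behind the distribution-free limits can be checked numerically. The following Python sketch is illustrative and is not the procedure's algorithm: the function names binom_cdf, coverage, lower_rank, and upper_rank are hypothetical, exact rational arithmetic is an assumption made to avoid rounding at the boundary, and the upper rank is taken as the smallest u that satisfies its inequality (which gives the tightest bound). The sketch does not attempt the symmetric search for two-sided ranks; coverage only evaluates a candidate pair (l, u).

```python
from fractions import Fraction
from math import comb


def binom_cdf(k, n, p):
    """Q(k; n, p): cumulative binomial probability; 0 when k < 0."""
    return sum(comb(n, i) * p**i * (1 - p)**(n - i)
               for i in range(0, min(k, n) + 1))


def coverage(l, u, n, p):
    """Coverage probability Q(u-1; n, p) - Q(l-1; n, p) of the interval [X_(l), X_(u)]."""
    p = Fraction(str(p))
    return binom_cdf(u - 1, n, p) - binom_cdf(l - 1, n, p)


def lower_rank(n, p, alpha):
    """Largest l in 1..n with 1 - Q(l-1; n, p) >= 1 - alpha, or None if no rank qualifies."""
    p, alpha = Fraction(str(p)), Fraction(str(alpha))
    ranks = [l for l in range(1, n + 1)
             if 1 - binom_cdf(l - 1, n, p) >= 1 - alpha]
    return max(ranks) if ranks else None


def upper_rank(n, p, alpha):
    """Smallest u in 1..n with Q(u-1; n, p) >= 1 - alpha, or None if no rank qualifies."""
    p, alpha = Fraction(str(p)), Fraction(str(alpha))
    ranks = [u for u in range(1, n + 1)
             if binom_cdf(u - 1, n, p) >= 1 - alpha]
    return min(ranks) if ranks else None


print(lower_rank(10, 0.5, 0.05), upper_rank(10, 0.5, 0.05))
print(float(coverage(2, 9, 10, 0.5)))
print(lower_rank(3, 0.5, 0.05))
```

For the median (p = 0.5) with n = 10 and \alpha = 0.05, the sketch gives l = 2 for the lower bound and u = 9 for the upper bound, and the pair (2, 9) has coverage 501/512, or about 0.9785. With n = 3 no rank satisfies either one-sided inequality, which illustrates how the coverage requirement can fail for small samples.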

Note that confidence limits for percentiles are not computed when a WEIGHT statement is specified.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.