The LIFETEST Procedure

Computational Formulas

Product-Limit Method

Let t_1 < t_2 < \cdots < t_k represent the distinct event times. For each i = 1, \ldots, k, let n_i be the number of surviving units, the size of the risk set, just prior to t_i. Let d_i be the number of units that fail at t_i, and let s_i = n_i - d_i.

The product-limit estimate of the SDF at ti is the cumulative product

\hat{S}(t_i) = \prod_{j=1}^{i} \left( 1 - \frac{d_j}{n_j} \right)
Notice that the estimator is defined to be right continuous; that is, the events at t_i are included in the estimate of S(t_i). The corresponding estimate of the standard error is computed using Greenwood's formula (Kalbfleisch and Prentice 1980) as
\hat{\sigma}(\hat{S}(t_i)) = \hat{S}(t_i) \sqrt{ \sum_{j=1}^{i} \frac{d_j}{n_j s_j} }
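As a concrete illustration, here is a minimal Python sketch of the product-limit recursion together with Greenwood's formula, applied to hypothetical risk-set and event counts (the function name and inputs are illustrative, not part of PROC LIFETEST):

```python
import math

def product_limit(n, d):
    """Product-limit (Kaplan-Meier) estimate of the SDF at each distinct
    event time, with Greenwood standard errors.
    n[i] = size of the risk set just prior to t_i; d[i] = events at t_i."""
    surv, se = [], []
    s_hat, greenwood = 1.0, 0.0
    for n_i, d_i in zip(n, d):
        s_i = n_i - d_i                    # survivors at t_i
        s_hat *= 1.0 - d_i / n_i           # cumulative product of (1 - d_j/n_j)
        greenwood += d_i / (n_i * s_i)     # running sum of d_j / (n_j s_j)
        surv.append(s_hat)
        se.append(s_hat * math.sqrt(greenwood))
    return surv, se

# hypothetical data: risk sets 10, 8, 5 with events 1, 2, 1
surv, se = product_limit([10, 8, 5], [1, 2, 1])
```

At the second event time, for example, the estimate is 0.9 × 6/8 = 0.675, and the Greenwood sum accumulates one term per event time.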
The first sample quartile of the survival time distribution is given by
q_{0.25} = \frac{1}{2} \left( \inf \{ t : 1 - \hat{S}(t) \geq 0.25 \} + \sup \{ t : 1 - \hat{S}(t) \leq 0.25 \} \right)
Confidence intervals for the quartiles are based on the sign test (Brookmeyer and Crowley 1982). The 100(1-\alpha)\% confidence interval for the first quartile is given by
I_{0.25} = \{ t : (1 - \hat{S}(t) - 0.25)^2 \leq c_{\alpha} \hat{\sigma}^2(\hat{S}(t)) \}
where c_{\alpha} is the upper \alpha percentile of a central chi-squared distribution with 1 degree of freedom. The second and third sample quartiles and the corresponding confidence intervals are calculated by replacing the 0.25 in the last two equations by 0.50 and 0.75, respectively.
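A sketch of the quartile and its Brookmeyer-Crowley confidence set, evaluated only on a grid of tabulated event times (a simplification: PROC LIFETEST works with the full step function). The inputs are hypothetical, and 3.8415 is the chi-square(1) upper 5% critical value, i.e. \alpha = 0.05:

```python
def quartile_with_ci(times, surv, se, p=0.25, c_alpha=3.8415):
    """Sample quantile q_p = (inf + sup)/2 and the sign-test confidence
    set, both restricted to the tabulated event times."""
    inf_t = min(t for t, s in zip(times, surv) if 1.0 - s >= p)
    sup_t = max(t for t, s in zip(times, surv) if 1.0 - s <= p)
    q = 0.5 * (inf_t + sup_t)
    ci = [t for t, s, sd in zip(times, surv, se)
          if (1.0 - s - p) ** 2 <= c_alpha * sd ** 2]
    return q, ci

# hypothetical product-limit output at three event times
q, ci = quartile_with_ci([1.0, 2.0, 3.0], [0.9, 0.675, 0.54],
                         [0.095, 0.155, 0.173])
```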

The estimated mean survival time is

\hat{\mu} = \sum_{i=1}^k \hat{S}(t_{i-1})(t_i - t_{i-1})
where t_0 is defined to be zero. If the last observation is censored, this sum underestimates the mean. The standard error of \hat{\mu} is estimated as
\hat{\sigma}(\hat{\mu}) = \sqrt{\frac{m}{m-1} 
\sum_{i=1}^{k-1} \frac{A_i^2}{n_i s_i} }
where
A_i = \sum_{j=i}^{k-1} \hat{S}(t_j)(t_{j+1} - t_j)
m = \sum_{j=1}^k d_j
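The mean and its standard error can be sketched in Python from product-limit output (hypothetical inputs; t_0 = 0 and \hat{S}(t_0) = 1 are implied):

```python
import math

def km_mean(times, surv, n, d):
    """Estimated mean survival time and its standard error.
    surv[i] = product-limit estimate at times[i]; n, d as defined above."""
    k = len(times)
    mu, prev_t, prev_s = 0.0, 0.0, 1.0
    for t, s in zip(times, surv):
        mu += prev_s * (t - prev_t)        # S(t_{i-1}) (t_i - t_{i-1})
        prev_t, prev_s = t, s
    m = sum(d)                             # total number of events
    total = 0.0
    for i in range(k - 1):                 # terms i = 1, ..., k-1
        A_i = sum(surv[j] * (times[j + 1] - times[j])
                  for j in range(i, k - 1))
        total += A_i ** 2 / (n[i] * (n[i] - d[i]))
    return mu, math.sqrt(m / (m - 1) * total)

mu, se_mu = km_mean([1.0, 2.0, 3.0], [0.9, 0.675, 0.54],
                    [10, 8, 5], [1, 2, 1])
```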

Life Table Method

The life table estimates are computed by counting the numbers of censored and uncensored observations that fall into each of the time intervals [t_{i-1}, t_i), i = 1, 2, \ldots, k+1, where t_0 = 0 and t_{k+1} = \infty. Let n_i be the number of units entering the interval [t_{i-1}, t_i), and let d_i be the number of events occurring in the interval. Let b_i = t_i - t_{i-1}, and let n_i' = n_i - w_i/2, where w_i is the number of units censored in the interval. The effective sample size of the interval [t_{i-1}, t_i) is denoted by n_i'. Let t_{mi} denote the midpoint of [t_{i-1}, t_i).

The conditional probability of an event in [ti-1,ti) is estimated by

\hat{q}_i = \frac{d_i}{n_i'}
and its estimated standard error is
\hat{\sigma}(\hat{q}_i) = \sqrt{ \frac{\hat{q}_i \hat{p}_i}{n_i'} }
where \hat{p}_i = 1 - \hat{q}_i.

The estimate of the survival function at ti is

\hat{S}(t_i) = \begin{cases} 1 & i = 0 \\ \hat{S}(t_{i-1}) \hat{p}_{i-1} & i > 0 \end{cases}
and its estimated standard error is
\hat{\sigma}(\hat{S}(t_i)) = \hat{S}(t_i) \sqrt{ \sum_{j=1}^{i-1} \frac{\hat{q}_j}{n_j' \hat{p}_j} }
The density function at t_{mi} is estimated by
\hat{f}(t_{mi}) = \frac{\hat{S}(t_i) \hat{q}_i}{b_i}
and its estimated standard error is
\hat{\sigma}(\hat{f}(t_{mi})) = \hat{f}(t_{mi}) \sqrt{ \sum_{j=1}^{i-1} \frac{\hat{q}_j}{n_j' \hat{p}_j} + \frac{\hat{p}_i}{n_i' \hat{q}_i} }
The estimated hazard function at t_{mi} is
\hat{h}(t_{mi}) = \frac{ 2 \hat{q}_i }{ b_i(1 + \hat{p}_i) }
and its estimated standard error is
\hat{\sigma}(\hat{h}(t_{mi})) = \hat{h}(t_{mi}) \sqrt{ \frac{1 - (b_i \hat{h}(t_{mi})/2)^2}{n_i' \hat{q}_i} }
Let [t_{j-1}, t_j) be the interval in which \hat{S}(t_{j-1}) \geq \hat{S}(t_i)/2 > \hat{S}(t_j). The median residual lifetime at t_i is estimated by
\hat{M}_i = t_{j-1} - t_i + b_j \frac{\hat{S}(t_{j-1}) - \hat{S}(t_i)/2}{\hat{S}(t_{j-1}) - \hat{S}(t_j)}
and the corresponding standard error is estimated by
\hat{\sigma}(\hat{M}_i) = \frac{\hat{S}(t_i)}{2 \hat{f}(t_{mj}) \sqrt{n_i'}}

Interval Determination

If you want to determine the intervals exactly, use the INTERVALS= option in the PROC LIFETEST statement to specify the interval endpoints. Use the WIDTH= option to specify the width of the intervals, thus indirectly determining the number of intervals. If neither the INTERVALS= option nor the WIDTH= option is specified in the life table estimation, the number of intervals is determined by the NINTERVAL= option. The width of the time intervals is 2, 5, or 10 times an integer (possibly a negative integer) power of 10. Let c = \log_{10}(\text{maximum event or censored time} / \text{number of intervals}), and let b be the largest integer not exceeding c. Let d = 10^{c-b} and let
a = 2 \cdot I(d \leq 2) + 5 \cdot I(2 < d \leq 5) + 10 \cdot I(d > 5)
with I being the indicator function. The width is then given by
\text{width} = a \times 10^b
By default, NINTERVAL=10.
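The width rule is short enough to sketch directly; this hypothetical helper mirrors the steps above:

```python
import math

def default_interval_width(max_time, nintervals=10):
    """Default life-table interval width: 2, 5, or 10 times an integer
    power of 10, per the rule above (names here are illustrative)."""
    c = math.log10(max_time / nintervals)
    b = math.floor(c)                  # largest integer not exceeding c
    d = 10.0 ** (c - b)                # so 1 <= d < 10
    a = 2 if d <= 2 else (5 if d <= 5 else 10)
    return a * 10.0 ** b

# a maximum time of 87 with 10 intervals gives d = 8.7 > 5, hence width 10
```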

Confidence Limits Added to the Output Data Set

The upper confidence limits (UCL) and the lower confidence limits (LCL) for the distribution estimates for both the product-limit and life table methods are computed as

\text{UCL} = \hat{\lambda} + z_{\alpha/2} \hat{\sigma}
\text{LCL} = \hat{\lambda} - z_{\alpha/2} \hat{\sigma}

where \hat{\lambda} is the estimate (either the survival function, the density, or the hazard function), \hat{\sigma} is the corresponding estimate of the standard error, and z_{\alpha / 2} is the critical value for the normal distribution. That is, \Phi (-z_{\alpha / 2}) = \alpha / 2, where \Phi is the cumulative distribution function for the standard normal distribution.

The value of \alpha can be specified with the ALPHA= option.
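For example, with ALPHA=0.05 the critical value is z_{\alpha/2} \approx 1.96, and the limits for a hypothetical survival estimate are:

```python
# z_{alpha/2} for alpha = 0.05: Phi(-1.959964) is approximately 0.025
z = 1.959964
s_hat, sigma = 0.675, 0.155       # hypothetical estimate and standard error
lcl = s_hat - z * sigma
ucl = s_hat + z * sigma
```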

Tests for Equality of Survival Curves across Strata

Log-Rank Test and Wilcoxon Test

The rank statistics used to test homogeneity between the strata (Kalbfleisch and Prentice 1980) have the form of a c \times 1 vector v = (v_1, v_2, \ldots, v_c)' with
v_j = \sum_{i=1}^k w_i ( d_{ij} - \frac{n_{ij} d_i}{n_i} )
where c is the number of strata, and the estimated covariance matrix, V = (V_{jl}), is given by
V_{jl} = \sum_{i=1}^k \frac{ w_i^2 d_i s_i (n_i n_{il} \delta_{jl} - n_{ij} n_{il}) }{ n_i^2 (n_i - 1) }
where i labels the distinct event times, \delta_{jl} is 1 if j=l and 0 otherwise, n_{ij} is the size of the risk set in the jth stratum at the ith event time, d_{ij} is the number of events in the jth stratum at the ith time, and

n_i = \sum_{j=1}^c n_{ij}
d_i = \sum_{j=1}^c d_{ij}
s_i = n_i - d_i

The term v_j can be interpreted as a weighted sum of observed minus expected numbers of failures under the null hypothesis of identical survival curves. The weight w_i is 1 for the log-rank test and n_i for the Wilcoxon test. The overall test statistic for homogeneity is v'V^{-}v, where V^{-} denotes a generalized inverse of V. This statistic is treated as having a chi-square distribution with degrees of freedom equal to the rank of V for the purposes of computing an approximate probability level.
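A sketch of the v and V computations for hypothetical per-time risk sets and event counts (with two strata, v_1 = -v_2 and the chi-square statistic reduces to v_1^2 / V_{11}):

```python
def homogeneity_test(n, d, weights=None):
    """Rank statistic v and covariance matrix V across c strata.
    n[i][j], d[i][j]: risk set and events in stratum j at event time i
    (hypothetical inputs). weights: w_i per time; 1 for log-rank
    (the default), n_i for Wilcoxon."""
    k, c = len(n), len(n[0])
    v = [0.0] * c
    V = [[0.0] * c for _ in range(c)]
    for i in range(k):
        ni, di = sum(n[i]), sum(d[i])
        si = ni - di
        wi = weights[i] if weights else 1.0
        for j in range(c):
            v[j] += wi * (d[i][j] - n[i][j] * di / ni)   # observed - expected
            for l in range(c):
                delta = 1.0 if j == l else 0.0
                V[j][l] += (wi * wi * di * si *
                            (ni * n[i][l] * delta - n[i][j] * n[i][l]) /
                            (ni * ni * (ni - 1)))
    return v, V

# two strata, two event times: risk sets and events per stratum
v, V = homogeneity_test([[5, 5], [4, 5]], [[1, 0], [0, 1]])
chi_square = v[0] ** 2 / V[0][0]         # overall statistic for c = 2
```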

Likelihood Ratio Test

The likelihood ratio test statistic (Lawless 1982) for homogeneity assumes that the data in the various strata are exponentially distributed and tests that the scale parameters are equal. The test statistic is computed as
Z = 2N \log\left(\frac{T}{N}\right) - 2 \sum_{j=1}^c N_j \log\left(\frac{T_j}{N_j}\right)
where N_j is the total number of events in the jth stratum, N = \sum_{j=1}^c N_j, T_j is the total time on test in the jth stratum, and T = \sum_{j=1}^c T_j. The approximate probability value is computed by treating Z as having a chi-square distribution with c - 1 degrees of freedom.
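The statistic reduces to a few sums; a sketch with hypothetical stratum totals (when the event rates N_j / T_j are equal across strata, Z is exactly zero):

```python
import math

def exp_lr_test(events, time_on_test):
    """Likelihood ratio homogeneity statistic under exponential strata.
    events[j] = N_j, time_on_test[j] = T_j (hypothetical totals)."""
    N, T = sum(events), sum(time_on_test)
    Z = 2.0 * N * math.log(T / N) - 2.0 * sum(
        Nj * math.log(Tj / Nj)
        for Nj, Tj in zip(events, time_on_test))
    return Z              # compare to chi-square with c - 1 df

z_equal = exp_lr_test([2, 2], [10.0, 10.0])     # identical rates
z_diff = exp_lr_test([1, 3], [10.0, 10.0])      # different rates
```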

Rank Tests for the Association of Survival Time with Covariates

The rank tests for the association of covariates are more general cases of the rank tests for homogeneity. A good discussion of these tests can be found in Kalbfleisch and Prentice (1980). In this section, the index \alpha is used to label all observations, \alpha = 1, 2, \ldots, n, and the indices i, j range only over the observations that correspond to events, i, j = 1, 2, \ldots, k. The ordered event times are denoted as t_{(i)}, the corresponding vectors of covariates are denoted as z_{(i)}, and the ordered times, both censored and event times, are denoted as t_{\alpha}.

The rank test statistics have the form

v = \sum_{\alpha=1}^n c_{\alpha,\delta_{\alpha}} z_{\alpha}
where n is the total number of observations, c_{\alpha,\delta_{\alpha}} are rank scores, which can be either log-rank or Wilcoxon rank scores, \delta_{\alpha} is 1 if the observation is an event and 0 if the observation is censored, and z_{\alpha} is the vector of covariates in the TEST statement for the \alphath observation. Notice that the scores, c_{\alpha,\delta_{\alpha}}, depend on the censoring pattern and that the summation is over all observations.

The log-rank scores are

c_{\alpha,\delta_{\alpha}} = \sum_{(j : t_{(j)} \leq t_{\alpha})} \frac{1}{n_j} - \delta_{\alpha}
and the Wilcoxon scores are
c_{\alpha,\delta_{\alpha}} = 1 - (1 + \delta_{\alpha}) \prod_{(j : t_{(j)} \leq t_{\alpha})} \frac{n_j}{n_j + 1}
where n_j is the number at risk just prior to t_{(j)}.
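Both score formulas depend only on the event times at or before the observation; a sketch for a single observation, with hypothetical inputs (the log-rank score here places \delta_{\alpha} outside the sum, matching the formula above):

```python
def rank_scores(event_times, n_risk, t_alpha, delta):
    """Log-rank and Wilcoxon scores c_{alpha,delta_alpha} for one
    observation. event_times: ordered distinct event times t_(j);
    n_risk[j]: number at risk just prior to t_(j); delta: 1 for an
    event, 0 for a censored observation."""
    js = [j for j, tj in enumerate(event_times) if tj <= t_alpha]
    logrank = sum(1.0 / n_risk[j] for j in js) - delta
    prod = 1.0
    for j in js:
        prod *= n_risk[j] / (n_risk[j] + 1.0)   # product of n_j / (n_j + 1)
    wilcoxon = 1.0 - (1.0 + delta) * prod
    return logrank, wilcoxon

# an event at the first of two event times (n at risk: 5, then 3)
lr_event, w_event = rank_scores([1.0, 2.0], [5, 3], 1.0, 1)
# a censored observation between the two event times
lr_cens, w_cens = rank_scores([1.0, 2.0], [5, 3], 1.5, 0)
```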

The estimates used for the covariance matrix of the log-rank statistics are

V = \sum_{i=1}^k \frac{V_i}{n_i}
where V_i is the corrected sum of squares and crossproducts matrix for the risk set at time t_{(i)}; that is,
V_i = \sum_{(\alpha : t_{\alpha} \geq t_{(i)})} (z_{\alpha} - \bar{z}_i)' (z_{\alpha} - \bar{z}_i)
where
\bar{z}_i = \sum_{(\alpha : t_{\alpha} \geq t_{(i)})} \frac{z_{\alpha}}{n_i}
The estimate used for the covariance matrix of the Wilcoxon statistics is
V = \sum_{i=1}^k \left[ a_i (1 - a_i^*) (2 z_{(i)} z_{(i)}' + S_i) - (a_i^* - a_i) \left( a_i x_i x_i' + \sum_{j=i+1}^k a_j (x_i x_j' + x_j x_i') \right) \right]
where

a_i = \prod_{j=1}^i \frac{n_j}{n_j + 1}
a_i^* = \prod_{j=1}^i \frac{n_j + 1}{n_j + 2}
x_i = 2 z_{(i)} + \sum_{(\alpha : t_{(i+1)} > t_{\alpha} > t_{(i)})} z_{\alpha}

In the case of tied failure times, the statistics v are averaged over the possible orderings of the tied failure times. The covariance matrices are also averaged over the tied failure times. Averaging the covariance matrices over the tied orderings produces functions with appropriate symmetries for the tied observations; however, the actual variances of the v statistics would be smaller than the preceding estimates. Unless the proportion of ties is large, it is unlikely that this will be a problem.

The univariate tests for each covariate are formed from each component of v and the corresponding diagonal element of V as v_i^2 / V_{ii}. These statistics are treated as coming from a chi-square distribution with 1 degree of freedom for the calculation of probability values.

The statistic v'V^{-}v is computed by sweeping each pivot of the V matrix in the order of greatest increase to the statistic. The corresponding sequence of partial statistics is tabulated. Sequential increments for including a given covariate and the corresponding probabilities are also included in the same table. These probabilities are calculated as the tail probabilities of a chi-square distribution with one degree of freedom. Because of the selection process, these probabilities should not be interpreted as p-values.

If desired for data screening purposes, the output data set requested by the OUTTEST= option can be treated as a sum of squares and crossproducts matrix and processed by the REG procedure using the option METHOD=RSQUARE. Then the sets of variables of a given size can be found that give the largest test statistics. Example 37.1 illustrates this process.


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.