MODEL Statement

The PHREG Procedure

MODEL response < *censor ( list ) > = variables < /options > ;

MODEL (t1, t2) < *censor(list) > = variables < /options > ;

The MODEL statement identifies the variables to be used as the failure time variables, the optional censoring variable, and the explanatory variables. Two forms of MODEL syntax can be specified; the first form allows one response variable, while the second form allows two variables for the counting process style of input (see the section "Counting Process Style of Input" for more information).

In the first form, the name of the failure time variable precedes the equal sign. It can optionally be followed by an asterisk, the name of the censoring variable, and a parenthesized list of censoring values (separated by blanks or commas if there is more than one). If the censoring variable takes on one of these values, the corresponding failure time is considered to be censored. The variables following the equal sign are the explanatory variables (sometimes called independent variables or covariates) for the model.

Instead of a single failure time variable, the second MODEL statement identifies a pair of failure time variables. Their names are enclosed in parentheses, and they signify the endpoints of a semi-closed interval (t1,t2] during which the subject is at risk. If the censoring variable takes on one of the censoring values, the time t2 is considered to be censored.
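The semi-closed interval (t1,t2] determines when a subject contributes to a risk set. The following Python fragment is a minimal sketch of that rule only; the function name and the example subject are hypothetical and are not part of PROC PHREG.

```python
def at_risk(t1, t2, t):
    """Return True if a subject observed on (t1, t2] is at risk at time t."""
    return t1 < t <= t2

# Hypothetical subject who enters at time 2 and is followed to time 7.
print(at_risk(2, 7, 2))  # the left endpoint is excluded
print(at_risk(2, 7, 7))  # the right endpoint is included
```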

The censoring variable and the explanatory variables must be numeric. The failure time variables must contain nonnegative values. Any observation with a negative failure time is excluded from the analysis, as is any observation with a missing value for any of the variables listed in the MODEL statement. You can specify the following options in the MODEL statement.

Ties-Handling Option

TIES=method

specifies how to handle ties in the failure time. The TIES= option can take the following values:

BRESLOW

uses the approximate likelihood of Breslow (1974). This is the default value.

DISCRETE

replaces the proportional hazards model by the discrete logistic model

$\frac{h(t;z)}{1-h(t;z)} =\frac{h_{0}(t)}{1-h_{0}(t)} {\rm exp}({z'}{{\beta}})$

where h₀(t) and h(t;z) are discrete hazard functions.

EFRON

uses the approximate likelihood of Efron (1977).

EXACT

computes the exact conditional probability under the proportional hazards assumption that all tied event times occur before censored times of the same value or before larger values. This is equivalent to summing all terms of the marginal likelihood for ${\beta}$ that are consistent with the observed data (Kalbfleisch and Prentice 1980; DeLong, Guirguis, and So 1994).

The EXACT method may take a considerable amount of computer resources. If ties are not extensive, the EFRON and BRESLOW methods provide satisfactory approximations to the EXACT method for the continuous time-scale model. In general, Efron's approximation gives results that are much closer to the EXACT method results than Breslow's approximation does. If the time scale is genuinely discrete, you should use the DISCRETE method. The DISCRETE method is also required in the analysis of case-control studies when there is more than one case in a matched set. If there are no ties, all four methods result in the same likelihood and yield identical estimates. The default, TIES=BRESLOW, is the most efficient method when there are no ties.
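To see how the two approximations differ, the following Python sketch evaluates the log partial likelihood for a single covariate under the usual textbook forms of the Breslow and Efron approximations. It is an illustration under stated assumptions, not the PROC PHREG implementation: the formulas are the standard ones rather than quotations from this section, the data are right-censored only, and the function name and the small data set are hypothetical.

```python
import math

def log_partial_likelihood(times, events, z, beta, ties="breslow"):
    """Log partial likelihood for one covariate with Breslow or Efron tie handling."""
    total = 0.0
    event_times = sorted({t for t, e in zip(times, events) if e == 1})
    for t in event_times:
        # Risk set: every subject whose time is at least t, so subjects
        # censored at t are still counted as at risk at t.
        risk_sum = sum(math.exp(beta * zi) for ti, zi in zip(times, z) if ti >= t)
        # Covariate values of the subjects whose events are tied at t.
        tied = [zi for ti, ei, zi in zip(times, events, z) if ti == t and ei == 1]
        d = len(tied)
        tied_sum = sum(math.exp(beta * zi) for zi in tied)
        total += beta * sum(tied)
        for r in range(d):
            if ties == "efron":
                # Efron: remove a growing fraction of the tied subjects' risk.
                total -= math.log(risk_sum - (r / d) * tied_sum)
            else:
                # Breslow: use the full risk set for every tied event.
                total -= math.log(risk_sum)
    return total

# Hypothetical data: two events tied at time 1, one censored time at 3.
times, events, z = [1, 1, 2, 3], [1, 1, 1, 0], [0, 1, 1, 0]
print(log_partial_likelihood(times, events, z, 0.5, ties="breslow"))
print(log_partial_likelihood(times, events, z, 0.5, ties="efron"))
```

When no event times are tied, the inner loop runs once per event time with r equal to 0, so both choices give the same value, which matches the statement that the methods agree in the absence of ties.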

Model-Specification Options

ENTRYTIME=variable

ENTRY=variable

specifies the name of the variable that represents the left truncation time. This option has no effect when the counting process style of input is specified. See the section "Left Truncation of Failure Times" for more information.

NOFIT

performs the global score test, which tests the joint significance of all the explanatory variables in the MODEL statement. No parameters are estimated. If the NOFIT option is specified along with other MODEL statement options, NOFIT takes precedence, and all other options are ignored except the TIES= option.

OFFSET=name

specifies the name of an offset variable, which is an explanatory variable with a regression coefficient fixed as one. This option can be used to incorporate risk weights for the likelihood function.
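Because the offset has its coefficient fixed at one, it is added to the linear predictor as is. The following Python sketch shows the resulting relative hazard; the function name and the numeric values are hypothetical.

```python
import math

def relative_hazard(z, beta, offset=0.0):
    """exp(offset + z'beta): the offset enters with its coefficient fixed at one."""
    return math.exp(offset + sum(zi * bi for zi, bi in zip(z, beta)))

# An offset of log(2) doubles the relative hazard for any covariate values.
print(relative_hazard([1.0, 2.0], [0.5, -0.25], offset=math.log(2.0)))
```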

SELECTION=method

specifies the method used to select the model. The available methods are:

BACKWARD | B: requests backward elimination.
FORWARD | F: requests forward selection.
NONE | N: fits the complete model specified in the MODEL statement. This is the default value.
SCORE: requests best subset selection. It identifies a specified number of models with the highest score chi-squared statistic for all possible model sizes ranging from one explanatory variable to the total number of explanatory variables listed in the MODEL statement.
STEPWISE | S: requests stepwise selection.

For more information, see the section "Variable Selection Methods".

Model-Building Options

The following options enable you to provide additional specifications for the BACKWARD, FORWARD, SCORE, and STEPWISE model selection methods. They have no effect when SELECTION=NONE. Only the INCLUDE=, START=, STOP=, and BEST= options work with the SCORE method.

BEST=n: is used exclusively with the SCORE model selection method. The BEST=n option specifies that n models with the highest score chi-squared statistics are to be displayed for each model size. If the option is omitted and there are no more than ten explanatory variables, then all possible models are listed for each model size. If the option is omitted and there are more than ten explanatory variables, then the number of models selected for each model size is, at most, equal to the number of explanatory variables listed in the MODEL statement. See Example 49.2 for an illustration of the SCORE selection method and the BEST= option.
DETAILS: produces a detailed display at each step of the model-building process. It produces an "Analysis of Variables Not in the Model" table before displaying the variable selected for entry for FORWARD or STEPWISE selection. For each model fitted, it produces the "Analysis of Maximum Likelihood Estimates" table. See Example 49.1 for a discussion of these tables.
INCLUDE=n: includes the first n explanatory variables listed in the MODEL statement in every model. The value for n ranges from 1 to s, where s is the number of explanatory variables in the MODEL statement. The default value of n is 0.
MAXSTEP=n: specifies the maximum number of times the explanatory variables can move in and out of the model before the STEPWISE model-building process ends. The default value for n is twice the number of explanatory variables in the MODEL statement. The option has no effect for other model selection methods.
SEQUENTIAL: forces variables to be added to the model in the order specified in the MODEL statement or to be eliminated from the model in the reverse order specified in the MODEL statement.
SLENTRY=value | SLE=value: specifies the significance level (a value between 0 and 1) for entering an explanatory variable into the model in the FORWARD or STEPWISE method. For all variables not in the model, the one with the smallest p-value is entered if the p-value is less than or equal to the specified significance level. The default value is 0.05.
SLSTAY=value | SLS=value: specifies the significance level (a value between 0 and 1) for removing an explanatory variable from the model in the BACKWARD or STEPWISE method. For all variables in the model, the one with the largest p-value is removed if the p-value exceeds the specified significance level. The default value is 0.05.
START=n: begins the FORWARD, BACKWARD, or STEPWISE model selection process with the first n explanatory variables listed in the MODEL statement. The value for n ranges from 0 to s, where s is the total number of explanatory variables in the MODEL statement. The default value of n is s for the BACKWARD method and 0 for the FORWARD and STEPWISE methods. Note that START=n specifies only that the first n explanatory variables appear in the first model, while INCLUDE=n specifies that the first n explanatory variables be included in every model. For the SCORE method, START=n specifies that the smallest models contain n explanatory variables, where n ranges from 1 to s. The default value of n is 1.
STOP=n: specifies the maximum (FORWARD method) or minimum (BACKWARD method) number of explanatory variables to be included in the final model. The value for n ranges from 0 to s, where s is the number of explanatory variables in the MODEL statement. The default value of n is 0 for the BACKWARD method and s for the FORWARD method. For the SCORE method, STOP=n specifies that the largest models contain n explanatory variables, where n ranges from 1 to s. The default value of n is s. The STOP= option has no effect for the STEPWISE method.
STOPRES | SR: specifies that the addition and deletion of variables are to be based on the result of the likelihood score test for testing the joint significance of variables not in the model. This score chi-squared statistic is referred to as the residual chi-square. In the FORWARD method, the STOPRES option enters the explanatory variables into the model one at a time until the residual chi-square becomes insignificant (that is, until the p-value of the residual chi-square exceeds the SLENTRY= value). In the BACKWARD method, the STOPRES option removes variables from the model one at a time until the residual chi-square becomes significant (that is, until the p-value of the residual chi-square becomes less than the SLSTAY= value). The STOPRES option has no effect for the STEPWISE method.
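The entry and removal rules for the SLENTRY= and SLSTAY= options can be sketched as two small decision functions. This Python fragment only illustrates the comparisons described above; the function names, variable names, and p-values are hypothetical, and the sketch ignores the INCLUDE=, START=, STOP=, and STOPRES options.

```python
def pick_entry(p_values, slentry=0.05):
    """Candidate with the smallest p-value, if it is at most SLENTRY; else None."""
    if not p_values:
        return None
    name = min(p_values, key=p_values.get)
    return name if p_values[name] <= slentry else None

def pick_removal(p_values, slstay=0.05):
    """Variable with the largest p-value, if it exceeds SLSTAY; else None."""
    if not p_values:
        return None
    name = max(p_values, key=p_values.get)
    return name if p_values[name] > slstay else None

# Hypothetical p-values for variables outside and inside the current model.
print(pick_entry({"age": 0.03, "dose": 0.20}))
print(pick_removal({"age": 0.03, "dose": 0.20}))
```

Note the asymmetry at the boundary: a p-value exactly equal to the level allows entry, but it does not trigger removal.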

Optimization Options

Four convergence criteria are allowed: ABSFCONV=, FCONV=, GCONV=, and XCONV=. If you specify more than one convergence criterion, the optimization is terminated as soon as one of the criteria is satisfied. If none of the criteria is specified, the default is GCONV=1E-8.

ABSFCONV=value

specifies the absolute function convergence criterion. Termination requires a small change in the objective function (log partial likelihood function) in subsequent iterations,

$| l_k - l_{k-1}| \lt {value}$

where l_k is the value of the objective function at iteration k.

CONVERGELIKE=value

is the same as specifying the ABSFCONV= option.

CONVERGEPARM=value

is the same as specifying the XCONV= option.

FCONV=value

specifies the relative function convergence criterion. Termination requires a small relative change in the objective function (log partial likelihood function) in subsequent iterations,

$\frac{| l_k - l_{k-1}|} {| l_{k-1}| + 1{\rm E}-6} \lt {value}$

where l_k is the value of the objective function at iteration k.
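The two function-based criteria differ only in whether the change is scaled by the previous objective value. The following Python sketch restates the ABSFCONV= and FCONV= inequalities; the function names and the log partial likelihood values are hypothetical.

```python
def absfconv_met(l_k, l_prev, value):
    """Absolute function criterion: |l_k - l_{k-1}| < value."""
    return abs(l_k - l_prev) < value

def fconv_met(l_k, l_prev, value):
    """Relative function criterion: |l_k - l_{k-1}| / (|l_{k-1}| + 1E-6) < value."""
    return abs(l_k - l_prev) / (abs(l_prev) + 1e-6) < value

# Hypothetical log partial likelihoods at two successive iterations.
l_prev, l_k = -100.0, -99.999
print(absfconv_met(l_k, l_prev, 1e-4))  # absolute change is about 1E-3
print(fconv_met(l_k, l_prev, 1e-4))     # relative change is about 1E-5
```

The same step can therefore satisfy FCONV= but not ABSFCONV= at the same tolerance when the objective function is large in magnitude.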

GCONV=value

specifies the relative gradient convergence criterion. Termination requires that the normalized prediction function reduction is small,

$\frac{g_k' H^{-1}_k g_k} {| l_k| + 1{\rm E}-6} \lt {value}$

where l_k is the log partial likelihood, g_k is the gradient vector (first partial derivatives of the log partial likelihood), and H_k is the negative Hessian matrix (second partial derivatives of the log partial likelihood), all at iteration k.
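The GCONV= statistic is a quadratic form in the gradient, scaled by the magnitude of the log partial likelihood. The following Python sketch computes it for a small problem by using a plain Gaussian elimination solver so that no external library is needed; the function names, the gradient, and the matrix values are hypothetical.

```python
def solve(a, b):
    """Solve a x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    m = [row[:] + [bi] for row, bi in zip(a, b)]  # augmented matrix
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            for c in range(col, n + 1):
                m[r][c] -= factor * m[col][c]
    x = [0.0] * n
    for r in range(n - 1, -1, -1):
        s = sum(m[r][c] * x[c] for c in range(r + 1, n))
        x[r] = (m[r][n] - s) / m[r][r]
    return x

def gconv_statistic(g, h, l_k):
    """Quadratic form g' H^{-1} g divided by (|l_k| + 1E-6)."""
    step = solve(h, g)  # H^{-1} g
    quad = sum(gi * si for gi, si in zip(g, step))
    return quad / (abs(l_k) + 1e-6)

# Hypothetical gradient, negative Hessian, and log partial likelihood.
stat = gconv_statistic([0.002, -0.001], [[4.0, 1.0], [1.0, 3.0]], -50.0)
print(stat, stat < 1e-8)  # compared with the default GCONV=1E-8
```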

MAXITER=n

specifies the maximum number of iterations allowed. The default value for n is 25. If convergence is not attained in n iterations, the displayed output and all data sets created by PROC PHREG contain results that are based on the last maximum likelihood iteration.

RIDGING=ABSOLUTE | RELATIVE | NONE

specifies the technique to improve the log-likelihood when its value is worse than that of the previous step. For RIDGING=ABSOLUTE, the diagonal elements of the negative (expected) Hessian are inflated by adding the ridge value. For RIDGING=RELATIVE, the diagonal elements are inflated by the factor equal to 1 plus the ridge value. For RIDGING=NONE, the crude line-search method of taking half a step is used instead of ridging.
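The two ridging variants change only the diagonal of the matrix. The following Python sketch shows the inflation step in isolation; how the ridge value itself is chosen is not described in this section, so it is treated here as a hypothetical input, and the function name is also hypothetical.

```python
def ridge_hessian(h, ridge, method="RELATIVE"):
    """Return a copy of h with its diagonal inflated as described for RIDGING=."""
    out = [row[:] for row in h]
    for i in range(len(out)):
        if method == "ABSOLUTE":
            out[i][i] += ridge        # add the ridge value
        elif method == "RELATIVE":
            out[i][i] *= 1.0 + ridge  # multiply by 1 plus the ridge value
        else:
            raise ValueError("method must be ABSOLUTE or RELATIVE")
    return out

h = [[4.0, 1.0], [1.0, 2.0]]
print(ridge_hessian(h, 0.5, "ABSOLUTE"))
print(ridge_hessian(h, 0.5, "RELATIVE"))
```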

SINGULAR=value

specifies the singularity criterion for determining linear dependencies in the set of explanatory variables. The default value is $10^{-12}$.

XCONV=value

specifies the relative parameter convergence criterion. Termination requires a small relative parameter change in subsequent iterations,

$\max_i |\delta_k^{(i)}| \lt {value}$

where

$\delta_k^{(i)} = \begin{cases} \theta_k^{(i)} - \theta_{k-1}^{(i)} & |\theta_{k-1}^{(i)}| \lt .01 \\ \frac{\theta_k^{(i)} - \theta_{k-1}^{(i)}}{\theta_{k-1}^{(i)}} & {\rm otherwise} \end{cases}$

where $\theta_k^{(i)}$ is the estimate of the ith parameter at iteration k.
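The XCONV= criterion uses the absolute change for parameters whose previous estimate is close to zero and the relative change otherwise. The following Python sketch restates that rule; the function name and the parameter values are hypothetical, and the 0.01 cutoff between the two cases is exposed as an argument so that you can check it against your release.

```python
def xconv_met(theta_k, theta_prev, value, cutoff=0.01):
    """Relative parameter criterion: max_i |delta_i| < value."""
    deltas = []
    for cur, prev in zip(theta_k, theta_prev):
        diff = cur - prev
        # Absolute change near zero, relative change otherwise.
        deltas.append(diff if abs(prev) < cutoff else diff / prev)
    return max(abs(d) for d in deltas) < value

# Hypothetical estimates at two successive iterations.
theta_prev, theta_k = [0.001, 2.0], [0.0012, 2.001]
print(xconv_met(theta_k, theta_prev, 1e-3))
print(xconv_met(theta_k, theta_prev, 1e-4))
```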

Display Options

ALPHA=value: sets the significance level used for the confidence limits for the hazards ratios. The value must be between 0 and 1. The default value is 0.05, which results in the calculation of a 95% confidence interval. This option has no effect unless the RISKLIMITS option is specified.
CORRB: displays the estimated correlation matrix of the parameter estimates.
COVB: displays the estimated covariance matrix of the parameter estimates.
ITPRINT: displays the iteration history, including the last evaluation of the gradient vector.
RISKLIMITS | RL: displays, for each explanatory variable, the $100(1-{\alpha})\%$ confidence limits for the hazards ratio ( $e^{\beta_{i}}$ ). The value for ${\alpha}$ is determined by the ALPHA= option.
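The relationship between the ALPHA= value and the limits for the hazards ratio can be sketched as follows. This Python fragment assumes Wald-type limits, that is, normal-theory limits for the coefficient that are then exponentiated. That construction is an assumption of the sketch rather than a statement from this section, and the function name, estimate, and standard error are hypothetical.

```python
import math
from statistics import NormalDist

def hazard_ratio_limits(beta, se, alpha=0.05):
    """Hazards ratio exp(beta) with assumed Wald-type 100(1-alpha)% limits."""
    z = NormalDist().inv_cdf(1.0 - alpha / 2.0)  # standard normal quantile
    return math.exp(beta), math.exp(beta - z * se), math.exp(beta + z * se)

# Hypothetical coefficient estimate and standard error.
hr, lower, upper = hazard_ratio_limits(0.40, 0.15)
print(hr, lower, upper)
```

Because the limits are computed on the coefficient scale and then exponentiated, they are not symmetric about the hazards ratio.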
