MODEL Statement

The PROBIT Procedure

<label:> MODEL response=independents < / options > ;

<label:> MODEL events/trials=independents < / options > ;

The MODEL statement names the variables used as the response and the independent variables. Additionally, you can specify the distribution used to model the response, as well as other options. More than one MODEL statement can be specified with the PROBIT procedure. The optional label is used to label output from the matching MODEL statement.
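The following statements are a minimal sketch of labeled MODEL statements. They fit the same events/trials response with three different distributions, so each block of output is identified by its label. The data set Assay and its variables Dose, N (number of subjects), and R (number of responders) are hypothetical, and so are the label names.

```sas
/* Hypothetical data set ASSAY: Dose, N (subjects), R (responders) */
proc probit data=Assay;
   NormalFit: model R/N = Dose;                 /* default DISTRIBUTION=NORMAL */
   LogitFit:  model R/N = Dose / d=logistic;    /* logit model  */
   GompitFit: model R/N = Dose / d=gompertz;    /* gompit model */
run;
```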

The response can be a single variable with a value that is used to indicate the level of the observed response. Such a response variable must be listed in the CLASS statement. For example, the response might be a variable called Symptoms that takes on the values 'None', 'Mild', or 'Severe'. Note that, for dichotomous response variables, the probability of the lower sorted value is modeled by default (see the "Details" section). Because the model fit by the PROBIT procedure requires ordered response levels, you may need to use either the ORDER=DATA option in the PROC statement or a numeric coding of the response to get the desired ordering of levels.
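As a sketch of the ordered single-variable response, the following statements list Symptoms in the CLASS statement and use ORDER=DATA. The data set Pain and the variable Dose are hypothetical; ORDER=DATA orders levels by their first appearance, so this assumes a 'None' observation appears first, then 'Mild', then 'Severe'.

```sas
/* Hypothetical data set PAIN: Symptoms ('None', 'Mild', 'Severe') and Dose.
   ORDER=DATA takes the level order from the order of first appearance. */
proc probit data=Pain order=data;
   class Symptoms;              /* a single-variable response must be a CLASS variable */
   model Symptoms = Dose;
run;
```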

Alternatively, the response can be specified as a pair of variable names separated by a slash (/). The value of the first variable, events, is the number of positive responses (or events). The value of the second variable, trials, is the number of trials. Both variables must be numeric and nonnegative, and the ratio of the first variable value to the second variable value must be between 0 and 1, inclusive. For example, the variables might be hits, a variable containing the number of hits for a baseball player, and AtBats, a variable containing the number of times at bat. A model for hitting proportion (batting average) as a function of age could be specified as

   model hits/AtBats=age;

If no independent variables are specified, PROC PROBIT fits an intercept-only model. The following options are available in the MODEL statement.

CONVERGE=value

specifies the convergence criterion. Convergence is declared when the maximum change in the parameter estimates between Newton-Raphson steps is less than the value specified. The change is a relative change if the parameter is greater than 0.01 in absolute value; otherwise, it is an absolute change.

By default, CONVERGE=0.001.
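Written out, the rule above compares a per-parameter change with the CONVERGE= value. In this sketch, the notation is introduced here for illustration: \(\beta_j^{(i)}\) is the estimate of parameter \(j\) at Newton-Raphson step \(i\). The text does not say which iterate the relative change is scaled by, so the use of \(\beta_j^{(i)}\) in the denominator is an assumption.

```latex
\delta_j =
\begin{cases}
\dfrac{\lvert \beta_j^{(i+1)} - \beta_j^{(i)} \rvert}{\lvert \beta_j^{(i)} \rvert}, & \lvert \beta_j^{(i)} \rvert > 0.01,\\[6pt]
\lvert \beta_j^{(i+1)} - \beta_j^{(i)} \rvert, & \text{otherwise,}
\end{cases}
\qquad
\text{converged when } \max_j \delta_j < \text{CONVERGE}.
```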

CORRB

displays the estimated correlation matrix of the parameter estimates.

COVB

displays the estimated covariance matrix of the parameter estimates.
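For example, the following statements request both matrices for a single-predictor model. The data set Assay and its variables Dose, N, and R are hypothetical.

```sas
/* Hypothetical data set ASSAY: Dose, N (subjects), R (responders) */
proc probit data=Assay;
   model R/N = Dose / covb corrb;   /* print covariance and correlation matrices of the estimates */
run;
```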

DISTRIBUTION=distribution-type

DIST=distribution-type

D=distribution-type

specifies the cumulative distribution function used to model the response probabilities. The distributions are described in the "Details" section. Valid values for distribution-type are

NORMAL: the normal distribution for the probit model
LOGISTIC: the logistic distribution for the logit model
EXTREMEVALUE | EXTREME | GOMPERTZ: the extreme value (Gompertz) distribution for the gompit model

By default, DISTRIBUTION=NORMAL.
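For reference, these are the standard forms of the three cumulative distribution functions. The exact way PROC PROBIT applies them to the linear predictor (for example, together with a natural response rate) is given in the "Details" section, so treat this as a reminder of the textbook forms rather than the procedure's complete model.

```latex
F(x) =
\begin{cases}
\Phi(x) = \displaystyle\int_{-\infty}^{x} \frac{1}{\sqrt{2\pi}}\, e^{-t^{2}/2}\, dt, & \text{NORMAL (probit)}\\[8pt]
\dfrac{e^{x}}{1 + e^{x}}, & \text{LOGISTIC (logit)}\\[8pt]
1 - e^{-e^{x}}, & \text{EXTREMEVALUE (gompit)}
\end{cases}
```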

HPROB=value

specifies the minimum probability level (p-value) of the Pearson chi-square statistic that indicates a good fit. The default value is 0.10. The LACKFIT option must also be specified for this option to have any effect. When the p-value of the Pearson goodness-of-fit chi-square is greater than the HPROB= value, the fiducial limits (if requested with the INVERSECL option) are computed using a critical value of 1.96. When the p-value is less than the HPROB= value, the critical value is the two-sided 95% critical value (the 0.975 quantile) of the t distribution with (k - 1) × m - q degrees of freedom, where k is the number of levels of the response variable, m is the number of distinct sets of independent variable values, and q is the number of parameters fit in the model. If you specify the HPROB= option in both the PROC and MODEL statements, the MODEL statement option takes precedence.
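As a worked example of the degrees-of-freedom rule, take a hypothetical binary response (k = 2) observed at m = 5 distinct dose values, with q = 2 parameters (intercept and slope). When the Pearson p-value falls below HPROB=, the fiducial limits use the t critical value below instead of 1.96.

```latex
\nu = (k - 1)\, m - q = (2 - 1) \cdot 5 - 2 = 3,
\qquad
t_{0.975,\,3} \approx 3.182
```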

INITIAL=values

sets initial values for the parameters in the model other than the intercept. The values must be given in the order in which the variables are listed in the MODEL statement. If some of the independent variables listed in the MODEL statement are classification variables, then there must be as many values given for that variable as there are classification levels minus 1. The INITIAL option can be specified as follows.

Type of List                Specification
list separated by blanks    `initial=3 4 5`
list separated by commas    `initial=3,4,5`

By default, all parameters have initial estimates of zero.

INTERCEPT=value

initializes the intercept parameter to value. By default, INTERCEPT=0.
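The following statements sketch INITIAL= with a classification variable, together with INTERCEPT=. The data set Assay, its variables Dose, N, and R, and the CLASS variable Lab (assumed to have 3 levels) are hypothetical, and the starting values are arbitrary. Following the rule above, Dose gets one value and Lab gets 3 - 1 = 2 values, in MODEL statement order.

```sas
/* Hypothetical data set ASSAY: Dose, N, R, and Lab (assumed to have 3 levels) */
proc probit data=Assay;
   class Lab;
   model R/N = Dose Lab / initial=0.5 0 0   /* Dose, then Lab (3 levels - 1 = 2 values) */
                          intercept=-1;     /* starting value for the intercept          */
run;
```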

INVERSECL

computes confidence limits for the values of the first continuous independent variable (such as dose) that yield selected response rates. If the algorithm fails to converge (which can happen when the natural response rate C, set with the C= option in the PROC PROBIT statement, is nonzero), missing values are reported for the confidence limits. See the section "Inverse Confidence Limits" for details.
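A minimal sketch of requesting inverse confidence limits follows. The data set Assay and its variables Dose, N, and R are hypothetical. Dose is the only, and therefore the first, continuous independent variable, so the limits are computed for dose values.

```sas
/* Hypothetical data set ASSAY: Dose, N (subjects), R (responders) */
proc probit data=Assay;
   model R/N = Dose / inversecl;   /* limits for the Dose values that yield selected response rates */
run;
```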

ITPRINT

displays the iteration history, the final evaluation of the gradient, and the second derivative matrix (Hessian).

LACKFIT

performs two goodness-of-fit tests (a Pearson chi-square test and a log-likelihood ratio chi-square test) for the fitted model.

Note: If you want to perform a test of fit, the data set must be sorted by the independent variables before the PROBIT procedure is run. The test is not appropriate if the data are very sparse, with only a few observations at each set of independent variable values.

If the Pearson chi-square test statistic is significant (that is, its p-value is less than the HPROB= value), the covariance estimates and standard error estimates are adjusted. See the "Lack of Fit Tests" section for a description of the tests. If you specify the LACKFIT option in both the PROC and MODEL statements, the MODEL statement option takes precedence.
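The following statements sketch the sort-then-test sequence described above. The data set Assay and its variables Dose, N, and R are hypothetical, and HPROB=0.05 is an arbitrary choice that replaces the default of 0.10.

```sas
/* Hypothetical data set ASSAY: Dose, N (subjects), R (responders) */
proc sort data=Assay;
   by Dose;                        /* sort by the independent variables before testing fit */
run;

proc probit data=Assay;
   /* The adjustment applies when the Pearson chi-square p-value is below 0.05 */
   model R/N = Dose / lackfit hprob=0.05 inversecl;
run;
```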

MAXITER=value

specifies the maximum number of iterations to be performed in estimating the parameters. By default, MAXITER=50.
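For a model that is slow to converge, you might tighten the criterion, raise the iteration limit, and print the iteration history in one MODEL statement. The data set, variable names, and option values below are hypothetical choices made for illustration.

```sas
/* Hypothetical data set ASSAY: Dose, N (subjects), R (responders) */
proc probit data=Assay;
   model R/N = Dose / converge=1e-6   /* stricter than the default 0.001  */
                      maxiter=100     /* more than the default 50         */
                      itprint;        /* show the iteration history       */
run;
```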

NOINT

fits a model with no intercept parameter. If the INTERCEPT= option is also specified, the intercept is fixed at the specified value; otherwise, it is set to zero. This is most useful when the response is binary. When the response has k levels, then k-1 intercept parameters are fit. The NOINT option sets the intercept parameter corresponding to the lowest response level equal to zero. A Lagrange multiplier, or score, test for the restricted model is computed when the NOINT option is specified.
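The following statements sketch both uses of NOINT with a binary events/trials response: with the intercept fixed at zero, and with it fixed at a chosen value through INTERCEPT=. The data set Assay, its variables, the labels, and the value -1.5 are hypothetical.

```sas
/* Hypothetical data set ASSAY: Dose, N (subjects), R (responders) */
proc probit data=Assay;
   ZeroInt:  model R/N = Dose / noint;                  /* intercept fixed at 0    */
   FixedInt: model R/N = Dose / noint intercept=-1.5;   /* intercept fixed at -1.5 */
run;
```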

SINGULAR=value

specifies the singularity criterion for determining linear dependencies in the set of independent variables. The sum of squares and crossproducts matrix of the independent variables is formed and swept. If the relative size of a pivot becomes less than the value specified, then the variable corresponding to the pivot is considered to be linearly dependent on the previous set of variables considered. By default, SINGULAR=1E-12.
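To see the criterion at work, the following statements add a predictor that is an exact multiple of another. The data sets Assay and Assay2 and the variables Dose, Dose2, N, and R are hypothetical. Because Dose2 = 2 × Dose, its pivot should fall below the tolerance when the crossproducts matrix is swept, and it should be treated as linearly dependent on Dose. The exact form of the resulting output is not described here.

```sas
/* Hypothetical data set ASSAY: Dose, N (subjects), R (responders) */
data Assay2;
   set Assay;
   Dose2 = 2*Dose;                 /* exactly collinear with Dose */
run;

proc probit data=Assay2;
   model R/N = Dose Dose2 / singular=1e-8;
run;
```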
