The LOGISTIC Procedure

Iterative Algorithms for Model-Fitting

Two iterative maximum likelihood algorithms are available in PROC LOGISTIC. The default is the Fisher-scoring method, which is equivalent to fitting by iteratively reweighted least squares. The alternative algorithm is the Newton-Raphson method. Both algorithms give the same parameter estimates; however, the estimated covariance matrix of the parameter estimators may differ slightly, because the Fisher-scoring method is based on the expected information matrix while the Newton-Raphson method is based on the observed information matrix. For a binary logit model, the observed and expected information matrices are identical, so the two algorithms yield identical estimated covariance matrices. You can use the TECHNIQUE= option to select a fitting algorithm.
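To see why the two information matrices coincide in the binary case, note (as a brief aside, writing \tilde{x}_j for the column of intercept and explanatory values of the jth observation, a shorthand not used elsewhere in this section) that with p_j = 1/(1+e^{-\tilde{x}_j'{\gamma}}) the matrix of second derivatives of the log likelihood for the jth observation is

\frac{\partial^2 l_j}{\partial {\gamma} \partial {\gamma}'} = -p_j(1-p_j)\tilde{x}_j\tilde{x}_j'

which does not involve the response, so the observed information is already equal to its expectation.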

Iteratively Reweighted Least-Squares Algorithm

Consider the multinomial variable Z_j=(Z_{1j}, \ldots ,Z_{(k+1)j})' such that

Z_{ij}=\begin{cases} 1 & \text{if } Y_j=i \\ 0 & \text{otherwise} \end{cases}

With p_{ij} denoting the probability that the jth observation has response value i, the expected value of Z_j is p_j=(p_{1j}, \ldots ,p_{(k+1)j})'. The covariance matrix of Z_j is V_j, the covariance matrix of a multinomial random variable for one trial with parameter vector p_j. Let {\gamma} be the vector of regression parameters; that is, {\gamma}'=(\alpha_1, \ldots ,\alpha_k,{\beta}'). Let D_j be the matrix of partial derivatives of p_j with respect to {\gamma}. The estimating equation for the regression parameters is

\sum_j{D}'_j{W}_j(Z_j-p_j)=0

where W_j = w_j f_j V_j^-, w_j and f_j are the WEIGHT and FREQ values of the jth observation, and V_j^- is a generalized inverse of V_j. PROC LOGISTIC chooses V_j^- as the inverse of the diagonal matrix with p_j as the diagonal.
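This choice of generalized inverse can be verified directly: for the one-trial multinomial covariance matrix V_j = \text{diag}(p_j)-p_jp_j', the inverse of \text{diag}(p_j) satisfies the defining condition V_jV_j^-V_j=V_j. The following minimal numerical check (in Python with NumPy, purely for illustration; the probabilities are arbitrary) demonstrates this:

   import numpy as np

   # Arbitrary response probabilities for one observation; they must sum to 1.
   p = np.array([0.2, 0.5, 0.3])

   V = np.diag(p) - np.outer(p, p)    # covariance of a one-trial multinomial
   V_ginv = np.diag(1.0 / p)          # inverse of the diagonal matrix with p as diagonal

   # Defining property of a generalized inverse: V V^- V = V.
   print(np.allclose(V @ V_ginv @ V, V))   # prints True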

With a starting value of {\gamma}_0, the maximum likelihood estimate of {\gamma} is obtained iteratively as

{\gamma}_{m+1}={\gamma}_m+(\sum_j D'_j W_j D_j)^{-1} \sum_j D'_j W_j (Z_j-p_j)

where D_j, W_j, and p_j are evaluated at {\gamma}_m. The expression after the plus sign is the step size. If the likelihood evaluated at {\gamma}_{m+1} is less than that evaluated at {\gamma}_m, then {\gamma}_{m+1} is recomputed by step-halving or ridging. The iterative scheme continues until convergence is obtained, that is, until {\gamma}_{m+1} is sufficiently close to {\gamma}_m. Then the maximum likelihood estimate of {\gamma} is \hat{{\gamma}}={\gamma}_{m+1}.
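For concreteness, the following sketch (in Python with NumPy) carries out this iteration for the binary special case, with all WEIGHT and FREQ values taken as 1 and with simple step-halving; it illustrates the update formula above and is not a reproduction of PROC LOGISTIC's implementation:

   import numpy as np

   def loglik(X, y, gamma):
       # Binary logit log likelihood at gamma (unit WEIGHT and FREQ values).
       eta = X @ gamma
       return np.sum(y * eta - np.logaddexp(0.0, eta))

   def fisher_scoring(X, y, max_iter=25, tol=1e-8):
       # X must include an intercept column; y is 0/1.
       gamma = np.zeros(X.shape[1])                      # zero start, for simplicity
       for _ in range(max_iter):
           p = 1.0 / (1.0 + np.exp(-(X @ gamma)))
           score = X.T @ (y - p)                         # sum_j D'_j W_j (Z_j - p_j)
           info = X.T @ (X * (p * (1.0 - p))[:, None])   # sum_j D'_j W_j D_j
           step = np.linalg.solve(info, score)
           for _ in range(20):                           # step-halving, as described above
               if loglik(X, y, gamma + step) >= loglik(X, y, gamma):
                   break
               step /= 2.0
           gamma = gamma + step
           if np.max(np.abs(step)) < tol:                # gamma_{m+1} close to gamma_m
               break
       return gamma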

The covariance matrix of \hat{{\gamma}} is estimated by

\hat{cov}(\hat{{\gamma}})=(\sum_j \hat{D}'_j \hat{W}_j \hat{D}_j)^{-1}

where \hat{D}_j and \hat{W}_j are, respectively, D_j and W_j evaluated at \hat{{\gamma}}.
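Continuing the illustrative binary-case sketch above (X, y, and fisher_scoring are the hypothetical names introduced there), the estimated covariance matrix and the standard errors of the parameter estimates follow directly from the information matrix evaluated at \hat{{\gamma}}:

   gamma_hat = fisher_scoring(X, y)
   p_hat = 1.0 / (1.0 + np.exp(-(X @ gamma_hat)))
   info_hat = X.T @ (X * (p_hat * (1.0 - p_hat))[:, None])
   cov_hat = np.linalg.inv(info_hat)        # estimated covariance matrix
   std_err = np.sqrt(np.diag(cov_hat))      # standard errors of the estimates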

By default, starting values are zero for the slope parameters, and for the intercept parameters, starting values are the observed cumulative logits (that is, logits of the observed cumulative proportions of response). Alternatively, the starting values may be specified with the INEST= option.
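As a small illustration of the default intercept starting values (a hypothetical Python fragment; the frequency counts are made up, and a four-level ordinal response gives k = 3 intercepts):

   import numpy as np

   counts = np.array([10, 25, 40, 25])                 # observed response frequencies
   cum_prop = np.cumsum(counts)[:-1] / counts.sum()    # observed cumulative proportions
   alpha_start = np.log(cum_prop / (1.0 - cum_prop))   # logits of the cumulative proportions
   beta_start = 0.0                                    # slope parameters start at zero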

Newton-Raphson Algorithm

With parameter vector {\gamma}'=(\alpha_1, \ldots ,\alpha_k,{\beta}'), the gradient vector and the Hessian matrix are given, respectively, by

g_{{\gamma}} = \sum_j w_jf_j\frac{\partial l_j}{\partial {\gamma}}

H_{{\gamma}} = -\sum_j w_jf_j\frac{\partial^2 l_j}{\partial {\gamma}^2}

where l_j is the log likelihood of the jth observation.

With a starting value of {\gamma}_0, the maximum likelihood estimate \hat{{\gamma}} of {\gamma} is obtained iteratively until convergence:

{\gamma}_{m+1}={\gamma}_m+H_{{\gamma}_m}^{-1}g_{{\gamma}_m}

If the likelihood evaluated at {\gamma}_{m+1} is less than that evaluated at {\gamma}_m, then {\gamma}_{m+1} is recomputed by step-halving or ridging.

The covariance matrix of \hat{{\gamma}} is estimated by

\hat{cov}(\hat{{\gamma}})=H_{\hat{{\gamma}}}^{-1}
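The structure of the Newton-Raphson iteration can be sketched generically (in Python with NumPy, for illustration only; loglik, grad, and neg_hess are assumed to be user-supplied functions that return l, g_{{\gamma}}, and H_{{\gamma}} as defined above):

   import numpy as np

   def newton_raphson(loglik, grad, neg_hess, gamma0, max_iter=25, tol=1e-8):
       # neg_hess returns H_gamma, the negated matrix of second derivatives.
       gamma = np.asarray(gamma0, dtype=float)
       for _ in range(max_iter):
           step = np.linalg.solve(neg_hess(gamma), grad(gamma))   # H^{-1} g
           for _ in range(20):                  # step-halving if the likelihood decreases
               if loglik(gamma + step) >= loglik(gamma):
                   break
               step /= 2.0
           gamma = gamma + step
           if np.max(np.abs(step)) < tol:
               break
       cov = np.linalg.inv(neg_hess(gamma))     # estimated covariance matrix H^{-1}
       return gamma, cov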


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.