The LOGISTIC Procedure

Iterative Algorithms for Model-Fitting

Two iterative maximum likelihood algorithms are available in PROC LOGISTIC. The default is the Fisher-scoring method, which is equivalent to fitting by iteratively reweighted least squares. The alternative algorithm is the Newton-Raphson method. Both algorithms give the same parameter estimates; however, the estimated covariance matrix of the parameter estimators may differ slightly, because the Fisher-scoring method is based on the expected information matrix while the Newton-Raphson method is based on the observed information matrix. For a binary logit model, the observed and expected information matrices are identical, so the two algorithms yield identical estimated covariance matrices. You can use the TECHNIQUE= option to select a fitting algorithm.
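To see why the two information matrices coincide in the binary case, note (as a brief aside, writing \tilde{x}_j for the column of intercept and explanatory values of the jth observation, a shorthand not used elsewhere in this section) that with p_j = 1/(1+e^{-\tilde{x}_j'{\gamma}}) the matrix of second derivatives of the log likelihood for the jth observation is

\frac{\partial^2 l_j}{\partial {\gamma} \partial {\gamma}'} = -p_j(1-p_j)\tilde{x}_j\tilde{x}_j'

which does not involve the response, so the observed information is already equal to its expectation.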

Iteratively Reweighted Least-Squares Algorithm

Consider the multinomial variable Z_j=(Z_{1j}, \ldots ,Z_{(k+1)j})' such that

Z_{ij}=\begin{cases} 1 & \text{if } Y_j=i \\ 0 & \text{otherwise} \end{cases}

With p_{ij} denoting the probability that the jth observation has response value i, the expected value of Z_j is p_j=(p_{1j}, \ldots ,p_{(k+1)j})'. The covariance matrix of Z_j is V_j, the covariance matrix of a multinomial random variable for one trial with parameter vector p_j. Let {\gamma} be the vector of regression parameters; that is, {\gamma}'=(\alpha_1, \ldots ,\alpha_k,{\beta}'). Let D_j be the matrix of partial derivatives of p_j with respect to {\gamma}. The estimating equation for the regression parameters is

\sum_j{D}'_j{W}_j(Z_j-p_j)=0

where W_j = w_j f_j V_j^-, w_j and f_j are the WEIGHT and FREQ values of the jth observation, and V_j^- is a generalized inverse of V_j. PROC LOGISTIC chooses V_j^- as the inverse of the diagonal matrix with p_j as the diagonal.
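This choice of generalized inverse can be verified directly: for the one-trial multinomial covariance matrix V_j = \text{diag}(p_j)-p_jp_j', the inverse of \text{diag}(p_j) satisfies the defining condition V_jV_j^-V_j=V_j. The following minimal numerical check (in Python with NumPy, purely for illustration; the probabilities are arbitrary) demonstrates this:

   import numpy as np

   # Arbitrary response probabilities for one observation; they must sum to 1.
   p = np.array([0.2, 0.5, 0.3])

   V = np.diag(p) - np.outer(p, p)    # covariance of a one-trial multinomial
   V_ginv = np.diag(1.0 / p)          # inverse of the diagonal matrix with p as diagonal

   # Defining property of a generalized inverse: V V^- V = V.
   print(np.allclose(V @ V_ginv @ V, V))   # prints True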

With a starting value of {\gamma}_0, the maximum likelihood estimate of {\gamma} is obtained iteratively as

{\gamma}_{m+1}={\gamma}_m+(\sum_j D'_j W_j D_j)^{-1} \sum_j D'_j W_j (Z_j-p_j)

where D_j, W_j, and p_j are evaluated at {\gamma}_m. The expression after the plus sign is the step size. If the likelihood evaluated at {\gamma}_{m+1} is less than that evaluated at {\gamma}_m, then {\gamma}_{m+1} is recomputed by step-halving or ridging. The iterative scheme continues until convergence is obtained, that is, until {\gamma}_{m+1} is sufficiently close to {\gamma}_m. Then the maximum likelihood estimate of {\gamma} is \hat{{\gamma}}={\gamma}_{m+1}.
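For concreteness, the following sketch (in Python with NumPy) carries out this iteration for the binary special case, with all WEIGHT and FREQ values taken as 1 and with simple step-halving; it illustrates the update formula above and is not a reproduction of PROC LOGISTIC's implementation:

   import numpy as np

   def loglik(X, y, gamma):
       # Binary logit log likelihood at gamma (unit WEIGHT and FREQ values).
       eta = X @ gamma
       return np.sum(y * eta - np.logaddexp(0.0, eta))

   def fisher_scoring(X, y, max_iter=25, tol=1e-8):
       # X must include an intercept column; y is 0/1.
       gamma = np.zeros(X.shape[1])                      # zero start, for simplicity
       for _ in range(max_iter):
           p = 1.0 / (1.0 + np.exp(-(X @ gamma)))
           score = X.T @ (y - p)                         # sum_j D'_j W_j (Z_j - p_j)
           info = X.T @ (X * (p * (1.0 - p))[:, None])   # sum_j D'_j W_j D_j
           step = np.linalg.solve(info, score)
           for _ in range(20):                           # step-halving, as described above
               if loglik(X, y, gamma + step) >= loglik(X, y, gamma):
                   break
               step /= 2.0
           gamma = gamma + step
           if np.max(np.abs(step)) < tol:                # gamma_{m+1} close to gamma_m
               break
       return gamma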

The covariance matrix of \hat{{\gamma}} is estimated by

\hat{cov}(\hat{{\gamma}})=(\sum_j \hat{D}'_j \hat{W}_j \hat{D}_j)^{-1}

where \hat{D}_j and \hat{W}_j are, respectively, D_j and W_j evaluated at \hat{{\gamma}}.
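Continuing the illustrative binary-case sketch above (X, y, and fisher_scoring are the hypothetical names introduced there), the estimated covariance matrix and the standard errors of the parameter estimates follow directly from the information matrix evaluated at \hat{{\gamma}}:

   gamma_hat = fisher_scoring(X, y)
   p_hat = 1.0 / (1.0 + np.exp(-(X @ gamma_hat)))
   info_hat = X.T @ (X * (p_hat * (1.0 - p_hat))[:, None])
   cov_hat = np.linalg.inv(info_hat)        # estimated covariance matrix
   std_err = np.sqrt(np.diag(cov_hat))      # standard errors of the estimates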

By default, starting values are zero for the slope parameters, and for the intercept parameters, starting values are the observed cumulative logits (that is, logits of the observed cumulative proportions of response). Alternatively, the starting values may be specified with the INEST= option.
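As a small illustration of the default intercept starting values (a hypothetical Python fragment; the frequency counts are made up, and a four-level ordinal response gives k = 3 intercepts):

   import numpy as np

   counts = np.array([10, 25, 40, 25])                 # observed response frequencies
   cum_prop = np.cumsum(counts)[:-1] / counts.sum()    # observed cumulative proportions
   alpha_start = np.log(cum_prop / (1.0 - cum_prop))   # logits of the cumulative proportions
   beta_start = 0.0                                    # slope parameters start at zero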

Newton-Raphson Algorithm

With parameter vector {\gamma}'=(\alpha_1, \ldots ,\alpha_k,{\beta}'), the gradient vector and the Hessian matrix are given, respectively, by

g_{{\gamma}} = \sum_j w_jf_j\frac{\partial l_j}{\partial {\gamma}}

H_{{\gamma}} = -\sum_j w_jf_j\frac{\partial^2 l_j}{\partial {\gamma}^2}

where l_j is the log likelihood of the jth observation.

With a starting value of {\gamma}_0, the maximum likelihood estimate \hat{{\gamma}} of {\gamma} is obtained iteratively until convergence:

{\gamma}_{m+1}={\gamma}_m+H_{{\gamma}_m}^{-1}g_{{\gamma}_m}

If the likelihood evaluated at {\gamma}_{m+1} is less than that evaluated at {\gamma}_m, then {\gamma}_{m+1} is recomputed by step-halving or ridging.

The covariance matrix of \hat{{\gamma}} is estimated by

\hat{cov}(\hat{{\gamma}})=H_{\hat{{\gamma}}}^{-1}
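The structure of the Newton-Raphson iteration can be sketched generically (in Python with NumPy, for illustration only; loglik, grad, and neg_hess are assumed to be user-supplied functions that return l, g_{{\gamma}}, and H_{{\gamma}} as defined above):

   import numpy as np

   def newton_raphson(loglik, grad, neg_hess, gamma0, max_iter=25, tol=1e-8):
       # neg_hess returns H_gamma, the negated matrix of second derivatives.
       gamma = np.asarray(gamma0, dtype=float)
       for _ in range(max_iter):
           step = np.linalg.solve(neg_hess(gamma), grad(gamma))   # H^{-1} g
           for _ in range(20):                  # step-halving if the likelihood decreases
               if loglik(gamma + step) >= loglik(gamma):
                   break
               step /= 2.0
           gamma = gamma + step
           if np.max(np.abs(step)) < tol:
               break
       cov = np.linalg.inv(neg_hess(gamma))     # estimated covariance matrix H^{-1}
       return gamma, cov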


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.