The NLIN Procedure

Computational Methods

For the system of equations represented by the nonlinear model

Y = F(\beta_0, \beta_1, \ldots, \beta_r, Z_1, Z_2, \ldots, Z_n) + {\epsilon} = F({\beta}^*) + {\epsilon}
where Z is a matrix of the independent variables, {\beta}^* is a vector of the parameters, {\epsilon} is the error vector, and F is a function of the independent variables and the parameters, there are two approaches to solving for the minimum. The first method is to minimize
L({\beta}) = 0.5 (e^'e)
where e = Y - F({\beta}) and {\beta} is an estimate of {\beta}^*.

The second method is to solve the nonlinear "normal" equations

X^'F({\beta}) = X^'Y
where
X = \frac{\partial{F}}{\partial {\beta}}
In the nonlinear situation, both X and F({\beta}) are functions of {\beta} and a closed-form solution generally does not exist. Thus, PROC NLIN uses an iterative process: a starting value for {\beta} is chosen and continually improved until the error sum of squares {\epsilon}^'{\epsilon} is minimized.
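To make the quantities used below concrete, the following Python sketch (an illustration only, not PROC NLIN's implementation) defines a hypothetical two-parameter exponential model F(beta; z) = b0*exp(-b1*z), together with its Jacobian X = dF/dbeta and the error sum of squares. The later sketches in this section reuse these helper functions.

   import numpy as np

   # Hypothetical two-parameter model: F(beta; z) = b0 * exp(-b1 * z)
   def model(beta, z):
       b0, b1 = beta
       return b0 * np.exp(-b1 * z)

   # Analytic Jacobian X = dF/dbeta: one row per observation, one column per parameter
   def jacobian(beta, z):
       b0, b1 = beta
       g = np.exp(-b1 * z)
       return np.column_stack([g, -b0 * z * g])   # [dF/db0, dF/db1]

   # Error sum of squares e'e, with e = Y - F(beta)
   def sse(beta, z, y):
       e = y - model(beta, z)
       return e @ e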

The iterative techniques that PROC NLIN uses are similar to a series of linear regressions involving the matrix X evaluated for the current values of {\beta} and e = Y - F({\beta}), the residuals evaluated for the current values of {\beta}.

The iterative process begins at some point {\beta}_0. Then X and Y are used to compute a {\Delta} such that

{SSE}({\beta}_0 + k{\Delta}) < {SSE}({\beta}_0)
The four methods differ in how {\Delta} is computed to change the vector of parameters.
Steepest descent: {\Delta} = X^'e
Gauss-Newton: {\Delta} = (X^'X)^{-}X^'e
Newton: {\Delta} = G^{-}X^'e
Marquardt: {\Delta} = (X^'X + \lambda \, {diag}(X^'X))^{-}X^'e
The default method used to compute (X^'X)^{-} is the sweep operator, which produces a reflexive generalized (g2) inverse. In some cases it would be preferable to use a Moore-Penrose (g4) inverse. If the G4 option is specified in the PROC NLIN statement, a g4 inverse is used to compute {\Delta} on each iteration.
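As a small illustration of a generalized inverse (a sketch only, not PROC NLIN's sweep-based computation), the Moore-Penrose inverse available in numpy still yields a usable step when X^'X is singular; a g2 inverse from the sweep operator would in general give a different, non-minimum-norm solution.

   import numpy as np

   # Sketch: a Moore-Penrose (g4-type) inverse applied to a singular X'X.
   # PROC NLIN's default g2 inverse comes from the sweep operator, which numpy
   # does not provide directly; this only shows that a generalized inverse
   # still produces a step when X'X is singular.
   X = np.array([[1.0, 2.0],
                 [2.0, 4.0],
                 [3.0, 6.0]])              # linearly dependent columns
   e = np.array([0.10, -0.20, 0.05])

   XtX = X.T @ X                            # singular 2 x 2 cross-product matrix
   delta = np.linalg.pinv(XtX) @ (X.T @ e)  # minimum-norm step
   print(delta)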

The Gauss-Newton and Marquardt iterative methods regress the residuals onto the partial derivatives of the model with respect to the parameters until the estimates converge. The Newton iterative method regresses the residuals onto a function of the first and second derivatives of the model with respect to the parameters until the estimates converge. Analytical first- and second-order derivatives are automatically computed.

Steepest Descent (Gradient)

The steepest descent method is based on the gradient of {\epsilon}^'{\epsilon}:
\frac{1}{2} \frac{\partial (e^'e)}{\partial {\beta}} = -X^'Y + X^'F({\beta}) = -X^'e
The quantity -X'e is the gradient along which {\epsilon^'\epsilon} increases. Thus {\Delta}={X^'e} is the direction of steepest descent.

If the automatic variables _WEIGHT_ and _RESID_ are used, then

{\Delta} = {X^' W^{SSE} r}
is the direction, where
W^{SSE}
is an n × n diagonal matrix of weights from the _WEIGHT_ variable. Each diagonal element w_i^{SSE} contains the value of _WEIGHT_ for the ith observation.

r
is a vector with elements r_i from _RESID_. Each element r_i contains the value of _RESID_ evaluated for the ith observation.

Using the method of steepest descent, let
{\beta}_{k+1} = {\beta}_k + \alpha {\Delta}
where the scalar \alpha is chosen such that
{SSE}({\beta}_k + \alpha {\Delta}) < {SSE}({\beta}_k)
Note: The steepest descent method may converge very slowly and is therefore not generally recommended. It is sometimes useful when the initial values are poor.
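The sketch below (an illustration only, reusing the hypothetical model(), jacobian(), and sse() helpers defined earlier) performs one steepest descent update, halving \alpha until the SSE decreases.

   # One steepest descent update with step halving on alpha (illustration only).
   def steepest_descent_step(beta, z, y, alpha=1.0, max_halvings=20):
       e = y - model(beta, z)
       X = jacobian(beta, z)
       delta = X.T @ e                     # direction of steepest descent
       current = sse(beta, z, y)
       for _ in range(max_halvings):
           trial = beta + alpha * delta
           if sse(trial, z, y) < current:  # accept the first improving step
               return trial
           alpha *= 0.5
       return beta                         # no improving step found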

Newton

The Newton method uses the second derivatives and solves the equation
{\Delta} = {G^{-} X^'e}
where
G = X^'X + \sum_{i=1}^n H_i({\beta}) e_i
and H_i({\beta}) is the Hessian of e_i:
[H_i]_{jk} = \frac{\partial^2 e_i}{\partial {\beta}_j \partial {\beta}_k}
If the automatic variables _WEIGHT_, _WGTJPJ_, and _RESID_ are used, then

{\Delta} = G^{-}{X^' W^{SSE} r}
is the direction, where
G = X^' W^{XPX} X + \sum_{i=1}^n H_i({\beta}) w_i^{XPX} r_i
and
W^{SSE}
is an n × n diagonal matrix of weights from the _WEIGHT_ variable. Each diagonal element w_i^{SSE} contains the value of _WEIGHT_ for the ith observation.

W^{XPX}
is an n × n diagonal matrix of weights from the _WGTJPJ_ variable. Each diagonal element w_i^{XPX} contains the value of _WGTJPJ_ for the ith observation.

r
is a vector with elements r_i from the _RESID_ variable. Each element r_i contains the value of _RESID_ evaluated for the ith observation.
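For the hypothetical exponential model used in the earlier sketches, the unweighted Newton direction can be written out explicitly. In the sketch below (an illustration only, not PROC NLIN's code), the per-observation Hessians H_i are the second derivatives of e_i = y_i - b0*exp(-b1*z_i), and pinv stands in for a generalized inverse.

   import numpy as np

   # Unweighted Newton direction G^- X'e for the hypothetical exponential model,
   # reusing model() and jacobian() from the earlier sketch.
   def newton_direction(beta, z, y):
       b0, b1 = beta
       e = y - model(beta, z)
       X = jacobian(beta, z)
       G = X.T @ X
       for zi, ei in zip(z, e):
           g = np.exp(-b1 * zi)
           # Hessian of the ith residual e_i = y_i - b0 * exp(-b1 * z_i)
           H_i = np.array([[0.0,          zi * g],
                           [zi * g, -b0 * zi**2 * g]])
           G += ei * H_i
       return np.linalg.pinv(G) @ (X.T @ e)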

Gauss-Newton

The Gauss-Newton method uses the Taylor series
F({\beta}) = F({\beta}_0) + X({\beta} - {\beta}_0) + \cdots
where X=\partial{F} / \partial {\beta} is evaluated at {\beta}= {\beta}_0.

Substituting the first two terms of this series into the normal equations
X^'F({\beta}) = X^'Y
X^'(F({\beta}_0) + X({\beta} - {\beta}_0)) = X^'Y
(X^'X)({\beta} - {\beta}_0) = X^'Y - X^'F({\beta}_0)
(X^'X){\Delta} = X^'e
and therefore
{\Delta} = (X^'X)^{-}X^'e
Caution: If X'X is singular or becomes singular, PROC NLIN computes {\Delta} using a generalized inverse for the iterations after singularity occurs. If X'X is still singular for the last iteration, the solution should be examined.
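Putting the pieces together, the following sketch (an illustration only, not PROC NLIN's implementation) iterates the Gauss-Newton update with step halving, again reusing the hypothetical helpers defined earlier; pinv stands in for a generalized inverse in case X^'X is singular.

   import numpy as np

   # Illustrative Gauss-Newton iteration with step halving (SMETHOD=HALVE analogue).
   def gauss_newton(beta, z, y, tol=1e-8, max_iter=50):
       for _ in range(max_iter):
           e = y - model(beta, z)
           X = jacobian(beta, z)
           delta = np.linalg.pinv(X.T @ X) @ (X.T @ e)
           step = 1.0
           while sse(beta + step * delta, z, y) >= sse(beta, z, y) and step > 1e-12:
               step *= 0.5                       # halve the step until SSE decreases
           new_beta = beta + step * delta
           if abs(sse(beta, z, y) - sse(new_beta, z, y)) < tol:
               return new_beta
           beta = new_beta
       return beta

   # Example usage with synthetic data from the hypothetical model:
   rng = np.random.default_rng(0)
   z = np.linspace(0.0, 5.0, 30)
   y = 2.0 * np.exp(-0.7 * z) + rng.normal(scale=0.02, size=z.size)
   print(gauss_newton(np.array([1.0, 0.1]), z, y))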

Marquardt

The Marquardt updating formula is as follows:
{\Delta} = (X^'X + \lambda \, {diag}(X^'X))^{-}X^'e
The Marquardt method is a compromise between the Gauss-Newton and steepest descent methods (Marquardt 1963). As \lambda \rightarrow 0, the direction approaches Gauss-Newton. As \lambda \rightarrow \infty, the direction approaches steepest descent.

Marquardt's studies indicate that the average angle between Gauss-Newton and steepest descent directions is about 90^{\circ}. A choice of \lambda between 0 and infinity produces a compromise direction.

By default, PROC NLIN chooses \lambda=10^{-3} to start and computes a {\Delta}. If SSE({\beta}_0+{\Delta}) < SSE({\beta}_0), then \lambda is divided by 10 for the next iteration. Each time SSE({\beta}_0+{\Delta}) > SSE({\beta}_0), \lambda is multiplied by 10.

Note: If the SSE decreases on each iteration, then \lambda \rightarrow 0, and you are essentially using the Gauss-Newton method. If SSE does not improve, then \lambda is increased until you are moving in the steepest descent direction.

Marquardt's method is equivalent to performing a series of ridge regressions and is useful when the parameter estimates are highly correlated or the objective function is not well approximated by a quadratic.
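The sketch below (an illustration only, not PROC NLIN's code) follows the same default-style \lambda adjustment: \lambda is divided by 10 after a successful step and multiplied by 10 after a failed one. It reuses the hypothetical helpers defined earlier.

   import numpy as np

   # Illustrative Marquardt iteration with the default-style lambda adjustment.
   def marquardt(beta, z, y, lam=1e-3, tol=1e-8, max_iter=100):
       for _ in range(max_iter):
           e = y - model(beta, z)
           X = jacobian(beta, z)
           XtX = X.T @ X
           delta = np.linalg.pinv(XtX + lam * np.diag(np.diag(XtX))) @ (X.T @ e)
           if sse(beta + delta, z, y) < sse(beta, z, y):
               beta = beta + delta
               lam /= 10.0            # success: move toward Gauss-Newton
               if np.linalg.norm(delta) < tol:
                   return beta
           else:
               lam *= 10.0            # failure: move toward steepest descent
       return beta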

Secant Method (DUD)

The multivariate secant method is like the Gauss-Newton method, except that the derivatives are estimated from the history of iterations rather than supplied analytically. The method is also called the method of false position or the Doesn't Use Derivatives (DUD) method (Ralston and Jennrich 1978). If only one parameter is being estimated, the derivative for iteration i+1 can be estimated from the previous two iterations:
{der}_{i+1} = \frac{\hat{Y}_i - \hat{Y}_{i-1}}{b_i - b_{i-1}}
When k parameters are to be estimated, the method uses the last k+1 iterations to estimate the derivatives.

Now that automatic analytic derivatives are available, DUD is not a recommended method but is retained for backward compatibility.
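For a single parameter, the secant derivative estimate in the formula above is simply a difference quotient of the predicted values from the last two iterations. The small sketch below (an illustration only, not PROC NLIN's code) uses a hypothetical one-parameter mean function yhat.

   import numpy as np

   # Secant-style derivative estimate from the previous two iterations.
   def secant_derivative(yhat, b_curr, b_prev):
       # der_{i+1} = (Yhat_i - Yhat_{i-1}) / (b_i - b_{i-1})
       return (yhat(b_curr) - yhat(b_prev)) / (b_curr - b_prev)

   # Example with a hypothetical one-parameter exponential mean function:
   z = np.linspace(0.0, 5.0, 10)
   yhat = lambda b: 2.0 * np.exp(-b * z)
   print(secant_derivative(yhat, 0.7, 0.6))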

Step-Size Search

The default method of finding the step size k is step halving (SMETHOD=HALVE). If SSE({\beta}_0+{\Delta}) > SSE({\beta}_0), compute SSE({\beta}_0+0.5{\Delta}), SSE({\beta}_0+0.25{\Delta}), \ldots , until a smaller SSE is found.

If you specify SMETHOD=GOLDEN, the step size k is determined by a golden section search. The parameter TAU determines the length of the initial interval to be searched, with the interval having length TAU or 2 × TAU, depending on SSE({\beta}_0+{\Delta}). The RHO parameter specifies how fine the search is to be. The SSE at each endpoint of the interval is evaluated, and a new subinterval is chosen. The size of the interval is reduced until its length is less than RHO. One pass through the data is required each time the interval is reduced. Hence, if RHO is very small relative to TAU, a large amount of time can be spent determining a step size. For more information on the GOLDEN search, refer to Kennedy and Gentle (1980).

If you specify SMETHOD=CUBIC, the NLIN procedure performs a cubic interpolation to estimate the step size. If the estimated step size does not result in a decrease in SSE, step halving is used.
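As an illustration of the golden section idea (a sketch only, not PROC NLIN's implementation), the function below searches the interval [0, TAU] for a step size and stops when the interval is shorter than RHO; phi(alpha) is assumed to return SSE({\beta}_0 + alpha*{\Delta}).

   import numpy as np

   # Golden section line search over the step size (illustration only).
   def golden_section(phi, tau=1.0, rho=1e-3):
       invphi = (np.sqrt(5.0) - 1.0) / 2.0      # golden ratio factor, ~0.618
       a, b = 0.0, tau
       c = b - invphi * (b - a)
       d = a + invphi * (b - a)
       while (b - a) > rho:                     # shrink until length < rho
           if phi(c) < phi(d):
               b = d
           else:
               a = c
           c = b - invphi * (b - a)
           d = a + invphi * (b - a)
       return (a + b) / 2.0

   # Example: minimize a simple quadratic in the step size.
   print(golden_section(lambda alpha: (alpha - 0.3)**2, tau=1.0, rho=1e-4))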


Copyright © 1999 by SAS Institute Inc., Cary, NC, USA. All rights reserved.