
STAT 350: Lecture 33

Theory of Generalized Linear Models

If $Y$ has a Poisson distribution with parameter $\mu$ then

$$P(Y = k) = \frac{\mu^k e^{-\mu}}{k!}$$

for $k$ a non-negative integer. We can use the method of maximum likelihood to estimate $\mu$ if we have a sample $Y_1, \ldots, Y_n$ of independent Poisson random variables all with mean $\mu$. If we observe $Y_1 = y_1$, $Y_2 = y_2$ and so on then the likelihood function is

$$L(\mu) = \prod_{i=1}^n \frac{\mu^{y_i} e^{-\mu}}{y_i!}$$

This function of $\mu$ can be maximized by maximizing its logarithm, the log-likelihood function. We set the derivative of the log-likelihood with respect to $\mu$ equal to 0, getting

$$\frac{\sum_{i=1}^n y_i}{\mu} - n = 0$$

This is the likelihood equation and its solution $\hat\mu = \bar{Y}$, the sample mean, is the maximum likelihood estimate of $\mu$.
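As a quick numerical check, a few lines of S (the code below also runs in R; the counts are made up purely for illustration) confirm that the sample mean maximizes the Poisson log-likelihood:

  # made-up Poisson counts, for illustration only
  y <- c(2, 0, 3, 1, 4, 2, 1)

  # log-likelihood, dropping the constant term -sum(log(factorial(y)))
  loglik <- function(mu) sum(y * log(mu) - mu)

  mean(y)                # the maximum likelihood estimate
  loglik(mean(y))        # log-likelihood at the MLE
  loglik(mean(y) + 0.5)  # smaller, as it must be
  loglik(mean(y) - 0.5)  # also smaller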

In a regression problem all the $Y_i$ will have different means $\mu_i$. Our log-likelihood is now

$$\ell(\mu_1, \ldots, \mu_n) = \sum_{i=1}^n \left[ y_i \log(\mu_i) - \mu_i - \log(y_i!) \right]$$

If we treat all $n$ of the $\mu_i$ as separate, unrestricted parameters we can maximize the log-likelihood by setting each of the $n$ partial derivatives with respect to $\mu_k$ for $k$ from 1 to $n$ equal to 0. The $k$th of these $n$ equations is just

$$\frac{y_k}{\mu_k} - 1 = 0$$

This leads to $\hat\mu_k = y_k$. In glm jargon this model is the saturated model.

A more useful model is one in which there are fewer than $n$ parameters but more than one. A typical glm model is

$$\log(\mu_i) = x_i^T \beta$$

where the $x_i$ are covariate values for the $i$th observation (often including an intercept term just as in standard linear regression).

In this case the log-likelihood is

$$\ell(\beta) = \sum_{i=1}^n \left[ y_i x_i^T \beta - e^{x_i^T \beta} - \log(y_i!) \right]$$

which should be treated as a function of $\beta$ and maximized.

The derivative of this log-likelihood with respect to $\beta_j$ is

$$\sum_{i=1}^n \left[ y_i x_{ij} - x_{ij} e^{x_i^T \beta} \right] = \sum_{i=1}^n x_{ij} \left( y_i - \mu_i \right)$$

If $\beta$ has $p$ components then setting these $p$ derivatives equal to 0 gives the likelihood equations.
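In S or R the function glm fits this model directly. The short sketch below, with made-up data and a single covariate, fits the model and then checks that the fitted means satisfy the likelihood equations $\sum_i x_{ij}(y_i - \hat\mu_i) = 0$:

  # made-up counts with a single covariate, for illustration only
  x <- c(1, 2, 3, 4, 5, 6)
  y <- c(2, 3, 6, 7, 12, 18)

  fit <- glm(y ~ x, family = poisson)  # fits log(mu_i) = beta_1 + beta_2 x_i

  X  <- model.matrix(fit)              # design matrix (intercept column plus x)
  mu <- fitted(fit)                    # fitted means
  t(X) %*% (y - mu)                    # likelihood equations: both entries near 0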

It is no longer possible to solve the likelihood equations analytically. We have, instead, to settle for numerical techniques. One common technique is called iteratively re-weighted least squares. For a Poisson variable with mean $\mu_i$ the variance is $\sigma_i^2 = \mu_i$. Ignore for a moment the fact that if we knew $\sigma_i$ we would know $\mu_i$ and consider fitting our model by least squares with the $\sigma_i^2$ known. We would minimize (see our discussion of weighted least squares)

$$\sum_{i=1}^n \frac{\left( y_i - \mu_i \right)^2}{\sigma_i^2}$$

by taking the derivative with respect to $\beta_j$ and (again ignoring the fact that $\sigma_i^2$ depends on $\beta_j$) we would get

$$\sum_{i=1}^n \frac{y_i - \mu_i}{\sigma_i^2} \, \frac{\partial \mu_i}{\partial \beta_j} = 0$$

But the derivative of $\mu_i$ with respect to $\beta_j$ is $\mu_i x_{ij}$ and replacing $\sigma_i^2$ by $\mu_i$ we get the equation

$$\sum_{i=1}^n x_{ij} \left( y_i - \mu_i \right) = 0$$

exactly as before. This motivates the following estimation scheme.

  1. Begin with a guess for the standard deviations $\sigma_i$ (taking them all equal to 1 is simple).
  2. Do (non-linear) weighted least squares using the guessed weights. Get estimated regression parameters $\hat\beta$.
  3. Use these to compute estimated variances $\hat\sigma_i^2 = \hat\mu_i$. Go back and do weighted least squares with these weights.
  4. Iterate (repeat steps 2 and 3 over and over) until the estimates stop changing appreciably.
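Here is a bare-bones sketch of this scheme in S/R for the Poisson log-linear model, with made-up data; the function name irwls.poisson is invented for this sketch, and in practice glm(y ~ x, family = poisson) does all of this for you. With weights $1/\hat\mu_i$ and derivatives $\partial\mu_i/\partial\beta_j = \mu_i x_{ij}$, each weighted least squares update reduces to solving $(X^T \mathrm{diag}(\hat\mu) X)\,\delta = X^T(y - \hat\mu)$; for simplicity the sketch takes a single such step between weight updates, which leads to the same fixed point as the scheme above.

  # iteratively re-weighted least squares for log(mu_i) = x_i' beta (a sketch);
  # the first column of X is assumed to be a column of 1s for the intercept
  irwls.poisson <- function(X, y, maxit = 25, tol = 1e-8) {
    beta <- c(log(mean(y)), rep(0, ncol(X) - 1))  # crude starting guess
    for (it in 1:maxit) {
      mu <- as.vector(exp(X %*% beta))            # current fitted means
      # weighted least squares step: weights 1/mu_i, derivatives mu_i * x_i,
      # so the normal equations are (X' diag(mu) X) delta = X' (y - mu)
      delta <- solve(t(X) %*% (mu * X), t(X) %*% (y - mu))
      beta  <- beta + as.vector(delta)
      if (max(abs(delta)) < tol) break            # estimates no longer changing
    }
    beta
  }

  # made-up data; the answer agrees with glm
  x <- c(1, 2, 3, 4, 5, 6)
  y <- c(2, 3, 6, 7, 12, 18)
  irwls.poisson(cbind(1, x), y)
  coef(glm(y ~ x, family = poisson))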

Non-linear least squares

In a regression model of the form

$$Y_i = \mu_i + \epsilon_i$$

we call the model linear if $\mu_i = x_i^T \beta$ for some covariates $x_i$ and a parameter vector $\beta$. Often, however, we have non-linear models such as $\mu_i = \beta_1 \left( 1 - e^{-\beta_2 x_i} \right)$ or in general $\mu_i = f(x_i, \beta)$ for some function $f$ which need not be linear in $\beta$. If the errors are homoscedastic normal errors then the maximum likelihood estimates of the $\beta_j$ are the least squares estimates, minimizing $\sum_{i=1}^n \left[ Y_i - f(x_i, \beta) \right]^2$. The value $\hat\beta$ then solves

$$\sum_{i=1}^n \left[ Y_i - f(x_i, \hat\beta) \right] \frac{\partial f(x_i, \hat\beta)}{\partial \beta_j} = 0, \qquad j = 1, \ldots, p$$

These equations can usually not be solved analytically. In general the process of solving them numerically is iterative. One approach is:

  1. Begin with an initial guess (no general method available to do this step) for the parameter estimates, say $\beta^{(0)}$.
  2. Linearize the function $\mu_i(\beta)$ around $\beta^{(0)}$ by doing a Taylor expansion. This involves writing

    $$\mu_i(\beta) \approx \mu_i(\beta^{(0)}) + x_i^T \left( \beta - \beta^{(0)} \right)$$

    where $x_i$ is the vector of partial derivatives of $\mu_i$ with respect to the entries in $\beta$.

  3. Replace $\mu_i$ in the sum of squares by this approximation and get that you are trying to minimize

    $$\sum_{i=1}^n \left[ Y_i - \mu_i(\beta^{(0)}) - x_i^T \left( \beta - \beta^{(0)} \right) \right]^2$$

    with respect to $\beta$.

  4. This is really now an ordinary least squares problem in which we are regressing $\epsilon_i = Y_i - \mu_i(\beta^{(0)})$ on $x_i$. Let $X$ be the $n \times p$ matrix with $i$th row $x_i^T$, where the derivatives are evaluated at $\beta^{(0)}$. Then we are to minimize

    $$\left( \epsilon - X \delta \right)^T \left( \epsilon - X \delta \right)$$

    where $\delta = \beta - \beta^{(0)}$ and $\epsilon$ is the vector with entries $\epsilon_i = Y_i - \mu_i(\beta^{(0)})$. The solution is $\hat\delta = (X^T X)^{-1} X^T \epsilon$. We compute this $\hat\delta$.

  5. Update $\beta^{(0)}$ to $\beta^{(1)} = \beta^{(0)} + \hat\delta$.
  6. Recompute the entries in $X$ with the new value $\beta^{(1)}$, get a new $\hat\delta$, then a $\beta^{(2)}$, and so on.
  7. Continue until the estimates change little or the sum of squares changes little.
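A bare-bones version of this iteration can be written in a few lines of S/R. In the sketch below the names and the data are hypothetical; f returns the vector of $\mu_i(\beta)$ and grad returns the $n \times p$ derivative matrix $X$ of step 4.

  # Gauss-Newton sketch: f(beta) returns the vector of mu_i(beta),
  # grad(beta) returns the n x p matrix X of derivatives of mu_i wrt beta_j
  gauss.newton <- function(f, grad, y, beta0, maxit = 50, tol = 1e-8) {
    beta <- beta0                               # step 1: initial guess
    for (it in 1:maxit) {
      eps   <- y - f(beta)                      # residuals Y_i - mu_i(beta)
      X     <- grad(beta)                       # derivatives at the current beta
      delta <- solve(t(X) %*% X, t(X) %*% eps)  # step 4: ordinary least squares
      beta  <- beta + as.vector(delta)          # step 5: update the estimate
      if (max(abs(delta)) < tol) break          # step 7: stop when changes are small
    }
    beta
  }

  # made-up example with mean beta_1 (1 - exp(-beta_2 x)); compare with nls
  x <- c(0.5, 1, 2, 4, 8, 16)
  y <- c(1.1, 2.0, 3.1, 4.2, 4.8, 5.1)
  f    <- function(b) b[1] * (1 - exp(-b[2] * x))
  grad <- function(b) cbind(1 - exp(-b[2] * x), b[1] * x * exp(-b[2] * x))
  gauss.newton(f, grad, y, beta0 = c(5, 0.5))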

Example

In thermoluminescence dating (see Lecture 1) a common model is

$$Y_i = \beta_1 \left( 1 - e^{-(x_i + \beta_2)/\beta_3} \right) + \epsilon_i$$

Thus in this case

$$\mu_i = f(x_i, \beta) = \beta_1 \left( 1 - e^{-(x_i + \beta_2)/\beta_3} \right)$$

The derivatives of $\mu_i$ with respect to the entries of $\beta$ are

$$\frac{\partial \mu_i}{\partial \beta_1} = 1 - e^{-(x_i + \beta_2)/\beta_3}$$

$$\frac{\partial \mu_i}{\partial \beta_2} = \frac{\beta_1}{\beta_3} \, e^{-(x_i + \beta_2)/\beta_3}$$

and

$$\frac{\partial \mu_i}{\partial \beta_3} = -\frac{\beta_1 (x_i + \beta_2)}{\beta_3^2} \, e^{-(x_i + \beta_2)/\beta_3}$$
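Assuming this form for $f$, the mean function and its derivative matrix are easy to code (an illustrative S/R sketch with invented names) and can be handed to the Gauss-Newton sketch of the previous section:

  # mean function and derivative matrix for the model above (a sketch);
  # b = (beta_1, beta_2, beta_3)
  f.tl <- function(b, x) b[1] * (1 - exp(-(x + b[2]) / b[3]))
  grad.tl <- function(b, x) {
    e <- exp(-(x + b[2]) / b[3])
    cbind(1 - e,                            # d mu / d beta_1
          b[1] * e / b[3],                  # d mu / d beta_2
          -b[1] * (x + b[2]) * e / b[3]^2)  # d mu / d beta_3
  }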

Software and computing details

In SAS the procedure proc nlin will do non-linear least squares. You need to specify quite a few ingredients. The model statement must specify the exact functional form of the mean function $\mu_i(\beta)$. You have a statement, parms, which is needed to specify the initial guess for $\beta$. You will usually have a bunch of der statements to specify derivatives of the predicted values with respect to each parameter and, if you use some algorithms, you will even specify some second derivatives. (You get a choice of 5 different iterative algorithms. The one described above is called Gauss-Newton.)

In Splus the function nls can be used to do the same thing.
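For example, a call along the following lines fits the thermoluminescence model above; the data and the starting values here are made up, and the start argument plays roughly the role of the parms statement in proc nlin:

  # made-up doses x and luminescence intensities y, for illustration only
  x <- c(0, 50, 100, 200, 400, 800)
  y <- c(33, 50, 65, 88, 120, 147)

  fit <- nls(y ~ b1 * (1 - exp(-(x + b2) / b3)),
             start = list(b1 = 160, b2 = 80, b3 = 350))
  summary(fit)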





Richard Lockhart
Mon Mar 24 11:01:36 PST 1997