
STAT 350: Lecture 35

Estimating equations: an introduction via glm

Estimating Equations: refers to equations of the form

$$U(\theta) = 0$$

which are solved for $\theta$ to get estimates $\hat\theta$. Examples:

  1. The normal equations in linear regression:

    $$X^T(Y - X\beta) = 0$$

  2. The likelihood equations:

    $$\frac{\partial \ell(\theta)}{\partial \theta} = 0$$

    where $\ell(\theta)$ is the log-likelihood.

  3. The equation which must be solved to do non-linear least squares:

    $$\sum_i \left(Y_i - \mu_i(\beta)\right)\frac{\partial \mu_i(\beta)}{\partial \beta} = 0$$

  4. The iteratively reweighted least squares estimating equation:

    $$\sum_i \frac{Y_i - \mu_i(\beta)}{\sigma_i^2}\,\frac{\partial \mu_i(\beta)}{\partial \beta} = 0$$

    where, in a generalized linear model, the variance $\sigma_i^2$ is a known (except possibly for a multiplicative constant) function of $\mu_i$.

Only the first of these equations can usually be solved analytically. In Lecture 34 I showed you an example of an iterative technique for solving such equations.
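As an illustration (not part of the original lecture, and not necessarily the same scheme shown in Lecture 34), here is a minimal Python sketch of one such iterative technique, Newton's method, applied to the scalar Poisson score equation that appears later in these notes. The covariate and count values are made up for the example.

```python
# Minimal sketch: solve a scalar estimating equation U(beta) = 0 by Newton's method.
# The particular U below is only an illustration (a Poisson-type score, one covariate).

import numpy as np

x = np.array([0.5, 1.0, 1.5, 2.0])   # hypothetical covariate values
y = np.array([2.0, 3.0, 7.0, 12.0])  # hypothetical Poisson counts

def U(beta):
    """Estimating function: sum_i x_i (Y_i - exp(x_i * beta))."""
    return np.sum(x * (y - np.exp(x * beta)))

def dU(beta):
    """Derivative of U with respect to beta."""
    return -np.sum(x**2 * np.exp(x * beta))

beta = 0.0                            # starting guess
for _ in range(100):                  # Newton update: beta <- beta - U(beta)/U'(beta)
    step = U(beta) / dU(beta)
    beta -= step
    if abs(step) < 1e-10:             # stop when the update is negligible
        break

print("root of U:", beta)
```

Each pass replaces the current guess by the root of the tangent line to $U$; this is the basic idea behind the iterative schemes needed for all but the first equation above.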

Theory of Generalized Linear Models

The likelihood function for a Poisson regression model is:

$$\prod_i \frac{e^{-\mu_i}\mu_i^{Y_i}}{Y_i!}$$

and the log-likelihood is

$$\sum_i \left[-\mu_i + Y_i\log(\mu_i) - \log(Y_i!)\right].$$

A typical glm model is

$$\mu_i = e^{x_i^T\beta}$$

where the $x_i$ are covariate values for the $i$th observation (often including an intercept term just as in standard linear regression).

In this case the log-likelihood is

$$\ell(\beta) = \sum_i \left[-e^{x_i^T\beta} + Y_i x_i^T\beta - \log(Y_i!)\right]$$

which should be treated as a function of $\beta$ and maximized.

The derivative of this log-likelihood with respect to $\beta_j$ is

$$\frac{\partial \ell}{\partial \beta_j} = \sum_i \left[-x_{ij} e^{x_i^T\beta} + Y_i x_{ij}\right] = \sum_i x_{ij}\left(Y_i - \mu_i\right).$$

If $\beta$ has $p$ components then setting these $p$ derivatives equal to 0 gives the likelihood equations.
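As an aside (not in the original notes), the following Python sketch evaluates the Poisson log-likelihood and its derivative vector (the score) for the log-link model $\mu_i = e^{x_i^T\beta}$; the design matrix and counts are hypothetical.

```python
import numpy as np
from math import lgamma

# Hypothetical design matrix (intercept plus one covariate) and Poisson counts.
X = np.array([[1.0, 0.2],
              [1.0, 0.8],
              [1.0, 1.5],
              [1.0, 2.1]])
Y = np.array([1.0, 2.0, 4.0, 9.0])
log_Y_fact = np.array([lgamma(y + 1) for y in Y])   # log(Y_i!)

def log_lik(beta):
    """Log-likelihood: sum_i [ -mu_i + Y_i log(mu_i) - log(Y_i!) ] with mu_i = exp(x_i' beta)."""
    mu = np.exp(X @ beta)
    return np.sum(-mu + Y * np.log(mu) - log_Y_fact)

def score(beta):
    """Vector of derivatives: d ell / d beta_j = sum_i x_ij (Y_i - mu_i)."""
    mu = np.exp(X @ beta)
    return X.T @ (Y - mu)

print(log_lik(np.zeros(2)), score(np.zeros(2)))
```

Setting `score(beta)` equal to the zero vector gives the $p$ likelihood equations just described.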

For a Poisson model the variance is given by

$$\sigma_i^2 = \mathrm{Var}(Y_i) = \mu_i,$$

so the likelihood equations can be written as

$$\sum_i \frac{Y_i - \mu_i}{\sigma_i^2}\,\frac{\partial \mu_i}{\partial \beta_j} = 0,$$

which is the fourth equation above.

These equations are solved iteratively, as in non-linear regression, but with the iteration now involving weighted least squares. The resulting scheme is called iteratively reweighted least squares.

  1. Begin with a guess for the standard deviations $\sigma_i$ (taking them all equal to 1 is simple).
  2. Do (non-linear) weighted least squares using the guessed weights. Get estimated regression parameters $\hat\beta_1$.
  3. Use these to compute estimated variances $\hat\sigma_i^2$. Go back and do weighted least squares with these weights to get $\hat\beta_2$.
  4. Iterate (repeat over and over) until the estimates stop changing appreciably; a code sketch of the scheme follows the list.
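Here is a minimal Python sketch of the scheme, assuming the standard working-response form of iteratively reweighted least squares for the Poisson log-link model (each pass does one weighted least squares step with the current weights); the data are made up for the illustration.

```python
import numpy as np

# Hypothetical data: design matrix with an intercept column, and Poisson counts.
X = np.array([[1.0, 0.2],
              [1.0, 0.8],
              [1.0, 1.5],
              [1.0, 2.1]])
Y = np.array([1.0, 2.0, 4.0, 9.0])

beta = np.zeros(X.shape[1])                 # crude starting value (step 1)
for _ in range(100):
    eta = X @ beta                          # linear predictor x_i' beta
    mu = np.exp(eta)                        # fitted means; for Poisson, Var(Y_i) = mu_i
    w = mu                                  # working weight (dmu/deta)^2 / Var(Y_i) = mu for the log link
    z = eta + (Y - mu) / mu                 # working response
    WX = X * w[:, None]
    beta_new = np.linalg.solve(X.T @ WX, WX.T @ z)   # weighted least squares step (steps 2-3)
    if np.max(np.abs(beta_new - beta)) < 1e-10:      # stop when the estimates settle (step 4)
        beta = beta_new
        break
    beta = beta_new

print("IRLS estimate:", beta)
print("score at the estimate:", X.T @ (Y - np.exp(X @ beta)))   # approximately zero
```

At convergence the score $\sum_i x_{ij}(Y_i - \mu_i)$ is essentially zero, which is the point of the next paragraph.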

If the $\hat\beta_k$ converge as $k \to \infty$ to something, say $\hat\beta$, then since

$$\sum_i \frac{Y_i - \mu_i(\hat\beta_{k+1})}{\hat\sigma_i^2(\hat\beta_k)}\,\frac{\partial \mu_i}{\partial \beta}(\hat\beta_{k+1}) = 0$$

at every step, we learn that $\hat\beta$ must be a root of the equation

$$\sum_i \frac{Y_i - \mu_i(\beta)}{\sigma_i^2(\beta)}\,\frac{\partial \mu_i(\beta)}{\partial \beta} = 0,$$

which is the last of our example estimating equations.

Distribution of Estimators

Distribution Theory is the subject of computing the distributions of statistics, estimators, and pivots. Examples in this course are the Multivariate Normal Distribution, the theorems about the chi-squared distribution of quadratic forms, the theorems that F statistics have F distributions when the null hypothesis is true, and the theorems showing that a t pivot has a t distribution.

Exact Distribution Theory: the name applied to exact results, such as those in the previous examples, when the errors are assumed to have exactly normal distributions.

Asymptotic or Large Sample Distribution Theory: the same sort of conclusions, but only approximately true and assuming $n$ is large. Theorems of the form:

$$\frac{\hat\beta - \beta_0}{\widehat{\mathrm{SE}}(\hat\beta)} \approx N(0,1) \qquad \text{for large } n.$$

Sketch of the reasoning in a special case

POISSON EXAMPLE: p=1

Assume $Y_i$ has a Poisson distribution with mean $\mu_i = e^{x_i\beta}$ where now $\beta$ is a scalar.

The estimating equation (the likelihood equation) is

$$U(\beta) = \sum_i x_i\left(Y_i - e^{x_i\beta}\right) = 0.$$

It is now important to distinguish between a value of $\beta$ which we are trying out in the estimating equation and the true value of $\beta$, which I will call $\beta_0$. If we happen to try out the true value of $\beta$ in $U$ then we find

$$E\left[U(\beta_0)\right] = \sum_i x_i\left(E[Y_i] - e^{x_i\beta_0}\right) = 0.$$

On the other hand if we try out a value of $\beta$ other than the correct one we find

$$E\left[U(\beta)\right] = \sum_i x_i\left(e^{x_i\beta_0} - e^{x_i\beta}\right) \neq 0.$$

But $U(\beta)$ is a sum of independent random variables, so by the law of large numbers (the law of averages) it must be close to its expected value. This means that if we plug in a value of $\beta$ far from the right value we will not get 0, while if we plug in a value of $\beta$ close to the right answer we will get something close to 0. This can sometimes be turned into the assertion:

The glm estimate of $\beta$ is consistent, that is, it converges to the correct answer as the sample size goes to $\infty$.
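As a numerical illustration (not in the original lecture), the short simulation below solves the scalar Poisson estimating equation on simulated data sets of increasing size; the true value $\beta_0 = 0.7$ and the uniform covariate scheme are made-up choices. The estimates should settle down near $\beta_0$ as $n$ grows.

```python
import numpy as np

rng = np.random.default_rng(0)
beta0 = 0.7                               # "true" parameter, chosen for the illustration

def solve_U(x, y, beta=0.0):
    """Newton's method for U(beta) = sum_i x_i (y_i - exp(x_i beta)) = 0."""
    for _ in range(50):
        mu = np.exp(x * beta)
        step = np.sum(x * (y - mu)) / np.sum(x**2 * mu)
        beta += step
        if abs(step) < 1e-12:
            break
    return beta

for n in [20, 200, 2000, 20000]:
    x = rng.uniform(0, 2, size=n)         # covariate values
    y = rng.poisson(np.exp(x * beta0))    # Poisson responses with mean exp(x * beta0)
    print(n, solve_U(x, y))               # estimates should approach beta0 as n grows
```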

The next theoretical step is another linearization. If $\hat\beta$ is the root of the equation, that is, $U(\hat\beta) = 0$, then

$$0 = U(\hat\beta) \approx U(\beta_0) + \left(\hat\beta - \beta_0\right)U'(\beta_0).$$

This is a Taylor expansion. In our case the derivative $U'(\beta)$ is

$$U'(\beta) = -\sum_i x_i^2 e^{x_i\beta},$$

so that approximately

$$\hat\beta - \beta_0 \approx \frac{U(\beta_0)}{\sum_i x_i^2 e^{x_i\beta_0}} = \frac{\sum_i x_i\left(Y_i - e^{x_i\beta_0}\right)}{\sum_i x_i^2 e^{x_i\beta_0}}.$$

The right hand side of this formula has expected value 0 and variance

$$\frac{\sum_i x_i^2\,\mathrm{Var}(Y_i)}{\left(\sum_i x_i^2 e^{x_i\beta_0}\right)^2} = \frac{\sum_i x_i^2 e^{x_i\beta_0}}{\left(\sum_i x_i^2 e^{x_i\beta_0}\right)^2},$$

which simplifies to

$$\frac{1}{\sum_i x_i^2 e^{x_i\beta_0}}.$$

This means that an approximate standard error of $\hat\beta$ is

$$\frac{1}{\sqrt{\sum_i x_i^2 e^{x_i\beta_0}}}$$

and that an estimated approximate standard error is

$$\frac{1}{\sqrt{\sum_i x_i^2 e^{x_i\hat\beta}}}.$$

Finally, since the formula shows that $\hat\beta - \beta_0$ is a sum of independent terms, the central limit theorem suggests that $\hat\beta$ has an approximate normal distribution and that

$$\frac{\hat\beta - \beta_0}{1\big/\sqrt{\sum_i x_i^2 e^{x_i\hat\beta}}} = \left(\hat\beta - \beta_0\right)\sqrt{\sum_i x_i^2 e^{x_i\hat\beta}}$$

is an approximate pivot with approximately a $N(0,1)$ distribution. You should be able to turn this assertion into a 95% (approximate) confidence interval for $\beta_0$.
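A brief Python sketch (with made-up data) of carrying this out: $\hat\beta$ is found by Newton's method, the estimated approximate standard error is $1\big/\sqrt{\sum_i x_i^2 e^{x_i\hat\beta}}$, and the approximate 95% interval is $\hat\beta \pm 1.96\,\mathrm{SE}$.

```python
import numpy as np

# Hypothetical scalar-covariate Poisson data.
x = np.array([0.3, 0.7, 1.1, 1.6, 2.0, 2.4])
y = np.array([1.0, 2.0, 2.0, 5.0, 8.0, 13.0])

beta = 0.0
for _ in range(50):                      # Newton's method for U(beta) = 0
    mu = np.exp(x * beta)
    step = np.sum(x * (y - mu)) / np.sum(x**2 * mu)
    beta += step
    if abs(step) < 1e-12:
        break

se = 1.0 / np.sqrt(np.sum(x**2 * np.exp(x * beta)))    # estimated approximate standard error
lo, hi = beta - 1.96 * se, beta + 1.96 * se             # approximate 95% CI for beta_0
print(f"beta hat = {beta:.4f}, SE = {se:.4f}, 95% CI = ({lo:.4f}, {hi:.4f})")
```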

Scope of these ideas

The ideas in the above calculation can be used in many contexts.

Further exploration of the ideas in this course





Richard Lockhart
Wed Apr 2 22:31:40 PST 1997