Reading for Today's Lecture:
Goals of Today's Lecture:
Today's notes
Large Sample Theory
Our goal is to study the behaviour of the maximum likelihood estimate $\hat\theta$ and to develop approximate distribution theory for $\hat\theta$.
Here is a summary of the conclusions of the theory: in regular models the mle is consistent and approximately normally distributed, with approximate variance given by the inverse of the Fisher information.
We now study the approximate behaviour of $\hat\theta$ by studying the function
$$U(\theta) = \frac{\partial \log L(\theta)}{\partial \theta} = \sum_{i=1}^n \frac{\partial \log f(X_i;\theta)}{\partial \theta}.$$
Notice first that $U$ is a sum of independent random variables.
Theorem: If $X_1, X_2, \ldots$ are iid with mean $\mu$ then
$$\bar X_n = \frac{1}{n}\sum_{i=1}^n X_i \to \mu$$
in probability. This is called the (weak) law of large numbers. The strong law says
$$P\left(\lim_{n\to\infty} \bar X_n = \mu\right) = 1.$$
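The law of large numbers is easy to watch numerically. A minimal sketch, assuming simulated exponential data with mean 2 (the distribution and sample sizes are arbitrary choices for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
mu = 2.0  # true mean of the simulated distribution

# Sample means for increasing n: they should settle near mu.
for n in [10, 1000, 100000]:
    x = rng.exponential(scale=mu, size=n)
    print(n, x.mean())
```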
Now suppose that $\theta_0$ is the true value of $\theta$. Then, by the law of large numbers applied to the terms of $U$,
$$\frac{U(\theta)}{n} \to \mathrm{E}_{\theta_0}\left[\frac{\partial \log f(X_1;\theta)}{\partial \theta}\right].$$
Consider as an example the case of $N(\mu,1)$ data, where
$$U(\mu) = \sum_{i=1}^n (X_i - \mu),$$
so that $U(\mu)/n = \bar X - \mu$, which converges to 0 when $\mu$ is the true mean.
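As a numerical illustration, here is a sketch assuming $N(\mu,1)$ data (my choice of example), for which the score-type sum $\sum_i (X_i - \mu_0)$ divided by $n$ should tend to 0 at the true mean $\mu_0$:

```python
import numpy as np

rng = np.random.default_rng(1)
mu0 = 1.5  # true mean (assumed N(mu, 1) model)

# U(mu0)/n is the mean of (X_i - mu0); by the LLN it tends to 0
# at the true value of the parameter.
for n in [100, 10000, 1000000]:
    x = rng.normal(loc=mu0, scale=1.0, size=n)
    print(n, (x - mu0).mean())
```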
Now we repeat these ideas for a more general case. We study the random variable
$$\log\left\{\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\right\}.$$
You know the inequality (Jensen's inequality, for the concave function $\log$)
$$\mathrm{E}[\log Y] \le \log \mathrm{E}[Y],$$
which shows that
$$\mathrm{E}_{\theta_0}\left[\log\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\right] \le \log \mathrm{E}_{\theta_0}\left[\frac{f(X_i;\theta)}{f(X_i;\theta_0)}\right] = \log 1 = 0.$$
Definition: A sequence $\hat\theta_n$ of estimators of $\theta$ is consistent if $\hat\theta_n$ converges weakly (or strongly) to $\theta$.
Proto theorem: In regular problems the mle $\hat\theta$ is consistent.
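Consistency can be watched by simulation. A sketch assuming an Exponential($\theta$) model in the rate parametrization, for which the mle is $\hat\theta = 1/\bar X$ (this model is my illustrative choice, not part of the notes):

```python
import numpy as np

rng = np.random.default_rng(2)
theta0 = 3.0  # true rate parameter

# The mle 1/xbar should approach theta0 as n grows.
for n in [20, 2000, 200000]:
    x = rng.exponential(scale=1.0 / theta0, size=n)
    print(n, 1.0 / x.mean())
```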
Now let us study the shape of the log likelihood near the true value of $\theta$ under the assumption that $\hat\theta$ is a root of the likelihood equations close to $\theta_0$. We use Taylor expansion to write, for a 1 dimensional parameter $\theta$,
$$0 = U(\hat\theta) = U(\theta_0) + U'(\theta_0)(\hat\theta - \theta_0) + \frac{1}{2} U''(\tilde\theta)(\hat\theta - \theta_0)^2$$
for some $\tilde\theta$ between $\theta_0$ and $\hat\theta$.
(This form of the remainder in Taylor's theorem is not valid for multivariate $\theta$.) The derivatives of $U$ are each sums of $n$ terms and so each should be roughly proportional to $n$ in size. The second derivative term is multiplied by the square of the small number $\hat\theta - \theta_0$ and so should be negligible compared to the first derivative term. If we ignore the second derivative term we get
$$\hat\theta - \theta_0 \approx -\frac{U(\theta_0)}{U'(\theta_0)}.$$
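To see how good the linearized approximation is, here is a sketch assuming an Exponential($\theta$) model (my choice), where $U(\theta) = n/\theta - \sum X_i$ and $U'(\theta) = -n/\theta^2$; the one-step value $\theta_0 - U(\theta_0)/U'(\theta_0)$ should be close to the exact root $\hat\theta = 1/\bar X$:

```python
import numpy as np

rng = np.random.default_rng(3)
theta0, n = 2.0, 5000
x = rng.exponential(scale=1.0 / theta0, size=n)

U = n / theta0 - x.sum()        # score evaluated at the true value
Uprime = -n / theta0**2         # exact derivative of the score
one_step = theta0 - U / Uprime  # linearization: drop higher-order terms
mle = 1.0 / x.mean()            # exact root of the likelihood equation

print(one_step, mle)            # the two values should nearly agree
```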
In the normal case $U(\mu) = \sum (X_i - \mu)$, so $U'(\mu) = -n$ exactly, there is no remainder term, and $\hat\mu - \mu = U(\mu)/n = \bar X - \mu$ exactly. In general, $U(\theta_0)$ has mean 0 and approximately a normal distribution. Here is how we check that the mean is 0:
$$\mathrm{E}_{\theta_0}\{U(\theta_0)\} = n \int \frac{\partial \log f(x;\theta_0)}{\partial \theta}\, f(x;\theta_0)\,dx = n \int \frac{\partial f(x;\theta_0)}{\partial \theta}\,dx = n\,\frac{\partial}{\partial\theta}\left.\int f(x;\theta)\,dx\right|_{\theta=\theta_0} = n\,\frac{\partial}{\partial\theta}\,1 = 0.$$
Notice that I have interchanged the order of differentiation and integration at one point. This step is usually justified by applying the dominated convergence theorem to the definition of the derivative. The same tactic can be applied by differentiating the identity which we just proved,
$$\int \frac{\partial \log f(x;\theta)}{\partial \theta}\, f(x;\theta)\,dx = 0.$$
Differentiating again with respect to $\theta$ gives
$$\int \frac{\partial^2 \log f(x;\theta)}{\partial \theta^2}\, f(x;\theta)\,dx + \int \left(\frac{\partial \log f(x;\theta)}{\partial \theta}\right)^2 f(x;\theta)\,dx = 0.$$
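The resulting identity, that the expected square of the score term equals minus the expected second derivative of the log density, can be checked by Monte Carlo. A sketch for a single observation from a Poisson($\lambda$) model (an arbitrary example); both quantities should be near $1/\lambda$:

```python
import numpy as np

rng = np.random.default_rng(4)
lam = 4.0
x = rng.poisson(lam=lam, size=200000).astype(float)

score = x / lam - 1.0   # d/d(lambda) of log f(x; lambda)
second = -x / lam**2    # d^2/d(lambda)^2 of log f(x; lambda)

print((score**2).mean())  # estimate of E[(d log f / d lambda)^2]
print(-second.mean())     # estimate of -E[d^2 log f / d lambda^2]
# Both should be close to the per-observation value 1/lambda = 0.25.
```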
Definition: The Fisher Information is
$$I(\theta) = \mathrm{Var}_\theta\{U(\theta)\} = \mathrm{E}_\theta\{U^2(\theta)\}.$$
The idea is that $I$ is a measure of how curved the log likelihood tends to be at the true value of $\theta$. Big curvature means precise estimates. Our identity above is
$$I(\theta) = -\mathrm{E}_\theta\{U'(\theta)\}.$$
Now we return to our Taylor expansion approximation
$$\hat\theta - \theta_0 \approx -\frac{U(\theta_0)}{U'(\theta_0)}.$$
We have shown that $U(\theta_0)$ is a sum of iid mean 0 random variables. The central limit theorem thus proves that
$$\frac{U(\theta_0)}{\sqrt{I(\theta_0)}} \Rightarrow N(0,1).$$
Next observe that $U'(\theta_0)$ is also a sum of $n$ iid terms, so by the law of large numbers it is close to its mean:
$$U'(\theta_0) \approx \mathrm{E}_{\theta_0}\{U'(\theta_0)\} = -I(\theta_0).$$
Combining the two approximations gives
$$\hat\theta - \theta_0 \approx \frac{U(\theta_0)}{I(\theta_0)},$$
which is approximately $N(0, 1/I(\theta_0))$.
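A quick Monte Carlo check of the normal approximation for the score, again using an Exponential($\theta$) example (my assumption): $U(\theta_0) = n/\theta_0 - \sum X_i$ has variance $n/\theta_0^2$, so the standardized score should look standard normal:

```python
import numpy as np

rng = np.random.default_rng(6)
theta0, n, reps = 2.0, 200, 50000

# One score value per simulated data set, evaluated at the true value.
x = rng.exponential(scale=1.0 / theta0, size=(reps, n))
U = n / theta0 - x.sum(axis=1)
Z = U / np.sqrt(n / theta0**2)  # standardize by sqrt of the information

print(Z.mean(), Z.std())        # should be near 0 and 1
```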
Summary
In regular families: the mle $\hat\theta$ is consistent and
$$\hat\theta \approx N\!\left(\theta_0, \frac{1}{I(\theta_0)}\right).$$
We usually simply say that the mle is consistent and asymptotically normal with an asymptotic variance which is the inverse of the Fisher information. This assertion is actually valid for vector valued $\theta$, where now $I$ is a matrix with $ij$th entry
$$I_{ij}(\theta) = -\mathrm{E}_\theta\!\left[\frac{\partial^2 \log L(\theta)}{\partial \theta_i\, \partial \theta_j}\right].$$
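The whole summary can be checked by simulation. A sketch assuming an Exponential($\theta$) model (my example), for which the Fisher information is $n/\theta^2$, so the mle $\hat\theta = 1/\bar X$ should be approximately $N(\theta_0, \theta_0^2/n)$:

```python
import numpy as np

rng = np.random.default_rng(5)
theta0, n, reps = 2.0, 400, 20000

# Sampling distribution of the mle over many replicated data sets.
x = rng.exponential(scale=1.0 / theta0, size=(reps, n))
mles = 1.0 / x.mean(axis=1)

print(mles.mean())          # should be near theta0
print(mles.std())           # should be near the asymptotic value below
print(theta0 / np.sqrt(n))  # asymptotic standard deviation, 1/sqrt(I)
```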