Large Sample Theory
Study approximate behaviour of the mle $\hat\theta$ by studying the function $U(\theta) = \partial\ell(\theta)/\partial\theta$ (the score). Notice that
\[
U(\theta) = \sum_{i=1}^n \frac{\partial\log f}{\partial\theta}(X_i,\theta)
\]
is a sum of independent random variables.
Theorem: If $Y_1, Y_2, \ldots$ are iid with mean $\mu$ then
\[
\frac{\sum_{i=1}^n Y_i}{n} \to \mu .
\]
This is called the law of large numbers. The strong law asserts almost sure convergence; the weak law asserts convergence in probability.
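As a quick numerical illustration (an addition, not part of the notes' derivation), the following Python sketch simulates the law of large numbers for iid Exponential(1) draws, whose true mean is 1; the distribution and sample sizes are arbitrary choices.

```python
import numpy as np

# Law of large numbers demo: the sample mean of n iid Exponential(1)
# draws (true mean 1) settles near 1 as n grows.
rng = np.random.default_rng(0)

for n in [10, 100, 1_000, 10_000, 100_000]:
    y = rng.exponential(scale=1.0, size=n)
    print(f"n = {n:6d}   sample mean = {y.mean():.4f}")
```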
Now suppose $\theta_0$ is the true value of $\theta$. Then
\[
\frac{U(\theta)}{n} \to
E_{\theta_0}\!\left[\frac{\partial\log f}{\partial\theta}(X_i,\theta)\right]
= \int \frac{\partial\log f}{\partial\theta}(x,\theta)\, f(x,\theta_0)\, dx .
\]
Example: $N(\theta,1)$ data. Consider $\theta < \theta_0$: the derivative of $\ell(\theta)$ is likely to be positive, so that $\ell$ increases as $\theta$ increases. For $\theta > \theta_0$: the derivative is probably negative, and so $\ell$ tends to be decreasing for $\theta > \theta_0$. Hence $\ell(\theta)$ is likely to be maximized close to $\theta_0$.
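To make the heuristic concrete, here is the expected derivative computed explicitly for the $N(\theta,1)$ example (a standard calculation added for clarity): since $\partial\log f(x,\theta)/\partial\theta = x - \theta$,
\[
E_{\theta_0}\!\left[\frac{\partial\log f}{\partial\theta}(X_i,\theta)\right]
= E_{\theta_0}\!\left[X_i - \theta\right] = \theta_0 - \theta ,
\]
which is positive for $\theta < \theta_0$ and negative for $\theta > \theta_0$, matching the argument above.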
Repeat ideas for more general case. Study the random variable $\log\left[f(X_i,\theta)/f(X_i,\theta_0)\right]$.
Generalization: Jensen's inequality: for $g$ a convex function ($g'' \ge 0$, roughly),
\[
g(E(X)) \le E(g(X)) .
\]
The inequality above has $X = f(X_i,\theta)/f(X_i,\theta_0)$. Use $g(x) = -\log(x)$: convex because $g''(x) = 1/x^2 > 0$. We get
\[
-\log\left( E_{\theta_0}\!\left[\frac{f(X_i,\theta)}{f(X_i,\theta_0)}\right]\right)
\le E_{\theta_0}\!\left[-\log\frac{f(X_i,\theta)}{f(X_i,\theta_0)}\right]
\]
and
\[
E_{\theta_0}\!\left[\frac{f(X_i,\theta)}{f(X_i,\theta_0)}\right]
= \int \frac{f(x,\theta)}{f(x,\theta_0)}\, f(x,\theta_0)\, dx
= \int f(x,\theta)\, dx = 1 ,
\]
so that
\[
E_{\theta_0}\!\left[\log\frac{f(X_i,\theta)}{f(X_i,\theta_0)}\right] \le -\log(1) = 0 .
\]
Let $\mu(\theta)$ be this expected value:
\[
\mu(\theta) = E_{\theta_0}\!\left[\log\frac{f(X_i,\theta)}{f(X_i,\theta_0)}\right] \le 0 .
\]
Then for each $\theta$ we find, by the law of large numbers,
\[
\frac{\ell(\theta) - \ell(\theta_0)}{n}
= \frac{\sum \log\left[f(X_i,\theta)/f(X_i,\theta_0)\right]}{n}
\to \mu(\theta) .
\]
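For example, in the $N(\theta,1)$ case used above, $\mu(\theta)$ has a simple closed form (a standard calculation added here for concreteness):
\[
\mu(\theta)
= E_{\theta_0}\!\left[-\tfrac12 (X_i-\theta)^2 + \tfrac12 (X_i-\theta_0)^2\right]
= -\tfrac12 (\theta-\theta_0)^2 \le 0 ,
\]
with equality only at $\theta = \theta_0$, so the limiting normalized log likelihood difference is maximized exactly at the true value.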
Idea can often be stretched to prove that the mle is consistent.
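A small simulation sketch (an addition, using the same $N(\theta,1)$ model as an assumed example) checks this numerically: the averaged log likelihood ratio tracks $\mu(\theta) = -(\theta-\theta_0)^2/2$ and is largest near $\theta_0$.

```python
import numpy as np

# Normalized log likelihood ratio (l(theta) - l(theta_0)) / n for N(theta, 1)
# data; by the law of large numbers it approaches -(theta - theta_0)**2 / 2.
rng = np.random.default_rng(1)
theta0 = 1.0
x = rng.normal(theta0, 1.0, size=20_000)

for theta in [0.0, 0.5, 0.9, 1.0, 1.1, 1.5, 2.0]:
    llr = np.mean(-0.5 * (x - theta) ** 2 + 0.5 * (x - theta0) ** 2)
    mu = -0.5 * (theta - theta0) ** 2
    print(f"theta = {theta:3.1f}   (l-l0)/n = {llr: .4f}   mu(theta) = {mu: .4f}")
```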
Definition: A sequence $\hat\theta_n$ of estimators of $\theta$ is consistent if $\hat\theta_n$ converges weakly (or strongly) to $\theta$.
Proto theorem: In regular problems the mle $\hat\theta$ is consistent.
More precise statements of possible conclusions are available, under assumptions such as: $\hat\theta$ is the global maximizer of $\ell(\theta)$, or $\hat\theta$ merely maximizes $\ell$ over some restricted subset of the parameter space.
Point: conditions get weaker as conclusions get weaker. Many possible conditions in literature. See book by Zacks for some precise conditions.
Study the shape of the log likelihood near the true value of $\theta$.
Assume $\hat\theta$ is a root of the likelihood equation $U(\theta) = 0$ close to $\theta_0$.
Taylor expansion (1-dimensional parameter $\theta$):
\[
0 = U(\hat\theta)
= U(\theta_0) + U'(\theta_0)\,(\hat\theta-\theta_0)
+ \tfrac12\, U''(\tilde\theta)\,(\hat\theta-\theta_0)^2 ,
\]
where $\tilde\theta$ lies between $\theta_0$ and $\hat\theta$.
WARNING: This form of the remainder in Taylor's theorem is not valid for multivariate $\theta$.
Derivatives of $U$ are sums of $n$ terms, so each derivative should be proportional to $n$ in size. The second derivative term is multiplied by the square of the small number $\hat\theta - \theta_0$, so it should be negligible compared to the first derivative term.
Ignoring the second derivative term we get
\[
\hat\theta - \theta_0 \approx \frac{U(\theta_0)}{-U'(\theta_0)} .
\]
Normal case: for $N(\theta,1)$ data, $U(\theta) = \sum (X_i - \theta)$. The derivative is $U'(\theta) = -n$. The next derivative $U''$ is 0, so here the expansion is exact.
Notice: both $U(\theta_0)$ and $U'(\theta_0)$ are sums of iid random variables. Let
\[
U_i = \frac{\partial\log f}{\partial\theta}(X_i,\theta_0) ,
\]
so that $U(\theta_0) = \sum U_i$. In general, $U(\theta_0)$ has mean 0 and approximately a normal distribution.
Here is how we check that:
\begin{align*}
E_{\theta_0}\!\left[U(\theta_0)\right]
&= n\, E_{\theta_0}\!\left[\frac{\partial\log f}{\partial\theta}(X_i,\theta_0)\right] \\
&= n \int \frac{\partial\log f}{\partial\theta}(x,\theta_0)\, f(x,\theta_0)\, dx \\
&= n \int \frac{\partial f(x,\theta_0)/\partial\theta}{f(x,\theta_0)}\, f(x,\theta_0)\, dx \\
&= n \int \frac{\partial f}{\partial\theta}(x,\theta_0)\, dx \\
&= n\, \frac{\partial}{\partial\theta}\left.\int f(x,\theta)\, dx \right|_{\theta=\theta_0}
= n\, \frac{\partial}{\partial\theta}\, 1 = 0 .
\end{align*}
Notice: interchanged order of differentiation and integration at one point.
This step is usually justified by applying the dominated convergence theorem to the definition of the derivative.
Differentiate identity just proved:
\[
-\int \frac{\partial^2\log f}{\partial\theta^2}(x,\theta)\, f(x,\theta)\, dx
= \int \left[\frac{\partial\log f}{\partial\theta}(x,\theta)\right]^2 f(x,\theta)\, dx .
\]
Definition: The Fisher Information is
\[
I(\theta) = \mathrm{Var}_\theta\!\left(U(\theta)\right) = -E_\theta\!\left[U'(\theta)\right] .
\]
The idea is that $I(\theta)$ is a measure of how curved the log likelihood tends to be at the true value of $\theta$. Big curvature means precise estimates. Our identity above is the single-observation version of
\[
-E_\theta\!\left[U'(\theta)\right] = E_\theta\!\left[U(\theta)^2\right] = \mathrm{Var}_\theta\!\left(U(\theta)\right) ,
\]
which shows that the two formulas for $I(\theta)$ agree.
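As a concrete illustration (added here; the Poisson model is only an example), take $X_1,\ldots,X_n$ iid Poisson($\lambda$):
\[
\log f(x,\lambda) = x\log\lambda - \lambda - \log(x!) ,
\qquad
\frac{\partial\log f}{\partial\lambda}(x,\lambda) = \frac{x}{\lambda} - 1 ,
\]
so
\[
I(\lambda) = \mathrm{Var}_\lambda\!\left(\sum_{i=1}^n\left[\frac{X_i}{\lambda}-1\right]\right)
= \frac{n}{\lambda} ,
\]
and $1/I(\lambda) = \lambda/n$, the familiar variance of $\bar X$, which is the mle here.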
Now we return to our Taylor expansion approximation
\[
\hat\theta - \theta_0 \approx \frac{U(\theta_0)}{-U'(\theta_0)} .
\]
We have shown that $U(\theta_0)$ is a sum of iid mean 0 random variables. The central limit theorem thus proves that
\[
I(\theta_0)^{-1/2}\, U(\theta_0) \Rightarrow N(0,1) .
\]
Next observe that, by the law of large numbers,
\[
\frac{-U'(\theta_0)}{I(\theta_0)} \to 1 ,
\]
so $-U'(\theta_0)$ may be replaced by $I(\theta_0)$ in the approximation.
Summary
In regular families: assuming $\hat\theta = \hat\theta_n$ is a consistent root of $\ell'(\theta) = U(\theta) = 0$,
\begin{align*}
\sqrt{I(\theta_0)}\,(\hat\theta - \theta_0) &\Rightarrow N(0,1) \\
\sqrt{I(\hat\theta)}\,(\hat\theta - \theta_0) &\Rightarrow N(0,1) \\
\sqrt{V(\theta_0)}\,(\hat\theta - \theta_0) &\Rightarrow N(0,1) \\
\sqrt{V(\hat\theta)}\,(\hat\theta - \theta_0) &\Rightarrow N(0,1) ,
\end{align*}
where $V(\theta) = -U'(\theta) = -\ell''(\theta)$ is the observed information.
Note: If the square roots are replaced by matrix square roots we can let $\theta$ be vector valued and get a standard multivariate normal distribution as the limit law.
Why all these different forms? Use limit
laws to test hypotheses and compute confidence intervals.
Test $H_0: \theta = \theta_0$ using one of the 4 quantities as the test statistic. Find confidence intervals by using the quantities as pivots. E.g.: the second and fourth limits lead to the confidence intervals
\[
\hat\theta \pm z_{\alpha/2}\big/\sqrt{I(\hat\theta)}
\qquad\text{and}\qquad
\hat\theta \pm z_{\alpha/2}\big/\sqrt{V(\hat\theta)} .
\]
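A brief numerical sketch (an addition, reusing the Poisson example above as the assumed model) of the first of these intervals: with $I(\lambda) = n/\lambda$ the interval is $\hat\lambda \pm z_{\alpha/2}\sqrt{\hat\lambda/n}$.

```python
import numpy as np
from scipy import stats

# Wald-type confidence interval for a Poisson mean using the Fisher
# information evaluated at the mle: I(lambda) = n / lambda.
rng = np.random.default_rng(2)
true_lam, n = 3.0, 200
x = rng.poisson(true_lam, size=n)

lam_hat = x.mean()                    # mle of lambda
info_hat = n / lam_hat                # I(lambda_hat)
z = stats.norm.ppf(0.975)             # z_{alpha/2} for 95% coverage
half = z / np.sqrt(info_hat)

print(f"mle = {lam_hat:.3f}, 95% CI = ({lam_hat - half:.3f}, {lam_hat + half:.3f})")
```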
Usual summary: mle is consistent and asymptotically normal with an asymptotic variance which is the inverse of the Fisher information.
Method of Moments
Basic strategy: set sample moments equal to population moments and solve for the parameters.
Definition: The $r$th sample moment (about the origin) is
\[
\hat\mu_r' = \frac1n \sum_{i=1}^n X_i^r .
\]
(The central moments are
\[
\hat\mu_r = \frac1n \sum_{i=1}^n (X_i - \bar X)^r . \, )
\]
If we have $p$ parameters we can estimate the parameters $\theta_1,\ldots,\theta_p$ by solving the system of $p$ equations:
\[
\mu_r' = \hat\mu_r' , \qquad r = 1,\ldots,p ,
\]
where $\mu_r' = E_\theta(X^r)$ is the corresponding population moment.
Gamma Example
The Gamma($\alpha,\beta$) density is
\[
f(x;\alpha,\beta)
= \frac{1}{\beta^\alpha\,\Gamma(\alpha)}\, x^{\alpha-1} e^{-x/\beta}\, 1(x>0) ,
\]
with mean $\alpha\beta$ and variance $\alpha\beta^2$. The method of moments equations, $\bar X = \alpha\beta$ and $\overline{X^2} = \alpha(\alpha+1)\beta^2$, are much easier to solve than the likelihood equations, which involve the digamma function $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$. The score function has components
\begin{align*}
\frac{\partial\ell}{\partial\alpha} &= \sum \log(X_i) - n\log\beta - n\,\psi(\alpha) \\
\frac{\partial\ell}{\partial\beta} &= \frac{\sum X_i}{\beta^2} - \frac{n\alpha}{\beta} .
\end{align*}
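A minimal method of moments sketch (an addition; the simulated parameter values are arbitrary and the scale parameterization above is assumed): matching the sample mean and variance to $\alpha\beta$ and $\alpha\beta^2$ gives $\hat\beta = \hat\sigma^2/\bar X$ and $\hat\alpha = \bar X/\hat\beta$.

```python
import numpy as np

# Method of moments for Gamma(alpha, beta) in the scale parameterization:
# mean = alpha * beta, variance = alpha * beta**2.
rng = np.random.default_rng(3)
x = rng.gamma(shape=2.5, scale=1.5, size=5_000)

xbar = x.mean()
s2 = x.var()                  # second central moment (1/n divisor)

beta_hat = s2 / xbar          # beta = variance / mean
alpha_hat = xbar / beta_hat   # alpha = mean / beta

print(f"alpha_hat = {alpha_hat:.3f}, beta_hat = {beta_hat:.3f}")
```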
Estimating Equations
The same large sample ideas arise whenever estimates are derived by solving some equation.
Example: large sample theory for Generalized Linear Models.
Suppose $Y_i$ is the number of cancer cases in some group of people characterized by values $x_i$ of some covariates. Think of $x_i$ as containing variables like age, a dummy variable for sex, average income, and so on.
Possible parametric regression model: $Y_i$ has a Poisson distribution with mean $\mu_i$, where the mean $\mu_i$ depends somehow on $x_i$. Typically we assume $g(\mu_i) = x_i\beta$, where $g$ is the link function. Often $g(\mu) = \log(\mu)$ and $x_i\beta$ is a matrix product: $x_i$ a row vector of covariates, $\beta$ a column vector of parameters.
"Linear regression model with Poisson errors." Special case: $\log(\mu_i) = \beta x_i$, where $\beta$ (and each $x_i$) is a scalar.
The log likelihood is simply
\[
\ell(\beta) = \sum \left[ Y_i \log(\mu_i) - \mu_i - \log(Y_i!) \right]
= \sum \left[ Y_i \beta x_i - e^{\beta x_i} - \log(Y_i!) \right] ,
\]
so the likelihood equation is
\[
U(\beta) = \sum x_i \left( Y_i - e^{\beta x_i} \right) = 0 .
\]
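A minimal fitting sketch (an addition; the simulated data, starting value, and variable names are illustrative assumptions): Newton's method applied to the score equation above, using $U'(\beta) = -\sum x_i^2 e^{\beta x_i}$.

```python
import numpy as np

# One-parameter Poisson regression with log link: mu_i = exp(beta * x_i).
# Newton's method solves the score equation U(beta) = sum x_i (Y_i - mu_i) = 0.
rng = np.random.default_rng(4)
n = 500
x = rng.uniform(0.0, 2.0, size=n)
y = rng.poisson(np.exp(0.7 * x))      # simulated responses, true beta = 0.7

beta = 0.0                            # starting value
for _ in range(25):
    mu = np.exp(beta * x)
    score = np.sum(x * (y - mu))      # U(beta)
    dscore = -np.sum(x**2 * mu)       # U'(beta)
    step = score / dscore
    beta -= step                      # Newton update
    if abs(step) < 1e-10:
        break

obs_info = np.sum(x**2 * np.exp(beta * x))   # V(beta_hat) = -U'(beta_hat)
print(f"beta_hat = {beta:.4f}, approx s.e. = {1 / np.sqrt(obs_info):.4f}")
```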
Other estimating equations are possible and popular. If $w_i$ is any set of $n$ deterministic weights (possibly depending on $x_i$) then we could define
\[
U(\beta) = \sum w_i \left( Y_i - \mu_i(\beta) \right)
\]
and estimate $\beta$ by solving $U(\beta) = 0$.
Idea widely used:
Example: Generalized Estimating Equations, Zeger and Liang.
Abbreviation: GEE.
Called by econometricians Generalized Method of Moments.
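To see the idea in action, here is a sketch (an addition; the weights $w_i = 1/(1+x_i)$ and the model are arbitrary illustrative choices) that solves a weighted estimating equation for the same Poisson model numerically.

```python
import numpy as np
from scipy.optimize import brentq

# Weighted estimating equation for mu_i = exp(beta * x_i):
# solve sum_i w_i * (Y_i - exp(beta * x_i)) = 0 for beta.
rng = np.random.default_rng(5)
n = 500
x = rng.uniform(0.0, 2.0, size=n)
y = rng.poisson(np.exp(0.7 * x))
w = 1.0 / (1.0 + x)                   # arbitrary deterministic weights

def estimating_equation(beta):
    return np.sum(w * (y - np.exp(beta * x)))

beta_hat = brentq(estimating_equation, -5.0, 5.0)   # root of U(beta) = 0
print(f"weighted estimating-equation estimate: {beta_hat:.4f}")
```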
An estimating equation is unbiased if
\[
E_\theta\!\left[ U(\theta) \right] = 0 .
\]
Theorem: Suppose $\hat\theta_n$ is a consistent root of the unbiased estimating equation
\[
U(\theta) = 0 .
\]