Today's notes
Reading for Today's Lecture:
Goals of Today's Lecture:
So far:
The Fisher information matrix is
$$ I(\theta) = \mathrm{Var}_\theta\{U(\theta)\} = -E_\theta\!\left[\frac{\partial^2 \ell(\theta)}{\partial\theta\,\partial\theta^T}\right]. $$
Theorem: In iid sampling the total information is $I(\theta) = n\,I_1(\theta)$, where $I_1$ is the information in a single observation, and under regularity conditions the mle satisfies
$$ \sqrt{n}\,(\hat\theta - \theta_0) \Rightarrow N\!\left(0, I_1(\theta_0)^{-1}\right). $$
D: Uniform$[0, \theta]$

We have $X_1, \ldots, X_n$ iid with density
$$ f_\theta(x) = \frac{1}{\theta}\,1(0 \le x \le \theta). $$
We find that the mle is $\hat\theta = \max_i X_i = X_{(n)}$. This family has the feature that the support of the density, namely the set $\{x : f_\theta(x) > 0\} = [0, \theta]$, depends on $\theta$. In such families it is common for the standard mle theory to fail.
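A small simulation, a sketch assuming NumPy and simulated data, shows what happens instead: the error of $\hat\theta$ is of order $1/n$ rather than $1/\sqrt n$, and $n(\theta - \hat\theta)/\theta$ behaves like an Exponential(1) variable rather than a normal one.

```python
import numpy as np

# Simulate the MLE for Uniform[0, theta]: theta_hat = max(X_i).
# In regular models sqrt(n)(theta_hat - theta) is approximately normal;
# here the error is of order 1/n and n(theta - theta_hat)/theta is
# approximately Exponential(1).
rng = np.random.default_rng(0)
theta, n, reps = 2.0, 200, 10_000

samples = rng.uniform(0.0, theta, size=(reps, n))
theta_hat = samples.max(axis=1)               # the MLE in each replication
scaled_error = n * (theta - theta_hat) / theta

# Compare simulated tail probabilities with the Exponential(1) tail e^{-t}.
for t in (0.5, 1.0, 2.0):
    print(t, (scaled_error > t).mean(), np.exp(-t))
```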
Confidence Intervals:
We can base confidence intervals on one of several forms. For this section I will assume that $\theta$ is a scalar (one dimensional) parameter and use $'$ to denote a derivative with respect to the parameter. There are 3 standard versions of the normal approximation:
$$ \frac{\ell'(\theta)}{\sqrt{I(\theta)}} \approx N(0,1), \qquad (\hat\theta - \theta)\sqrt{I(\hat\theta)} \approx N(0,1), \qquad (\hat\theta - \theta)\sqrt{V(\hat\theta)} \approx N(0,1), $$
where $V(\theta) = -\ell''(\theta)$ is the observed information.

Each of these quantities may be used to derive confidence intervals for $\theta$ by finding the collection of all $\theta$ for which the quantity is smaller in absolute value than some critical point $z_{\alpha/2}$. The second and third quantities are of the form
$$ \frac{\hat\theta - \theta}{\hat\sigma}, $$
with $\hat\sigma$ not depending on $\theta$, so the interval is simply $\hat\theta \pm z_{\alpha/2}\,\hat\sigma$, where $\hat\sigma = 1/\sqrt{I(\hat\theta)}$ or $1/\sqrt{V(\hat\theta)}$. The first quantity above can also be used to derive a confidence interval but you must do more work, usually, to solve the inequality
$$ \left|\ell'(\theta)\right| \le z_{\alpha/2}\,\sqrt{I(\theta)} $$
for $\theta$.
Here are some examples:
Exponential distribution: With $X_1, \ldots, X_n$ iid with density
$$ f_\theta(x) = \frac{1}{\theta}\, e^{-x/\theta}\,1(x > 0) $$
we have $\hat\theta = \bar X$ and $I(\theta) = n/\theta^2$, so the quantities above become
$$ \frac{\sqrt{n}\,(\bar X - \theta)}{\theta} \quad\text{and}\quad \frac{\sqrt{n}\,(\bar X - \theta)}{\bar X} $$
(the second and third coincide in this model). The second gives the interval $\bar X \pm z_{\alpha/2}\,\bar X/\sqrt{n}$, while solving the first inequality gives $\bar X/(1 + z_{\alpha/2}/\sqrt n) \le \theta \le \bar X/(1 - z_{\alpha/2}/\sqrt n)$ (for $z_{\alpha/2} < \sqrt n$). We could also use the fact that $2\sum X_i/\theta$ has exactly a $\chi^2_{2n}$ distribution to get an exact interval.
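As a sketch (assuming NumPy/SciPy, simulated data, and the mean parametrization above), the intervals can be computed as follows:

```python
import numpy as np
from scipy.stats import norm, chi2

# Exponential(mean theta): theta_hat = xbar, Fisher information I(theta) = n/theta^2.
rng = np.random.default_rng(1)
theta_true, n, alpha = 3.0, 50, 0.05
x = rng.exponential(theta_true, size=n)
xbar = x.mean()
z = norm.ppf(1 - alpha / 2)

# Wald interval: plug theta_hat into the standard error, xbar +/- z*xbar/sqrt(n).
wald = (xbar - z * xbar / np.sqrt(n), xbar + z * xbar / np.sqrt(n))

# Score-type interval: solve |xbar - theta| <= z*theta/sqrt(n) for theta.
score = (xbar / (1 + z / np.sqrt(n)), xbar / (1 - z / np.sqrt(n)))

# Exact interval from the pivot 2*sum(X)/theta ~ chi-squared with 2n df.
s = 2 * x.sum()
exact = (s / chi2.ppf(1 - alpha / 2, 2 * n), s / chi2.ppf(alpha / 2, 2 * n))

print(wald, score, exact)
```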
Cauchy example: In this example I use the observed information (namely $V(\theta) = -\ell''(\theta)$): the interval is
$$ \hat\theta \pm \frac{z_{\alpha/2}}{\sqrt{V(\hat\theta)}}. $$
In the Cauchy example we found the mle $\hat\theta$ numerically by Newton-Raphson; since the Fisher information is $I(\theta) = n/2$, an alternative is the interval $\hat\theta \pm z_{\alpha/2}\sqrt{2/n}$.
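A minimal Newton-Raphson sketch for the Cauchy location mle and the observed-information interval, assuming NumPy/SciPy and using simulated data:

```python
import numpy as np
from scipy.stats import norm

# Cauchy location model: f(x; theta) = 1 / (pi * (1 + (x - theta)^2)).
def score(theta, x):          # first derivative of the log likelihood
    d = x - theta
    return np.sum(2 * d / (1 + d**2))

def neg_curvature(theta, x):  # observed information V(theta) = -l''(theta)
    d = x - theta
    return np.sum(2 * (1 - d**2) / (1 + d**2) ** 2)

rng = np.random.default_rng(2)
x = rng.standard_cauchy(25) + 1.0   # true location 1.0

theta = np.median(x)                # starting value for Newton-Raphson
for _ in range(20):
    theta = theta + score(theta, x) / neg_curvature(theta, x)

V = neg_curvature(theta, x)
z = norm.ppf(0.975)
print(theta, (theta - z / np.sqrt(V), theta + z / np.sqrt(V)))
```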
All these normal approximations can be used to give tests of either the one sided hypotheses $H_0: \theta \le \theta_0$ or $H_0: \theta \ge \theta_0$ against the corresponding one sided alternatives, or the two sided hypothesis $H_0: \theta = \theta_0$ against $H_1: \theta \ne \theta_0$. All you do is stick in $\theta_0$ for $\theta$ and then get P-values from the normal distribution.
In the exponential example for instance you use either
$$ \frac{\sqrt{n}\,(\bar X - \theta_0)}{\theta_0} \quad\text{or}\quad \frac{\sqrt{n}\,(\bar X - \theta_0)}{\bar X} $$
as the approximately standard normal test statistic. In the Cauchy case you could use $(\hat\theta - \theta_0)\sqrt{V(\hat\theta)}$ or $(\hat\theta - \theta_0)\sqrt{n/2}$.
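For the exponential example, a two-sided P-value might be computed as in the sketch below (assuming SciPy and the same mean parametrization; `exp_p_values` is just an illustrative helper name):

```python
import numpy as np
from scipy.stats import norm

# Two-sided test of H0: theta = theta0 in the exponential(mean theta) model.
def exp_p_values(x, theta0):
    n, xbar = len(x), np.mean(x)
    z_score = np.sqrt(n) * (xbar - theta0) / theta0   # score-based statistic
    z_wald = np.sqrt(n) * (xbar - theta0) / xbar      # Wald-type statistic
    return 2 * norm.sf(abs(z_score)), 2 * norm.sf(abs(z_wald))

rng = np.random.default_rng(3)
print(exp_p_values(rng.exponential(3.0, size=40), theta0=2.5))
```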
Method of Moments:

Basic strategy: set sample moments equal to population moments and solve for the parameters.
Definition: The $k^{\text{th}}$ sample moment (about the origin) is
$$ \hat\mu_k' = \frac{1}{n}\sum_{i=1}^n X_i^k, $$
with corresponding population moment $\mu_k' = E_\theta(X^k)$. Central moments are
$$ \hat\mu_k = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)^k \quad\text{and}\quad \mu_k = E_\theta\!\left[(X - \mu_1')^k\right]. $$
If we have $p$ parameters we can estimate the parameters $\theta_1, \ldots, \theta_p$ by solving the system of $p$ equations:
$$ \mu_1' = \hat\mu_1',\quad \mu_2' = \hat\mu_2',\quad \ldots,\quad \mu_p' = \hat\mu_p'. $$
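In code the sample moments are one-liners; a minimal sketch assuming NumPy:

```python
import numpy as np

def sample_moment(x, k):
    """k-th sample moment about the origin: average of X_i**k."""
    x = np.asarray(x)
    return np.mean(x ** k)

def central_moment(x, k):
    """k-th sample central moment: average of (X_i - Xbar)**k."""
    x = np.asarray(x)
    return np.mean((x - x.mean()) ** k)
```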
Gamma Example

The Gamma($\alpha, \beta$) density is
$$ f(x; \alpha, \beta) = \frac{1}{\beta\,\Gamma(\alpha)}\left(\frac{x}{\beta}\right)^{\alpha - 1} e^{-x/\beta}\,1(x > 0), $$
with mean $\alpha\beta$ and variance $\alpha\beta^2$. The method of moments equations are
$$ \alpha\beta = \bar X \quad\text{and}\quad \alpha(\alpha + 1)\beta^2 = \overline{X^2}, $$
which give $\hat\beta = (\overline{X^2} - \bar X^2)/\bar X$ and $\hat\alpha = \bar X/\hat\beta$. The equations are much easier to solve than the likelihood equations, which involve the digamma function $\psi(\alpha) = \Gamma'(\alpha)/\Gamma(\alpha)$.
Why bother doing the Newton-Raphson steps? Why not just use the method of moments estimates? The answer is that the method of moments estimates are not usually as close to the right answer as the mles; see the sketch below.
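The sketch below (assuming SciPy; `gamma.fit` with `floc=0` maximizes the likelihood numerically) computes both sets of estimates for one simulated Gamma sample:

```python
import numpy as np
from scipy.stats import gamma

rng = np.random.default_rng(4)
alpha_true, beta_true = 2.5, 1.5          # shape and scale
x = rng.gamma(alpha_true, beta_true, size=200)

# Method of moments: alpha*beta = xbar, alpha*beta^2 = sample variance.
xbar, s2 = x.mean(), x.var()
beta_mm = s2 / xbar
alpha_mm = xbar / beta_mm                 # = xbar^2 / s2

# Maximum likelihood: scipy maximizes the likelihood numerically
# (the likelihood equations involve the digamma function).
alpha_ml, _, beta_ml = gamma.fit(x, floc=0)

print("MoM:", alpha_mm, beta_mm)
print("MLE:", alpha_ml, beta_ml)
```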
Rough principle: A good estimate $\hat\theta$ of $\theta$ is usually close to $\theta_0$ if $\theta_0$ is the true value of $\theta$. Closer estimates, more often, are better estimates.

This principle must be quantified if we are to ``prove'' that the mle is a good estimate. In the Neyman-Pearson spirit we measure average closeness.
Definition: The Mean Squared Error (MSE) of an estimator $\hat\theta$ is the function
$$ MSE_{\hat\theta}(\theta) = E_\theta\!\left[(\hat\theta - \theta)^2\right]. $$
Standard identity:
$$ MSE_{\hat\theta}(\theta) = \mathrm{Var}_\theta(\hat\theta) + \left(\mathrm{Bias}_{\hat\theta}(\theta)\right)^2, \quad\text{where}\quad \mathrm{Bias}_{\hat\theta}(\theta) = E_\theta(\hat\theta) - \theta. $$
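The identity is easy to check by simulation; here is a sketch, assuming NumPy, using a deliberately biased estimator ($0.9\bar X$ for an exponential mean) chosen purely for illustration:

```python
import numpy as np

# Check MSE = variance + bias^2 by simulation for a deliberately biased
# estimator of an exponential mean: theta_tilde = 0.9 * Xbar.
rng = np.random.default_rng(5)
theta, n, reps = 2.0, 20, 200_000

est = 0.9 * rng.exponential(theta, size=(reps, n)).mean(axis=1)
mse = np.mean((est - theta) ** 2)
var_plus_bias2 = est.var() + (est.mean() - theta) ** 2
print(mse, var_plus_bias2)   # the two numbers agree up to simulation error
```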
Primitive example: I take a coin from my pocket and toss it 6 times. I get HTHTTT. The MLE of the probability of heads is
$$ \hat p = \frac{\text{number of heads}}{6} = \frac{2}{6} = \frac{1}{3}. $$
An alternative estimate is $\tilde p \equiv 1/2$. That is, $\tilde p$ ignores the data and guesses the coin is fair. The MSEs of these two estimators are
$$ MSE_{\hat p}(p) = \frac{p(1-p)}{6} \quad\text{and}\quad MSE_{\tilde p}(p) = \left(p - \tfrac{1}{2}\right)^2. $$
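Tabulating the two functions, in a sketch assuming NumPy, makes the comparison concrete: neither estimator dominates the other.

```python
import numpy as np

# MSE of p_hat = X/6 (unbiased): p(1-p)/6.  MSE of p_tilde = 1/2: (p - 1/2)^2.
p = np.linspace(0, 1, 11)
mse_hat = p * (1 - p) / 6
mse_tilde = (p - 0.5) ** 2

for row in zip(p, mse_hat, mse_tilde):
    print("p=%.1f  MSE(p_hat)=%.4f  MSE(p_tilde)=%.4f" % row)
# p_tilde wins for p near 1/2, p_hat wins for p near 0 or 1:
# neither estimator is better for every p.
```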
Now suppose I did the same experiment with a thumbtack. The tack can land point up (U) or tipped over (O). If I get UOUOOO how should I estimate $p$, the probability of U? The mathematics is identical to the above but it seems clear that there is less reason to think $\tilde p = 1/2$ is better than $\hat p = 1/3$, since there is less reason to believe $p$ is close to $1/2$ with a tack than with a coin.
The problem above illustrates a general phenomenon. An estimator can be good for some values of $\theta$ and bad for others. When comparing $\hat\theta$ and $\tilde\theta$, two estimators of $\theta$, we will say that $\hat\theta$ is better than $\tilde\theta$ if it has uniformly smaller MSE:
$$ MSE_{\hat\theta}(\theta) \le MSE_{\tilde\theta}(\theta) \quad\text{for every } \theta. $$
The definition raises the question of the existence of a best estimate: one which is better than every other estimator. There is no such estimate. Suppose $\hat\theta$ were such a best estimate. Fix a $\theta^*$ in $\Theta$ and let $\tilde\theta \equiv \theta^*$. Then the MSE of $\tilde\theta$ is 0 when $\theta = \theta^*$. Since $\hat\theta$ is better than $\tilde\theta$ we must have
$$ MSE_{\hat\theta}(\theta^*) = 0, $$
so that $\hat\theta = \theta^*$ with probability 1 when $\theta^*$ is the true value. Since $\theta^*$ was arbitrary, $\hat\theta$ would have to equal every possible parameter value with probability 1, which is impossible in any non-trivial problem.