Today's notes
Reading for Today's Lecture:
Goals of Today's Lecture:
What can we do to find UMVUEs when the CRLB is a strict inequality?
Example: Suppose X has a Binomial(n,p) distribution.
The score function is
\[ U(p) = \frac{\partial}{\partial p}\log\left[\binom{n}{X}p^X(1-p)^{n-X}\right] = \frac{X}{p}-\frac{n-X}{1-p} = \frac{X-np}{p(1-p)} . \]
Since the score is a linear function of $X$, the CRLB for estimating $p$ is attained by $\hat p = X/n$, which is therefore the UMVUE of $p$.
A different tactic proceeds as follows. Suppose T(X) is some unbiased estimate of p that is a function of X alone. Then we have
\[ E_p[T(X)] = \sum_{k=0}^{n} T(k)\binom{n}{k}p^k(1-p)^{n-k} = p \quad\text{for all } p\in(0,1). \]
The only $T$ satisfying this identity for every $p$ is $T(k)=k/n$ (match coefficients of the powers of $p$), so $X/n$ is the only unbiased estimate of $p$ based on $X$ alone.
Now a Binomial random variable is just the sum of $n$ iid Bernoulli($p$) random variables: if $Y_1,\dots,Y_n$ are iid Bernoulli($p$) then $X = Y_1+\cdots+Y_n$ is Binomial($n,p$). Could we do better than $X/n$ by trying $T(Y_1,\dots,Y_n)$ for some other function $T$?
Let's consider the case $n=2$ so that there are 4 possible values for $(Y_1,Y_2)$. If $h(Y_1,Y_2) = T(Y_1,Y_2) - (Y_1+Y_2)/2$ then again $E_p[h(Y_1,Y_2)] = 0$ for all $p$; written out,
\[ h(0,0)(1-p)^2 + \big[h(0,1)+h(1,0)\big]p(1-p) + h(1,1)p^2 = 0 \quad\text{for all } p, \]
so $h(0,0)=h(1,1)=0$ and $h(0,1)+h(1,0)=0$. Moreover
\[ \mathrm{Var}_p(T) = \mathrm{Var}_p\!\left(\frac{Y_1+Y_2}{2}\right) + E_p[h^2] + 2E_p\!\left[h\cdot\frac{Y_1+Y_2}{2}\right], \]
and the cross term is
\[ 2E_p\!\left[h\cdot\frac{Y_1+Y_2}{2}\right] = \big[h(0,1)+h(1,0)\big]p(1-p) + 2h(1,1)p^2 . \]
We have already shown that the sum in $[\;]$ is 0! Since $h(1,1)=0$ as well, the cross term vanishes and $\mathrm{Var}_p(T) \ge \mathrm{Var}_p\big((Y_1+Y_2)/2\big)$ for every unbiased $T$.
This long, algebraically involved, method of proving that $X/n$ is the UMVUE of $p$ is one special case of a general tactic.
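To make the $n=2$ calculation concrete, here is a small symbolic check (a sketch only; it assumes the sympy package, and the parametrization of $T$ by the single free value t01 is mine) that every unbiased $T(Y_1,Y_2)$ satisfies $\mathrm{Var}_p(T) = \mathrm{Var}_p\big((Y_1+Y_2)/2\big) + E_p[h^2]$:

```python
# Symbolic check of the n = 2 argument: any unbiased T(Y1, Y2) for p has
# Var_p(T) = Var_p((Y1+Y2)/2) + E_p(h^2), with h = T - (Y1+Y2)/2.
import sympy as sp

p, t01 = sp.symbols('p t01')

# Unbiasedness forces T(0,0) = 0, T(1,1) = 1 and T(0,1) + T(1,0) = 1,
# so a single free value t01 parametrizes all unbiased estimators.
T = {(0, 0): 0, (0, 1): t01, (1, 0): 1 - t01, (1, 1): 1}
prob = {(0, 0): (1 - p)**2, (0, 1): p*(1 - p), (1, 0): p*(1 - p), (1, 1): p**2}

def E(f):                       # expectation over the four outcomes
    return sp.expand(sum(f[y] * prob[y] for y in prob))

Xbar = {y: sp.Rational(y[0] + y[1], 2) for y in prob}      # (Y1+Y2)/2
h = {y: T[y] - Xbar[y] for y in prob}

var_T    = sp.simplify(E({y: T[y]**2 for y in prob}) - E(T)**2)
var_Xbar = sp.simplify(E({y: Xbar[y]**2 for y in prob}) - E(Xbar)**2)
Eh2      = sp.simplify(E({y: h[y]**2 for y in prob}))

print(sp.simplify(var_T - var_Xbar - Eh2))   # 0: the decomposition holds
print(Eh2)                                   # the non-negative extra variance term
```

The printed extra term works out to $2p(1-p)(t_{01}-1/2)^2$, which vanishes only when $T=(Y_1+Y_2)/2$.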
To get more insight I begin by rewriting
\[ E_p[T(Y_1,\dots,Y_n)] = \sum_{y} T(y)\, p^{\sum y_i}(1-p)^{\,n-\sum y_i} = \sum_{x=0}^{n} \left[ \frac{\sum_{y:\,\sum y_i = x} T(y)}{\binom{n}{x}} \right] \binom{n}{x} p^{x}(1-p)^{\,n-x} . \]
Notice that the large fraction in this formula is the average value of $T$ over values of $y$ when $X=\sum_i Y_i$ is held fixed at $x$. Notice that the weights in this average do not depend on $p$. Notice that this average is actually the conditional expectation $E[T(Y_1,\dots,Y_n)\mid X=x]$.
Notice that the conditional probabilities do not depend on $p$. In a sequence of Bernoulli trials, if I tell you that 5 of 17 were heads and the rest tails, then the actual trial numbers of the 5 heads are a random subset of the 17 trials; each of the $\binom{17}{5}$ possible subsets has the same chance, and this chance does not depend on $p$.
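Here is a small numerical illustration of this rewriting (my own sketch; the statistic T below is an arbitrary made-up example): the conditional averages given $X=x$ use weights $1/\binom{n}{x}$ that are free of $p$, and recombining them with the Binomial$(n,p)$ weights recovers $E_p[T]$.

```python
# Numerical illustration: E_p[T] equals the Binomial(n, p) average of the
# conditional means E[T | X = x], and those conditional means are free of p.
from itertools import product
from math import comb

n, p = 4, 0.3
T = lambda y: y[0] * (1 - y[1]) + 0.2 * sum(y)          # an arbitrary statistic

def prob(y):                                            # P_p(Y = y)
    s = sum(y)
    return p**s * (1 - p)**(n - s)

# E_p[T] computed directly over all 2^n outcomes
direct = sum(T(y) * prob(y) for y in product((0, 1), repeat=n))

# E[T | X = x]: a plain average over the C(n, x) equally likely arrangements
cond_mean = {x: sum(T(y) for y in product((0, 1), repeat=n) if sum(y) == x) / comb(n, x)
             for x in range(n + 1)}

# Recombining with the Binomial(n, p) weights gives E_p[T] again
recombined = sum(cond_mean[x] * comb(n, x) * p**x * (1 - p)**(n - x) for x in range(n + 1))

print(direct, recombined)   # the two numbers agree
```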
Notice that in this problem, with data $Y_1,\dots,Y_n$, the log likelihood
\[ \ell(p) = X\log p + (n-X)\log(1-p), \qquad X=\sum_{i=1}^n Y_i, \]
is a function of the data only through $X$.
In the binomial situation the conditional distribution of the data given $X$ is the same for all values of $p$; we say this conditional distribution is free of $p$.
Definition: A statistic $T(X)$ is sufficient for the model $\{P_\theta;\ \theta\in\Theta\}$ if the conditional distribution of the data $X$ given $T=t$ is free of $\theta$.
Intuition: Why do the data tell us about $\theta$? Because different values of $\theta$ give different distributions to $X$. If two different values of $\theta$ correspond to the same joint density or cdf for $X$ then we cannot, even in principle, distinguish these two values of $\theta$ by examining $X$. We extend this notion to the following: if two values of $\theta$ give the same conditional distribution of $X$ given $T$, then observing $X$ in addition to $T$ does not improve our ability to distinguish the two values.
Mathematically precise version of this intuition: If $T(X)$ is a sufficient statistic then we can do the following. If $S(X)$ is any estimate or confidence interval or whatever for a given problem, but we only know the value of $T(X)=t$, then:
1. Generate a new data set $X^*$ from the conditional distribution of $X$ given $T(X)=t$.
2. Compute $S(X^*)$ and use it as you would have used $S(X)$.
Then $S(X^*)$ has the same distribution as $S(X)$, so it is just as good as $S(X)$ in terms of its sampling properties.
You can carry out the first step only if the statistic $T$ is sufficient; otherwise you need to know the true value of $\theta$ to generate $X^*$.
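A simulation sketch of this two-step recipe for Bernoulli data with $T=\sum Y_i$ (my own illustration; the statistic $S(Y)=Y_1(1-Y_2)$ and the helper names are made up): regenerate $Y^*$ by scattering the observed number of successes uniformly over the $n$ positions, then recompute $S$. The sampling distribution of $S(Y^*)$ matches that of $S(Y)$ even though the regeneration never uses $p$.

```python
# Regenerating data from the conditional law given the sufficient statistic
# and recomputing S gives an estimate with the same sampling distribution.
import random

def S(y):
    return y[0] * (1 - y[1])

def regenerate(t, n):
    """Draw Y* from the conditional law of Y given sum(Y) = t (free of p)."""
    positions = random.sample(range(n), t)
    return [1 if i in positions else 0 for i in range(n)]

n, p, reps = 10, 0.3, 200_000
random.seed(1)

orig, regen = [], []
for _ in range(reps):
    y = [1 if random.random() < p else 0 for _ in range(n)]
    orig.append(S(y))
    regen.append(S(regenerate(sum(y), n)))

# The two averages (and, more generally, the two sampling distributions) agree.
print(sum(orig) / reps, sum(regen) / reps)   # both near p*(1-p) = 0.21
```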
Example 1: If $Y_1,\dots,Y_n$ are iid Bernoulli($p$) then, given $\sum_{i=1}^n Y_i = y$, the indexes of the $y$ successes have the same chance of being any one of the $\binom{n}{y}$ possible subsets of $\{1,\dots,n\}$. This chance does not depend on $p$, so $X=\sum_{i=1}^n Y_i$ is a sufficient statistic.
Example 2: If $X_1,\dots,X_n$ are iid $N(\mu,1)$ then the joint distribution of $(X_1,\dots,X_n,\bar X)$ is multivariate normal with mean vector whose entries are all $\mu$ and variance-covariance matrix which can be partitioned as
\[ \begin{bmatrix} I_{n\times n} & \mathbf{1}/n \\ \mathbf{1}^{\mathsf T}/n & 1/n \end{bmatrix}, \]
where $\mathbf 1$ denotes a column vector of $n$ ones. You can now compute the conditional means and variances of the $X_i$ given $\bar X$ and use the fact that the conditional law is multivariate normal to prove that the conditional distribution of the data given $\bar X=\bar x$ is multivariate normal with mean vector all of whose entries are $\bar x$ and variance-covariance matrix given by
\[ I_{n\times n} - \mathbf{1}\mathbf{1}^{\mathsf T}/n . \]
Since this does not depend on $\mu$ we find that $\bar X$ is sufficient.
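A short numerical check of this conditioning calculation (a sketch assuming numpy; the specific values of n, mu and xbar are arbitrary): build the joint covariance of $(X,\bar X)$ for iid $N(\mu,1)$ data and apply the usual multivariate normal conditioning formulas. The conditional mean comes out as $\bar x\,\mathbf 1$ and the conditional covariance as $I-\mathbf 1\mathbf 1^{\mathsf T}/n$, whatever $\mu$ is.

```python
# Conditional mean and covariance of (X1,...,Xn) given Xbar for iid N(mu, 1) data,
# via the standard multivariate normal conditioning formulas.
import numpy as np

n, mu, xbar = 5, 2.7, 3.1          # illustrative values; any mu gives the same answer

Sigma_XX = np.eye(n)               # Cov(X, X) for iid N(mu, 1)
Sigma_Xb = np.full((n, 1), 1 / n)  # Cov(X, Xbar)
var_b = 1 / n                      # Var(Xbar)

cond_mean = mu + (Sigma_Xb / var_b).ravel() * (xbar - mu)        # = xbar * 1
cond_cov = Sigma_XX - Sigma_Xb @ Sigma_Xb.T / var_b              # = I - 11^T / n

print(np.allclose(cond_mean, xbar))                              # True
print(np.allclose(cond_cov, np.eye(n) - np.ones((n, n)) / n))    # True: free of mu
```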
WARNING: Whether or not a statistic is sufficient depends on the density function and on the parameter space $\Theta$. For example, if the variance were also unknown, so that the model were $N(\mu,\sigma^2)$ with $\theta=(\mu,\sigma^2)$, then $\bar X$ alone would no longer be sufficient.
Theorem (Rao-Blackwell): Suppose that $S(X)$ is a sufficient statistic for some model $\{P_\theta;\ \theta\in\Theta\}$. If $T$ is an estimate of some parameter $\phi(\theta)$ then:
1. $E(T(X)\mid S(X))$ is a statistic, that is, it does not depend on $\theta$.
2. $E(T(X)\mid S(X))$ has the same bias as $T$; in particular, if $T$ is unbiased for $\phi(\theta)$ then so is $E(T(X)\mid S(X))$.
3. $\mathrm{Var}_\theta\big(E(T(X)\mid S(X))\big) \le \mathrm{Var}_\theta\big(T(X)\big)$, so the mean squared error of $E(T\mid S)$ is no larger than that of $T$.
Proof: It will be useful to review conditional distributions a bit more carefully at this point. The abstract definition of conditional expectation is this:
Definition: $E(Y|X)$ is any function of $X$ such that
\[ E\big[R(X)\,E(Y|X)\big] = E\big[R(X)\,Y\big] \quad\text{for every bounded function } R. \]
Definition: $E(Y|X=x)$ is a function $g(x)$ such that $g(X)=E(Y|X)$.
Fact: If $(X,Y)$ has joint density $f_{X,Y}(x,y)$ and conditional density $f(y|x)$ then $g(x)=\int y\,f(y|x)\,dy$ satisfies these definitions.
Proof: For any bounded $R$,
\[ E[R(X)g(X)] = \int R(x)g(x)f_X(x)\,dx = \int\!\!\int R(x)\,y\,f(y|x)f_X(x)\,dy\,dx = \int\!\!\int R(x)\,y\,f_{X,Y}(x,y)\,dy\,dx = E[R(X)Y]. \]
You should simply think of $E(Y|X)$ as what you get when you average $Y$ holding $X$ fixed. It behaves like an ordinary expected value, except that functions of $X$ alone act like constants.
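A quick symbolic check of the Fact (my own illustration with an arbitrary joint density, assuming sympy): take $f_{X,Y}(x,y)=x+y$ on the unit square and verify that $g(x)=\int y\,f(y|x)\,dy$ satisfies the defining property $E[R(X)g(X)]=E[R(X)Y]$ for a particular bounded $R$.

```python
# Check that g(x) = ∫ y f(y|x) dy satisfies E[R(X) g(X)] = E[R(X) Y]
# for the joint density f(x, y) = x + y on the unit square.
import sympy as sp

x, y = sp.symbols('x y', positive=True)
f_xy = x + y                                   # a joint density on the unit square
f_x = sp.integrate(f_xy, (y, 0, 1))            # marginal of X
g = sp.integrate(y * f_xy / f_x, (y, 0, 1))    # candidate E(Y | X = x)

R = x**2                                       # any bounded function of X will do here
lhs = sp.integrate(R * g * f_x, (x, 0, 1))                 # E[R(X) g(X)]
rhs = sp.integrate(R * y * f_xy, (y, 0, 1), (x, 0, 1))     # E[R(X) Y]
print(sp.simplify(lhs - rhs))                  # 0
```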
Proof of the Rao-Blackwell Theorem
Step 1: The definition of sufficiency is that the conditional distribution of $X$ given $S$ does not depend on $\theta$. This means that $E(T(X)\mid S)$ does not depend on $\theta$, so it is a genuine statistic.
Step 2: This step hinges on the following identity (called Adam's law by Jerzy Neyman, who used to say it comes before all the others):
\[ E\big[E(Y|X)\big] = E(Y) . \]
From this we deduce that
\[ E_\theta\big[E(T(X)\mid S)\big] = E_\theta\big[T(X)\big], \]
so $E(T\mid S)$ has the same bias as $T$; in particular, if $T$ is unbiased then so is $E(T\mid S)$.
Step 3: This relies on the following very useful decomposition:
\[ \mathrm{Var}(Y) = \mathrm{Var}\big(E(Y|X)\big) + E\big[\mathrm{Var}(Y|X)\big] . \]
(In regression courses we say that the total sum of squares is the regression sum of squares plus the residual sum of squares.) We apply this to the Rao-Blackwell theorem to get
\[ \mathrm{Var}_\theta(T) = \mathrm{Var}_\theta\big(E(T\mid S)\big) + E_\theta\big[\mathrm{Var}(T\mid S)\big] \ge \mathrm{Var}_\theta\big(E(T\mid S)\big) . \]
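A small symbolic verification of this decomposition (my own sketch, assuming sympy) in the $n=2$ Bernoulli setting, with $Y=Y_1$ and conditioning variable $X=Y_1+Y_2$:

```python
# Verify Var(Y) = Var(E(Y|X)) + E(Var(Y|X)) for Y = Y1, X = Y1 + Y2, n = 2.
import sympy as sp

p = sp.symbols('p')
prob = {(0, 0): (1 - p)**2, (0, 1): p*(1 - p), (1, 0): p*(1 - p), (1, 1): p**2}
Y = {y: y[0] for y in prob}           # the statistic being decomposed (here just Y1)
X = {y: y[0] + y[1] for y in prob}    # the conditioning variable

pX = {x: sum(prob[y] for y in prob if X[y] == x) for x in (0, 1, 2)}
EY_given = {x: sum(Y[y] * prob[y] for y in prob if X[y] == x) / pX[x] for x in pX}
VY_given = {x: sum(Y[y]**2 * prob[y] for y in prob if X[y] == x) / pX[x] - EY_given[x]**2
            for x in pX}

var_Y   = sum(Y[y]**2 * prob[y] for y in prob) - sum(Y[y] * prob[y] for y in prob)**2
var_EYX = sum(EY_given[x]**2 * pX[x] for x in pX) - sum(EY_given[x] * pX[x] for x in pX)**2
E_VYX   = sum(VY_given[x] * pX[x] for x in pX)

print(sp.simplify(var_Y - (var_EYX + E_VYX)))   # 0: total = between + within
```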
Examples: In the binomial problem $Y_1(1-Y_2)$ is an unbiased estimate of $p(1-p)$. We improve this by computing
\[ E\big[Y_1(1-Y_2)\mid X=x\big] = P(Y_1=1,\,Y_2=0\mid X=x) = \frac{\binom{n-2}{x-1}}{\binom{n}{x}} = \frac{x(n-x)}{n(n-1)} , \]
so the improved estimate of $p(1-p)$ is $X(n-X)/\big(n(n-1)\big)$.
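An exact enumeration sketch of this improvement (my own illustration; the values of n and p are arbitrary): Rao-Blackwellizing $T=Y_1(1-Y_2)$ over $X=\sum Y_i$ reproduces $X(n-X)/\big(n(n-1)\big)$, keeps the mean $p(1-p)$, and reduces the variance.

```python
# Rao-Blackwellize T = Y1*(1 - Y2) over X = sum(Y) by exact enumeration.
from itertools import product
from math import comb

n, p = 6, 0.35
T = lambda y: y[0] * (1 - y[1])

def prob(y):
    s = sum(y)
    return p**s * (1 - p)**(n - s)

# Rao-Blackwellized estimate: average of T over arrangements with sum(y) = x
RB = {x: sum(T(y) for y in product((0, 1), repeat=n) if sum(y) == x) / comb(n, x)
      for x in range(n + 1)}
print(all(abs(RB[x] - x * (n - x) / (n * (n - 1))) < 1e-12 for x in RB))   # True

def mean_var(vals_probs):
    m = sum(v * q for v, q in vals_probs)
    return m, sum((v - m)**2 * q for v, q in vals_probs)

mT = mean_var([(T(y), prob(y)) for y in product((0, 1), repeat=n)])
mR = mean_var([(RB[x], comb(n, x) * p**x * (1 - p)**(n - x)) for x in range(n + 1)])
print(mT, mR)     # equal means p*(1-p); the second variance is smaller
```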
Example: If $X_1,\dots,X_n$ are iid $N(\mu,1)$ then $\bar X$ is sufficient and $X_1$ is an unbiased estimate of $\mu$. Now, by symmetry, $E(X_1\mid\bar X)=\cdots=E(X_n\mid\bar X)$, and these $n$ quantities add up to $E\big(\sum_i X_i\mid\bar X\big)=n\bar X$, so
\[ E(X_1\mid\bar X) = \bar X , \]
which is the UMVUE.
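A quick simulation sketch (my own illustration, assuming numpy) consistent with $E(X_1\mid\bar X)=\bar X$: within a narrow bin of $\bar X$ values the average of $X_1$ tracks $\bar x$ regardless of $\mu$, and $\mathrm{Var}(\bar X)=1/n$ is much smaller than $\mathrm{Var}(X_1)=1$.

```python
# Within a narrow bin of xbar, the average of X1 is close to xbar, illustrating
# E(X1 | Xbar) = Xbar; Xbar also has much smaller variance than X1.
import numpy as np

rng = np.random.default_rng(0)
n, mu, reps = 5, 1.0, 400_000
X = rng.normal(mu, 1.0, size=(reps, n))
xbar = X.mean(axis=1)

sel = np.abs(xbar - 1.5) < 0.01            # condition on Xbar close to 1.5
print(X[sel, 0].mean())                    # about 1.5, i.e. E(X1 | Xbar = 1.5)
print(X[:, 0].var(), xbar.var())           # about 1 versus 1/n = 0.2
```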