STAT 450
Lecture 26
The Rao-Blackwell Theorem
Theorem: Suppose that S(X) is a sufficient statistic for some model $\{P_\theta;\ \theta\in\Theta\}$. If T is an estimate of some parameter $\phi(\theta)$ then:

1. E(T|S) is a statistic.

2. E(T|S) has the same bias as T; if T is unbiased so is E(T|S).

3. $\mathrm{Var}_\theta\{E(T|S)\} \le \mathrm{Var}_\theta(T)$, and the inequality is strict unless T is a function of S.

4. The MSE of E(T|S) is no more than that of T.
Fact: If (X,Y) has joint density $f_{X,Y}(x,y)$ and conditional density $f(y|x) = f_{X,Y}(x,y)/f_X(x)$ then
\[ E(Y|X=x) = \int y\, f(y|x)\, dy, \]
or, for discrete X,Y,
\[ E(Y|X=x) = \sum_y y\, f(y|x). \]
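Parts 2 and 3 of the theorem come from iterated expectation and the variance decomposition; in symbols,
\[ E_\theta\{E(T|S)\} = E_\theta(T), \qquad \mathrm{Var}_\theta(T) = \mathrm{Var}_\theta\{E(T|S)\} + E_\theta\{\mathrm{Var}(T|S)\} \ge \mathrm{Var}_\theta\{E(T|S)\}, \]
with equality only when $\mathrm{Var}(T|S) = 0$, that is, when T is already a function of S. Sufficiency of S is what makes E(T|S) free of $\theta$ and hence a genuine statistic (part 1).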
How to use the theorem:

1. Guess an unbiased estimate T of the parameter of interest. (Example: if $\mu$ is the population mean then $X_1X_2$ is unbiased for $\mu^2$.)

2. Find a sufficient statistic S.

3. Compute E(T|S=s) by doing the integral or sum. Get a formula with s in it.

4. Replace s by S to find the new estimator.
Examples:
Binomial(n,p) problem: the data are $Y_1,\ldots,Y_n$ iid Bernoulli(p) and $X=\sum Y_i$ is sufficient. $Y_1(1-Y_2)$ is an unbiased estimate of p(1-p). Compute $E\{Y_1(1-Y_2)|X\}$.

Two steps. First compute $E\{Y_1(1-Y_2)|X=x\}$. Notice that $Y_1(1-Y_2)$ is either 1 or 0, so:
\[ E\{Y_1(1-Y_2)|X=x\} = P(Y_1=1,\ Y_2=0\,|\,X=x) = \frac{p(1-p)\binom{n-2}{x-1}p^{x-1}(1-p)^{n-x-1}}{\binom{n}{x}p^x(1-p)^{n-x}} = \frac{\binom{n-2}{x-1}}{\binom{n}{x}} = \frac{x(n-x)}{n(n-1)}. \]
Replacing x by X gives the improved estimate $X(n-X)/\{n(n-1)\}$. This is simply $\frac{n}{n-1}\,\hat p(1-\hat p)$ with $\hat p = X/n$ (which can be bigger than 1/4, which is the maximum value of p(1-p)).
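A quick numerical check of this calculation (a Monte Carlo sketch; the values n = 10, p = 0.3 and the number of replications are arbitrary illustrative choices, not from the notes):

    import numpy as np

    rng = np.random.default_rng(0)
    n, p, reps = 10, 0.3, 200_000

    Y = rng.binomial(1, p, size=(reps, n))     # reps samples of n Bernoulli(p) trials
    X = Y.sum(axis=1)                          # sufficient statistic X = sum of the Y's

    crude = Y[:, 0] * (1 - Y[:, 1])            # T = Y1(1 - Y2), unbiased for p(1-p)
    improved = X * (n - X) / (n * (n - 1))     # E(T|X) = X(n - X)/{n(n - 1)}

    print(f"target p(1-p)   : {p * (1 - p):.4f}")
    print(f"crude T         : mean {crude.mean():.4f}, variance {crude.var():.4f}")
    print(f"improved E(T|X) : mean {improved.mean():.4f}, variance {improved.var():.4f}")

Both estimates have mean near p(1-p) = 0.21, but the Rao-Blackwellized version has much smaller variance, as the theorem promises.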
Example: If $X_1,\ldots,X_n$ are iid $N(\mu,1)$ then $\bar X$ is sufficient and $X_1$ is an unbiased estimate of $\mu$. Now
\[ E(X_1|\bar X) = \bar X, \]
which is the UMVUE.
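This conditional expectation needs no integration; a short symmetry argument (added here) gives it directly. Since the $X_i$ are exchangeable, every $E(X_i|\bar X)$ is the same, and
\[ n\bar X = E\Big(\sum_{i=1}^n X_i\,\Big|\,\bar X\Big) = \sum_{i=1}^n E(X_i|\bar X) = n\,E(X_1|\bar X), \]
so $E(X_1|\bar X) = \bar X$.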
Finding Sufficient Statistics
Binomial example: the joint density of the data,
\[ P(Y_1=y_1,\ldots,Y_n=y_n) = p^{\sum y_i}(1-p)^{n-\sum y_i}, \]
depends on the sample only through $X=\sum Y_i$ (and not on the original data $Y_1,\ldots,Y_n$ as well). Normal example: for $X_1,\ldots,X_n$ iid $N(\mu,1)$ the joint density is
\[ (2\pi)^{-n/2}\exp\Big(-\tfrac{1}{2}\sum x_i^2\Big)\exp\Big(\mu\sum x_i - \tfrac{n\mu^2}{2}\Big), \]
where the only factor involving $\mu$ depends on the data only through $\sum x_i$.
These are examples of the Factorization Criterion:
Theorem: If the model for data X has density $f(x;\theta)$ then the statistic S(X) is sufficient if and only if the density can be factored as
\[ f(x;\theta) = g(S(x);\theta)\, h(x). \]
The theorem is proved by finding a statistic T(x) such that X is a one-to-one function of the pair (S,T) and applying the change of variables formula to the joint density of S and T. If the density factors then you get
\[ f_{S,T}(s,t;\theta) = g(s;\theta)\, h(x(s,t))\, J(s,t), \]
where J(s,t) is the Jacobian of the map from (s,t) back to x, from which we see that the conditional density of T given S=s does not depend on $\theta$. Thus the conditional distribution of (S,T) given S does not depend on $\theta$, and finally the conditional distribution of X given S does not depend on $\theta$. Conversely, if S is sufficient then the conditional density of T given S has no $\theta$ in it and the joint density of (S,T) is
\[ f_{S,T}(s,t;\theta) = f_S(s;\theta)\, f_{T|S}(t|s). \]
Apply the change of variables formula to get the density of X to be
\[ f_X(x;\theta) = f_S(S(x);\theta)\, f_{T|S}(T(x)|S(x))\, J(x), \]
where J is the Jacobian. This factors.
Example: If $X_1,\ldots,X_n$ are iid $N(\mu,\sigma^2)$ then the joint density is
\[ (2\pi\sigma^2)^{-n/2}\exp\Big\{-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\Big\}, \]
which is evidently a function of $\big(\sum x_i^2, \sum x_i\big)$. This pair is a sufficient statistic. You can write this pair as a bijective function of $(\bar X, s^2)$, where $s^2 = \sum (X_i-\bar X)^2/(n-1)$, so that this pair is also sufficient.
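Explicitly, the bijection is
\[ \bar X = \frac{1}{n}\sum X_i, \qquad s^2 = \frac{1}{n-1}\Big(\sum X_i^2 - n\bar X^2\Big), \]
and conversely $\sum X_i = n\bar X$ and $\sum X_i^2 = (n-1)s^2 + n\bar X^2$, so each pair can be computed from the other.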
Completeness
In the Binomial(n,p) example I showed that there is only one function of X which is unbiased. The Rao-Blackwell theorem shows that a UMVUE, if it exists, will be a function of any sufficient statistic. If I change T might I get a different E(T|S)? Generally the answer is yes, but for some models, like the binomial with S=X, the answer is no.
Definition: A statistic T is complete for a model $\{P_\theta;\ \theta\in\Theta\}$ if
\[ E_\theta\{h(T)\} = 0 \text{ for all } \theta \]
implies h(T)=0.
We have already seen that X is complete in the Binomial(n,p) model.
In the $N(\mu,1)$ model suppose $E_\mu\{h(\bar X)\} = 0$ for all $\mu$. Since $\bar X$ has a $N(\mu,1/n)$ distribution we find that
\[ 0 = \sqrt{\frac{n}{2\pi}}\int h(x)\, e^{-n(x-\mu)^2/2}\, dx = \sqrt{\frac{n}{2\pi}}\, e^{-n\mu^2/2}\int h(x)\, e^{-nx^2/2}\, e^{n\mu x}\, dx. \]
This is (apart from constants) the so-called Laplace transform of the function $h(x)e^{-nx^2/2}$. It is a theorem that a Laplace transform is 0 if and only if the function is 0 (because you can invert the transform). Hence $h \equiv 0$.
How to Prove Completeness
There is only one general tactic. Suppose X has density
\[ f(x;\theta) = h(x)\exp\Big\{\sum_{i=1}^p a_i(\theta) S_i(x) + c(\theta)\Big\}. \]
If the range of the function $\big(a_1(\theta),\ldots,a_p(\theta)\big)$ (as $\theta$ varies over $\Theta$) contains a (hyper-)rectangle in $R^p$ then the statistic
\[ \big(S_1(X),\ldots,S_p(X)\big) \]
is complete and sufficient.
Example: In the $N(\mu,\sigma^2)$ model the density of a single observation has the form
\[ \frac{1}{\sqrt{2\pi}}\exp\Big\{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \frac{\mu^2}{2\sigma^2} - \log\sigma\Big\}, \]
which is an exponential family with
\[ h(x) = \frac{1}{\sqrt{2\pi}}, \quad a_1(\theta) = -\frac{1}{2\sigma^2}, \quad S_1(x) = x^2, \quad a_2(\theta) = \frac{\mu}{\sigma^2}, \quad S_2(x) = x, \]
and $c(\theta) = -\mu^2/(2\sigma^2) - \log\sigma$. The joint density of a sample $X_1,\ldots,X_n$ has the same form with $\sum x_i^2$ and $\sum x_i$ in place of $x^2$ and $x$, and the range of $(a_1,a_2)$ is the half plane $\{a_1 < 0\}$, which contains rectangles. It follows that
\[ \Big(\sum X_i^2, \sum X_i\Big) \]
is a complete sufficient statistic.
Remark: The statistic $(\bar X, s^2)$ is a one-to-one function of $\big(\sum X_i^2, \sum X_i\big)$, so it must be complete and sufficient, too. Any function of the latter statistic can be rewritten as a function of the former and vice versa.
The Lehmann-Scheffé Theorem
Theorem: If S is a complete sufficient statistic for some model and h(S) is an unbiased estimate of some parameter $\phi(\theta)$ then h(S) is the UMVUE of $\phi(\theta)$.
Proof: Suppose T is another unbiased estimate of $\phi(\theta)$. According to the Rao-Blackwell theorem T is improved by E(T|S), so if h(S) is not UMVUE then there must exist another function h*(S) which is unbiased and whose variance is smaller than that of h(S) for some value of $\theta$. But
\[ E_\theta\{h^*(S) - h(S)\} = 0 \text{ for all } \theta, \]
so, by completeness, in fact h*(S) = h(S).
Example: In the $N(\mu,\sigma^2)$ example the random variable $(n-1)s^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution. It follows that
\[ E\left(\frac{\sqrt{n-1}\,s}{\sigma}\right) = \frac{1}{2^{(n-1)/2}\Gamma\big(\frac{n-1}{2}\big)}\int_0^\infty \sqrt{x}\, x^{(n-1)/2-1} e^{-x/2}\, dx. \]
Substitute y=x/2 and get
\[ E\left(\frac{\sqrt{n-1}\,s}{\sigma}\right) = \frac{\sqrt{2}\,\Gamma(n/2)}{\Gamma\big(\frac{n-1}{2}\big)}. \]
Hence
\[ E(s) = \frac{\sqrt{2}\,\Gamma(n/2)}{\sqrt{n-1}\,\Gamma\big(\frac{n-1}{2}\big)}\,\sigma. \]
The UMVUE of $\sigma$ is then
\[ \frac{\sqrt{n-1}\,\Gamma\big(\frac{n-1}{2}\big)}{\sqrt{2}\,\Gamma(n/2)}\, s \]
by the Lehmann-Scheffé theorem.
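A quick numerical check (a Monte Carlo sketch; n = 5, μ = 0, σ = 2 and the number of replications are arbitrary illustrative choices, not from the notes) that the constant above does make the estimate unbiased:

    import numpy as np
    from math import lgamma, sqrt, exp

    rng = np.random.default_rng(1)
    n, mu, sigma, reps = 5, 0.0, 2.0, 200_000

    # c_n = sqrt(n-1) Gamma((n-1)/2) / {sqrt(2) Gamma(n/2)}, computed on the log scale
    c_n = sqrt(n - 1) * exp(lgamma((n - 1) / 2) - lgamma(n / 2)) / sqrt(2)

    X = rng.normal(mu, sigma, size=(reps, n))
    s = X.std(axis=1, ddof=1)                  # sample standard deviation (divisor n - 1)

    print(f"E(s)       ~ {s.mean():.4f}   (biased low; true sigma = {sigma})")
    print(f"E(c_n * s) ~ {(c_n * s).mean():.4f}   (UMVUE, approximately unbiased)")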
Criticism of Unbiasedness
1. The UMVUE can be inadmissible for squared error loss, meaning that there is a (biased, of course) estimate whose MSE is smaller for every parameter value. An example is the UMVUE of $\sigma$ just computed, $c_n s$ with $c_n = \sqrt{n-1}\,\Gamma\big(\frac{n-1}{2}\big)/\{\sqrt{2}\,\Gamma(n/2)\}$. The MSE of $s$ itself is smaller than that of $c_n s$ for every $\mu$ and $\sigma$; see the calculation after this list.
2. There are examples where unbiased estimation is impossible. The log odds in a Binomial model is $\phi = \log\{p/(1-p)\}$. Since
\[ E_p\{g(X)\} = \sum_{x=0}^n g(x)\binom{n}{x}p^x(1-p)^{n-x}, \]
the expectation of any function of the data is a polynomial function of p, and since $\log\{p/(1-p)\}$ is not a polynomial function of p there is no unbiased estimate of $\phi$.
3. The UMVUE of $\sigma$ is not the square root of the UMVUE of $\sigma^2$. This method of estimation does not have the parameterization equivariance that maximum likelihood does.
4. Unbiasedness is irrelevant (unless you plan to average together many estimators). The property is an average over possible values of the estimate in which positive errors are allowed to cancel negative errors. An exception to this criticism is that if you plan to average a number of estimators to get a single estimator then it is a problem if all the estimators have the same bias. In assignment 5 you have the one way layout example in which the mle of the residual variance averages together many biased estimates and so is very badly biased. That assignment shows that the solution is not really to insist on unbiasedness but to consider an alternative to averaging for putting the individual estimates together.
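To make point 1 concrete (an added calculation using $E(s) = \sigma/c_n$ and $E(s^2) = \sigma^2$ from the Lehmann-Scheffé example):
\[ E\{(s-\sigma)^2\} = 2\sigma^2\Big(1-\frac{1}{c_n}\Big), \qquad E\{(c_n s-\sigma)^2\} = \sigma^2(c_n^2-1). \]
Since $c_n > 1$ we have $c_n^2 - 1 = (c_n-1)(c_n+1) > 2(c_n-1)/c_n = 2(1-1/c_n)$, so the biased estimate s beats the UMVUE $c_n s$ at every $(\mu,\sigma)$.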
Minimal Sufficiency
In any model the whole data set X is itself a sufficient statistic. In any iid model the vector of order statistics $\big(X_{(1)},\ldots,X_{(n)}\big)$ is sufficient. In the $N(\mu,\sigma^2)$ model, then, we have three possible sufficient statistics:

1. $S_1 = (X_1,\ldots,X_n)$.

2. $S_2 = \big(X_{(1)},\ldots,X_{(n)}\big)$.

3. $S_3 = (\bar X, s^2)$.
Notice that I can calculate $S_3$ from the values of $S_1$ or $S_2$ but not vice versa, and that I can calculate $S_2$ from $S_1$ but not vice versa. It turns out that $S_3$ is a minimal sufficient statistic, meaning that it is a function of any other sufficient statistic. (You can't collapse the data set any more without losing information about $(\mu,\sigma)$.)
To recognize minimal sufficient statistics you look at the likelihood function:

Fact: If you fix some particular $\theta_0$ then the log likelihood ratio function
\[ \theta \mapsto \ell(\theta) - \ell(\theta_0) \]
is minimal sufficient. WARNING: the function is the statistic.
The subtraction of $\ell(\theta_0)$ gets rid of those irrelevant constants in the log-likelihood. For instance in the $N(\mu,1)$ example we have
\[ \ell(\mu) = -\frac{n}{2}\log(2\pi) - \frac{1}{2}\sum X_i^2 + \mu\sum X_i - \frac{n\mu^2}{2}. \]
This depends on $\sum X_i^2$, which is not needed for the sufficient statistic. Take $\mu_0 = 0$ and get
\[ \ell(\mu) - \ell(0) = \mu\sum X_i - \frac{n\mu^2}{2}. \]
This function of $\mu$ is minimal sufficient. Notice that from $\sum X_i$ you can compute this minimal sufficient statistic and vice versa. Thus $\sum X_i$ (equivalently $\bar X$) is also minimal sufficient.
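The same recipe in the full $N(\mu,\sigma^2)$ model (an added calculation, taking the reference point $(\mu_0,\sigma_0)=(0,1)$) recovers $S_3$ from the list above:
\[ \ell(\mu,\sigma) - \ell(0,1) = -n\log\sigma + \Big(\frac{1}{2}-\frac{1}{2\sigma^2}\Big)\sum X_i^2 + \frac{\mu}{\sigma^2}\sum X_i - \frac{n\mu^2}{2\sigma^2}. \]
As a function of $(\mu,\sigma)$ this can be computed from $\big(\sum X_i, \sum X_i^2\big)$ and conversely determines that pair, so $\big(\sum X_i, \sum X_i^2\big)$, equivalently $(\bar X, s^2)$, is minimal sufficient.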
FACT: A complete sufficient statistic is also minimal
sufficient.
Richard Lockhart
1999-11-15