The problem above illustrates a general phenomenon. An estimator can be good for some values of $\theta$ and bad for others. When comparing $\hat\theta$ and $\tilde\theta$, two estimators of $\theta$, we will say that $\hat\theta$ is better than $\tilde\theta$ if it has uniformly smaller MSE:
$$MSE_{\hat\theta}(\theta) \le MSE_{\tilde\theta}(\theta) \quad\text{for every }\theta .$$
The definition raises the question of the existence of a best estimate - one which is better than every other estimator. There is no such estimate. Suppose $\hat\theta$ were such a best estimate. Fix a $\theta^*$ in $\Theta$ and let $\tilde\theta \equiv \theta^*$. Then the MSE of $\tilde\theta$ is 0 when $\theta=\theta^*$. Since $\hat\theta$ is better than $\tilde\theta$ we must have $MSE_{\hat\theta}(\theta^*)=0$, so that $\hat\theta=\theta^*$ with probability 1 when $\theta=\theta^*$. Since $\theta^*$ was arbitrary, $\hat\theta$ would have to equal every possible parameter value at once, which is impossible; so no uniformly best estimate exists.
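A minimal simulation sketch of this phenomenon (assuming numpy is available; the sample size, the grid of $\theta$ values and the shrinkage estimator below are illustrative choices, not from the notes): neither estimator has uniformly smaller MSE.

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 10, 200_000

for theta in np.linspace(0.05, 0.95, 10):
    x = rng.binomial(n, theta, size=reps)                    # X ~ Binomial(n, theta)
    mse_mle = np.mean((x / n - theta) ** 2)                  # usual estimate X/n
    mse_shrunk = np.mean(((x + 1) / (n + 2) - theta) ** 2)   # a shrinkage estimate
    print(f"theta={theta:.2f}  MSE(X/n)={mse_mle:.5f}  MSE((X+1)/(n+2))={mse_shrunk:.5f}")
# The shrinkage estimate wins for theta near 1/2 and loses near 0 or 1:
# each estimator is good for some values of theta and bad for others.
```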
Principle of Unbiasedness: A good estimate is unbiased, that is, $E_\theta(\hat\theta) = \theta$ for all $\theta$.
WARNING: In my view the Principle of Unbiasedness is a load of hog wash.
For an unbiased estimate the MSE is just the variance.
Definition: An estimator $\hat\phi(X)$ of a parameter $\phi(\theta)$ is Uniformly Minimum Variance Unbiased (UMVU) if, whenever $\tilde\phi(X)$ is an unbiased estimate of $\phi(\theta)$, we have $\mathrm{Var}_\theta(\hat\phi) \le \mathrm{Var}_\theta(\tilde\phi)$ for every $\theta$.
The point of having a general $\phi(\theta)$ is to study problems like estimating $\mu$ when you have two parameters, like $\mu$ and $\sigma$, for example.
If we can differentiate under the integral sign, we can derive some information from the identity
$$\int f(x,\theta)\,dx \equiv 1 .$$
Differentiating with respect to $\theta$ gives
$$0 = \int \frac{\partial f(x,\theta)}{\partial\theta}\,dx = \int \frac{\partial \log f(x,\theta)}{\partial\theta}\,f(x,\theta)\,dx = E_\theta\!\left[U(\theta)\right],$$
where $U(\theta)=\partial\log f(X,\theta)/\partial\theta$ is the score function. Differentiating once more gives
$$\mathrm{Var}_\theta\!\left[U(\theta)\right] = E_\theta\!\left[U^2(\theta)\right] = -E_\theta\!\left[\frac{\partial U(\theta)}{\partial\theta}\right] \equiv I(\theta),$$
the Fisher information.
Summary of Implications: for an unbiased estimate $T$ of $\theta$ the Cramér-Rao inequality gives $\mathrm{Var}_\theta(T) \ge 1/I(\theta)$, with equality only when $T$ is a linear function of the score $U(\theta)$.
What can we do to find UMVUEs when the CRLB is a strict inequality?
Example: Suppose $X$ has a Binomial($n,\theta$) distribution. The score function is
$$U(\theta) = \frac{X}{\theta} - \frac{n-X}{1-\theta} = \frac{X-n\theta}{\theta(1-\theta)} .$$
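As a quick numerical check of the score identities above (a sketch, assuming numpy and scipy are available): for the Binomial score, $E_\theta[U(\theta)]=0$ and $\mathrm{Var}_\theta[U(\theta)]=n/[\theta(1-\theta)]=I(\theta)$.

```python
import numpy as np
from scipy.stats import binom

n, theta = 12, 0.3
x = np.arange(n + 1)
p = binom.pmf(x, n, theta)                      # P(X = x)

U = (x - n * theta) / (theta * (1 - theta))     # score U(theta) at each value of X
EU = np.sum(U * p)                              # should be 0
VarU = np.sum(U ** 2 * p) - EU ** 2             # should equal n / (theta * (1 - theta))
print(EU, VarU, n / (theta * (1 - theta)))
```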
Different tactic: Suppose $T(X)$ is some unbiased function of $X$. Then we have
$$E_\theta\!\left[T(X)\right] = \sum_{k=0}^{n} T(k)\binom{n}{k}\theta^k(1-\theta)^{n-k} \equiv \theta .$$
The left-hand side is a polynomial in $\theta$ of degree at most $n$; matching the coefficients of the powers of $\theta$ gives $n+1$ linear equations whose only solution is $T(k)=k/n$, so $X/n$ is the only unbiased function of $X$.
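A small symbolic sketch of this tactic (assuming sympy is available; $n=3$ is an illustrative choice): forcing the polynomial identity to hold for every $\theta$ pins down $T(k)=k/n$.

```python
import sympy as sp

theta = sp.symbols('theta')
n = 3
T = sp.symbols(f'T0:{n + 1}')      # unknown values T(0), ..., T(n)

# E_theta[T(X)] - theta, written out as a polynomial in theta
expr = sum(T[k] * sp.binomial(n, k) * theta**k * (1 - theta)**(n - k)
           for k in range(n + 1)) - theta
coeffs = sp.Poly(sp.expand(expr), theta).all_coeffs()   # each coefficient must vanish
print(sp.solve(coeffs, T))         # {T0: 0, T1: 1/3, T2: 2/3, T3: 1}, i.e. T(k) = k/n
```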
A Binomial random variable is a sum of $n$ iid Bernoulli rvs. If $Y_1,\ldots,Y_n$ are iid Bernoulli($\theta$) then $X=Y_1+\cdots+Y_n$ is Binomial($n,\theta$). Could we do better than $\hat\theta=X/n$ by trying $T=\psi(Y_1,\ldots,Y_n)$ for some other function $\psi$?
Try $n=2$. There are 4 possible values for $(Y_1,Y_2)$. If $\psi(Y_1,Y_2)$ is unbiased then
$$E_\theta\!\left[\psi(Y_1,Y_2)\right]-\theta = \psi(0,0)(1-\theta)^2 + \left[\psi(0,1)+\psi(1,0)\right]\theta(1-\theta) + \psi(1,1)\theta^2 - \theta .$$
Unbiasedness says exactly that this sum is identically 0! Matching coefficients of the powers of $\theta$ forces $\psi(0,0)=0$, $\psi(1,1)=1$ and $\psi(0,1)+\psi(1,0)=1$; minimizing $\mathrm{Var}_\theta(\psi)=\left[\psi^2(0,1)+\psi^2(1,0)\right]\theta(1-\theta)$ subject to this constraint gives $\psi(0,1)=\psi(1,0)=1/2$, that is, $\psi=(Y_1+Y_2)/2$.
This long, algebraically involved method of proving that $X/n$ is the UMVUE of $\theta$ is one special case of a general tactic.
To get more insight rewrite $E_\theta\!\left[\psi(Y_1,\ldots,Y_n)\right]$:
$$E_\theta\!\left[\psi(Y_1,\ldots,Y_n)\right] = \sum_{y_1,\ldots,y_n}\psi(y_1,\ldots,y_n)\,\theta^{\sum y_i}(1-\theta)^{n-\sum y_i}$$
$$= \sum_{x=0}^{n}\left[\frac{\sum_{y:\,\sum y_i=x}\psi(y_1,\ldots,y_n)}{\binom{n}{x}}\right]\binom{n}{x}\theta^{x}(1-\theta)^{n-x} .$$
Notice that the large fraction in this formula is the average value of $\psi$ over values of $(y_1,\ldots,y_n)$ when $X=\sum Y_i$ is held fixed at $x$. Notice that the weights in this average do not depend on $\theta$. Notice that this average is actually $E\!\left[\psi(Y_1,\ldots,Y_n)\mid X=x\right]$.
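This regrouping can be checked by brute force (a sketch, assuming numpy; the particular $\psi$, $n$ and $\theta$ below are arbitrary illustrative choices): the inner average is free of $\theta$, and summing it against the Binomial weights reproduces $E_\theta[\psi]$.

```python
import itertools
import numpy as np
from math import comb

n, theta = 4, 0.37
psi = lambda y: y[0] + 0.5 * y[1] * y[2]     # any function of the data

outcomes = list(itertools.product([0, 1], repeat=n))

# direct computation of E_theta[psi(Y_1, ..., Y_n)]
direct = sum(psi(y) * theta**sum(y) * (1 - theta)**(n - sum(y)) for y in outcomes)

# grouped computation: average psi over the outcomes with sum y_i = x (free of theta),
# then weight by the Binomial(n, theta) probabilities
cond_mean = {x: np.mean([psi(y) for y in outcomes if sum(y) == x]) for x in range(n + 1)}
grouped = sum(cond_mean[x] * comb(n, x) * theta**x * (1 - theta)**(n - x)
              for x in range(n + 1))
print(direct, grouped)                       # equal
```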
Notice: with data $Y_1,\ldots,Y_n$ the log likelihood is
$$\ell(\theta) = \left(\sum Y_i\right)\log\theta + \left(n-\sum Y_i\right)\log(1-\theta),$$
a function of $X=\sum Y_i$ alone.
In the binomial situation the conditional distribution of the data $Y_1,\ldots,Y_n$ given $X=\sum Y_i$ is the same for all values of $\theta$; we say this conditional distribution is free of $\theta$.
Defn: A statistic $T(X)$ is sufficient for the model $\{P_\theta;\theta\in\Theta\}$ if the conditional distribution of the data $X$ given $T=t$ is free of $\theta$.
Intuition: Data $X$ tell us about $\theta$ if different values of $\theta$ give different distributions to $X$. If two different values of $\theta$ correspond to the same density or cdf for $X$ we cannot distinguish these two values of $\theta$ by examining $X$. Extension of this notion: if two values of $\theta$ give the same conditional distribution of $X$ given $T$ then observing $X$ in addition to $T$ doesn't improve our ability to distinguish the two values.
Mathematically precise version of this intuition: Suppose $T(X)$ is a sufficient statistic and $S(X)$ is any estimate or confidence interval or ... . If you only know the value of $T$ then: first, generate a fake data set $X^*$ from the conditional distribution of the data given $T$ (this distribution is free of $\theta$); second, compute $S(X^*)$, which has the same distribution as $S(X)$ and so is just as good a procedure. You can carry out the first step only if the statistic $T$ is sufficient; otherwise you need to know the true value of $\theta$ to generate $X^*$.
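A simulation sketch of this two-step idea (assuming numpy; the sample size, $\theta$ and the deliberately silly estimate $S$ are illustrative): knowing only $T=\sum Y_i$ we can regenerate a fake data set from the conditional law of the data given $T$, and the estimate computed from the fake data behaves exactly like the original.

```python
import numpy as np

rng = np.random.default_rng(1)
n, theta, reps = 6, 0.4, 50_000

y = rng.binomial(1, theta, size=(reps, n))     # real data sets
t = y.sum(axis=1)                              # sufficient statistic T

# Step 1: knowing only T, generate fake data from the conditional law of the data
# given T: place the t successes at uniformly random positions (no theta needed).
y_star = np.zeros_like(y)
for i in range(reps):
    y_star[i, rng.choice(n, size=t[i], replace=False)] = 1

# Step 2: compute the estimate from the fake data.
S = lambda d: 0.75 * d[:, 0] + 0.25 * d[:, 1]  # some (deliberately silly) estimate of theta
print(S(y).mean(), S(y).var())                 # original estimate
print(S(y_star).mean(), S(y_star).var())       # same mean and variance (same distribution)
```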
Example 1: $Y_1,\ldots,Y_n$ iid Bernoulli($\theta$). Given $\sum Y_i = x$ the indexes of the $x$ successes have the same chance of being any one of the $\binom{n}{x}$ possible subsets of $\{1,\ldots,n\}$. This chance does not depend on $\theta$, so $T=\sum Y_i$ is a sufficient statistic.
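A simulation sketch of Example 1 (assuming numpy; $n$, $x$ and the two $\theta$ values are illustrative): conditioning on $\sum Y_i = x$ makes every arrangement of the $x$ successes equally likely, whatever $\theta$ is.

```python
import numpy as np
from collections import Counter

rng = np.random.default_rng(2)
n, x = 4, 2

for theta in (0.2, 0.8):
    y = rng.binomial(1, theta, size=(300_000, n))
    kept = y[y.sum(axis=1) == x]                       # condition on the sum
    freq = Counter(map(tuple, kept))
    print(theta, {k: round(v / len(kept), 3) for k, v in sorted(freq.items())})
# Both values of theta give probability about 1/6 to each of the C(4,2) = 6
# arrangements: the conditional distribution is free of theta.
```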
Example 2: $X_1,\ldots,X_n$ iid $N(\mu,1)$. The joint distribution of $(X_1,\ldots,X_n,\bar X)$ is MVN. All entries of the mean vector are $\mu$. The variance-covariance matrix is partitioned as
$$\begin{pmatrix} I_{n\times n} & \mathbf{1}/n \\ \mathbf{1}^T/n & 1/n \end{pmatrix}.$$
Compute conditional means and variances of the $X_i$ given $\bar X$; use the fact that the conditional law is MVN. Conclude that the conditional law of the data given $\bar X$ is MVN. The mean vector has all entries equal to $\bar X$. The variance-covariance matrix is $I - \mathbf{1}\mathbf{1}^T/n$. There is no dependence on $\mu$, so $\bar X$ is sufficient.
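A short numerical sketch of Example 2 (assuming numpy; $n=4$ is illustrative): the usual multivariate normal conditioning formula gives a conditional covariance matrix that does not involve $\mu$.

```python
import numpy as np

n = 4
# Covariance of (X_1, ..., X_n, Xbar) when the X_i are iid N(mu, 1)
Sigma11 = np.eye(n)                     # Var of the data
Sigma12 = np.full((n, 1), 1 / n)        # Cov(X_i, Xbar) = 1/n
Sigma22 = np.array([[1 / n]])           # Var(Xbar) = 1/n

cond_cov = Sigma11 - Sigma12 @ np.linalg.inv(Sigma22) @ Sigma12.T
print(cond_cov)                         # equals I - (1/n) * ones * ones^T: free of mu
# Conditional mean: mu*1 + Sigma12 Sigma22^{-1} (xbar - mu) = xbar * 1, also free of mu.
```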
WARNING: Whether or not a statistic is sufficient depends on the density function and on the parameter space $\Theta$.
Theorem: [Rao-Blackwell] Suppose $S(X)$ is a sufficient statistic for the model $\{P_\theta;\theta\in\Theta\}$. If $T$ is an estimate of $\phi(\theta)$ then:
1. $E(T\mid S)$ is a statistic (it does not depend on $\theta$);
2. $E(T\mid S)$ has the same bias as $T$; in particular, if $T$ is unbiased so is $E(T\mid S)$;
3. $\mathrm{Var}_\theta\!\left[E(T\mid S)\right] \le \mathrm{Var}_\theta(T)$, and hence the MSE of $E(T\mid S)$ is no larger than that of $T$.
Proof: Review conditional distributions: the abstract definition of conditional expectation is:
Defn: $E(Y\mid X)$ is any function of $X$ such that
$$E\!\left[R(X)\,E(Y\mid X)\right] = E\!\left[R(X)\,Y\right] \quad\text{for every bounded function } R .$$
Fact: If $(X,Y)$ has joint density $f_{X,Y}(x,y)$ and conditional density $f(y\mid x)$ then
$$E(Y\mid X=x) = \int y\, f(y\mid x)\,dy$$
satisfies this definition.
Proof:
$$E\!\left[R(X)\,E(Y\mid X)\right] = \int R(x)\left[\int y\,f(y\mid x)\,dy\right] f_X(x)\,dx = \int\!\!\int R(x)\,y\,f_{X,Y}(x,y)\,dy\,dx = E\!\left[R(X)\,Y\right].$$
Think of $E(Y\mid X)$ as the average of $Y$ holding $X$ fixed. It behaves like an ordinary expected value, but functions of $X$ only are like constants:
$$E\!\left[A(X)\,Y \mid X\right] = A(X)\,E(Y\mid X).$$
Example: $Y_1,\ldots,Y_n$ iid Bernoulli($\theta$). Then $X=\sum Y_i$ is Binomial($n,\theta$). Summary of conclusions: the conditional distribution of $(Y_1,\ldots,Y_n)$ given $X=x$ is uniform over the $\binom{n}{x}$ possible arrangements and is free of $\theta$; for any unbiased $\psi(Y_1,\ldots,Y_n)$ the quantity $E(\psi\mid X)$ is a statistic, is unbiased, is a function of $X$ alone (and so equals $X/n$), and has variance no larger than that of $\psi$.
This proof that $X/n$ is the UMVUE of $\theta$ is a special case of a general tactic.
Proof of the Rao-Blackwell Theorem
Step 1: The definition of sufficiency is that the conditional distribution of $X$ given $S$ does not depend on $\theta$. This means that $E(T(X)\mid S)$ does not depend on $\theta$.
Step 2: This step hinges on the following identity (called Adam's law by Jerzy Neyman - he used to say it comes before all the others):
$$E\!\left[E(Y\mid X)\right] = E(Y).$$
From this we deduce that
$$E_\theta\!\left[E(T\mid S)\right] = E_\theta(T),$$
so $E(T\mid S)$ has the same bias as $T$.
Step 3: This relies on the following very useful decomposition. (In regression courses we say that the total sum of squares is the sum of the regression sum of squares plus the residual sum of squares.) Writing $Y-E(Y) = \left[Y-E(Y\mid X)\right] + \left[E(Y\mid X)-E(Y)\right]$, squaring and taking expectations gives
$$\mathrm{Var}(Y) = E\!\left\{\left[Y-E(Y\mid X)\right]^2\right\} + \mathrm{Var}\!\left[E(Y\mid X)\right] + 2E\!\left\{\left[Y-E(Y\mid X)\right]\left[E(Y\mid X)-E(Y)\right]\right\}.$$
The cross-product term simplifies to 0: condition on $X$ first and note that $E\!\left[Y-E(Y\mid X)\mid X\right]=0$. This leaves
$$\mathrm{Var}(Y) = E\!\left[\mathrm{Var}(Y\mid X)\right] + \mathrm{Var}\!\left[E(Y\mid X)\right].$$
Apply this to the Rao-Blackwell theorem to get
$$\mathrm{Var}_\theta(T) = E_\theta\!\left[\mathrm{Var}(T\mid S)\right] + \mathrm{Var}_\theta\!\left[E(T\mid S)\right] \ge \mathrm{Var}_\theta\!\left[E(T\mid S)\right].$$
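A simulation sketch of the Step 3 decomposition (assuming numpy; the particular joint law of $(X,Y)$ is just for illustration): the two sides of $\mathrm{Var}(Y)=\mathrm{Var}[E(Y\mid X)]+E[\mathrm{Var}(Y\mid X)]$ agree.

```python
import numpy as np

rng = np.random.default_rng(3)
m = 1_000_000
x = rng.poisson(3.0, size=m)              # any X
y = rng.normal(loc=2.0 * x, scale=1.0)    # Y | X ~ N(2X, 1): E(Y|X) = 2X, Var(Y|X) = 1

lhs = y.var()                             # Var(Y)
rhs = (2.0 * x).var() + 1.0               # Var(E(Y|X)) + E(Var(Y|X))
print(lhs, rhs)                           # both close to 4*3 + 1 = 13
```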
Examples:
In the binomial problem $Y_1$ (the first toss) is an unbiased estimate of $\theta$. We improve this by computing $E(Y_1\mid X)$:
$$E(Y_1\mid X=x) = P(Y_1=1\mid X=x) = \frac{P\!\left(Y_1=1,\ \sum_{i=2}^{n}Y_i=x-1\right)}{P(X=x)} = \frac{\theta\binom{n-1}{x-1}\theta^{x-1}(1-\theta)^{n-x}}{\binom{n}{x}\theta^{x}(1-\theta)^{n-x}} = \frac{\binom{n-1}{x-1}}{\binom{n}{x}} = \frac{x}{n} .$$
So Rao-Blackwellizing $Y_1$ gives the estimate $X/n$.
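An exact check of this computation by enumeration (a sketch, assuming numpy; $n$ and $\theta$ are illustrative): given $X=x$, the average of the first coordinate over the equally likely arrangements is $x/n$, and $\mathrm{Var}(X/n)$ is much smaller than $\mathrm{Var}(Y_1)$.

```python
import itertools
import numpy as np

n, theta = 5, 0.3
outcomes = np.array(list(itertools.product([0, 1], repeat=n)))
sums = outcomes.sum(axis=1)

for x in range(n + 1):
    rows = outcomes[sums == x]                 # equally likely arrangements given X = x
    print(x, rows[:, 0].mean(), x / n)         # E(Y_1 | X = x) agrees with x/n

print(theta * (1 - theta), theta * (1 - theta) / n)   # Var(Y_1) versus Var(X/n)
```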
Example: If $X_1,\ldots,X_n$ are iid $N(\mu,1)$ then $\bar X$ is sufficient and $X_1$ is an unbiased estimate of $\mu$. Now
$$E(X_1\mid\bar X) = \bar X,$$
by symmetry: each $E(X_i\mid\bar X)$ is the same, and they add up to $E(n\bar X\mid\bar X)=n\bar X$.
Binomial($n,\theta$): the log likelihood (the part depending on $\theta$) is a function of $X=\sum Y_i$ alone, not of $Y_1,\ldots,Y_n$ as well.
Normal example: the log likelihood is, ignoring terms not containing $\mu$,
$$\ell(\mu) = n\bar X\mu - n\mu^2/2,$$
a function of $\bar X$ alone.
The Factorization Criterion:
Theorem: If the model for data $X$ has density $f(x,\theta)$ then the statistic $S(X)$ is sufficient if and only if the density can be factored as
$$f(x,\theta) = g\!\left(S(x),\theta\right)h(x).$$
Proof: Find a statistic $T(X)$ such that $(S,T)$ is a one to one function of $X$. Apply the change of variables formula to get the joint density of $S$ and $T$. If the density factors then
$$f_{S,T}(s,t,\theta) = g(s,\theta)\,h\!\left(x(s,t)\right)\left|J(s,t)\right|,$$
so the conditional density of $T$ given $S=s$ is the ratio of this to its integral over $t$, and the factor $g(s,\theta)$ cancels: the conditional law is free of $\theta$ and $S$ is sufficient. Conversely, if $S$ is sufficient then the conditional density $f_{T\mid S}(t\mid s)$ has no $\theta$ in it, so the joint density of $(S,T)$ is
$$f_{S,T}(s,t,\theta) = f_S(s,\theta)\,f_{T\mid S}(t\mid s),$$
which, transformed back to the $x$ scale, has the required factored form.
Example: If $X_1,\ldots,X_n$ are iid $N(\mu,\sigma^2)$ then the joint density is
$$f(x_1,\ldots,x_n;\mu,\sigma) = (2\pi\sigma^2)^{-n/2}\exp\!\left\{-\frac{1}{2\sigma^2}\sum x_i^2 + \frac{\mu}{\sigma^2}\sum x_i - \frac{n\mu^2}{2\sigma^2}\right\},$$
which is a function of $\left(\sum x_i,\sum x_i^2\right)$ and $(\mu,\sigma)$ alone, so $\left(\sum X_i,\sum X_i^2\right)$ is sufficient.
Example: If $Y_1,\ldots,Y_n$ are iid Bernoulli($\theta$) then
$$f(y_1,\ldots,y_n;\theta) = \theta^{\sum y_i}(1-\theta)^{n-\sum y_i},$$
so $\sum Y_i$ is sufficient.
In any model the full data set $X$ is sufficient. (Apply the factorization criterion.) In any iid model the vector of order statistics $\left(X_{(1)},\ldots,X_{(n)}\right)$ is sufficient. (Apply the factorization criterion.)
In the $N(\mu,\sigma^2)$ model we have 3 sufficient statistics: the data $(X_1,\ldots,X_n)$ themselves, the order statistics $\left(X_{(1)},\ldots,X_{(n)}\right)$, and the pair $\left(\sum X_i,\sum X_i^2\right)$.
Notice that I can calculate $\left(\sum X_i,\sum X_i^2\right)$ from the values of $(X_1,\ldots,X_n)$ or of the order statistics, but not vice versa, and that I can calculate the order statistics from $(X_1,\ldots,X_n)$ but not vice versa. It turns out that $\left(\sum X_i,\sum X_i^2\right)$ is a minimal sufficient statistic, meaning that it is a function of any other sufficient statistic. (You can't collapse the data set any more without losing information about $(\mu,\sigma)$.)
Recognize minimal sufficient statistics from the log likelihood $\ell(\theta)$:
Fact: If you fix some particular $\theta_0$ then the log likelihood ratio function
$$\theta \mapsto \ell(\theta) - \ell(\theta_0)$$
is a minimal sufficient statistic. Subtraction of $\ell(\theta_0)$ gets rid of irrelevant constants in $\ell$. In the $N(\mu,\sigma^2)$ example:
$$\ell(\mu,\sigma)-\ell(\mu_0,\sigma_0) = \left(\frac{1}{2\sigma_0^2}-\frac{1}{2\sigma^2}\right)\sum X_i^2 + \left(\frac{\mu}{\sigma^2}-\frac{\mu_0}{\sigma_0^2}\right)\sum X_i - \frac{n\mu^2}{2\sigma^2} + \frac{n\mu_0^2}{2\sigma_0^2} - \frac{n}{2}\log\frac{\sigma^2}{\sigma_0^2},$$
which depends on the data only through $\left(\sum X_i,\sum X_i^2\right)$.
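A numerical sketch of this fact (assuming numpy; the two data sets and the grid of parameter values are illustrative): two samples with the same $\left(\sum x_i,\sum x_i^2\right)$ give identical log likelihood ratio functions.

```python
import numpy as np

def llr(x, mu, sigma, mu0=0.0, sigma0=1.0):
    """l(mu, sigma) - l(mu0, sigma0) for iid N(mu, sigma^2) data x."""
    def loglik(m, s):
        return np.sum(-0.5 * np.log(2 * np.pi * s**2) - (x - m)**2 / (2 * s**2))
    return loglik(mu, sigma) - loglik(mu0, sigma0)

x1 = np.array([1.0, 2.0, 3.0])
a, c = (3.5 + np.sqrt(3.25)) / 2, (3.5 - np.sqrt(3.25)) / 2
x2 = np.array([a, 2.5, c])                   # different data, same sum (6) and sum of squares (14)

print(x1.sum(), (x1**2).sum(), x2.sum(), (x2**2).sum())
for mu, sigma in [(0.5, 1.3), (2.0, 0.7), (-1.0, 2.5)]:
    print(llr(x1, mu, sigma), llr(x2, mu, sigma))   # identical: the ratio depends only on the pair
```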
In the Binomial($n,\theta$) example only one function of $X$ is unbiased for $\theta$. Rao-Blackwell shows that the UMVUE, if it exists, will be a function of any sufficient statistic. Can there be more than one such function? Yes in general, but no for some models like the binomial.
Definition: A statistic $S$ is complete for a model $\{P_\theta;\theta\in\Theta\}$ if
$$E_\theta\!\left[h(S)\right] = 0 \text{ for all } \theta \quad\Longrightarrow\quad P_\theta\!\left(h(S)=0\right)=1 \text{ for all } \theta .$$
We have already seen that $X$ is complete in the Binomial($n,\theta$) model: $E_\theta\!\left[h(X)\right]$ is a polynomial in $\theta$, and if it vanishes identically then every $h(k)$ must be 0.
In the $N(\mu,1)$ model suppose $E_\mu\!\left[h(\bar X)\right]=0$ for every $\mu$. Since $\bar X\sim N(\mu,1/n)$, this says that the integral of $h(x)e^{-nx^2/2}$ against $e^{n\mu x}$ vanishes for every $\mu$; properties of Laplace transforms then force $h\equiv 0$, so $\bar X$ is complete.
There is only one general tactic. Suppose $X$ has density
$$f(x,\theta) = h(x)\exp\!\left\{\sum_{i=1}^{p} a_i(\theta)\,S_i(x) + c(\theta)\right\},$$
an exponential family. If the set of values of $\left(a_1(\theta),\ldots,a_p(\theta)\right)$ contains a $p$-dimensional open box then $\left(S_1(X),\ldots,S_p(X)\right)$ is a complete sufficient statistic. You prove the sufficiency by the factorization criterion and the completeness using the properties of Laplace transforms and the fact that the joint density of $\left(S_1,\ldots,S_p\right)$ has a similar exponential family form.
Example: the $N(\mu,\sigma^2)$ model density has the form
$$f(x;\mu,\sigma) = \frac{1}{\sqrt{2\pi}}\exp\!\left\{-\frac{1}{2\sigma^2}x^2 + \frac{\mu}{\sigma^2}x - \left(\frac{\mu^2}{2\sigma^2}+\log\sigma\right)\right\},$$
an exponential family with $S_1(x)=x^2$ and $S_2(x)=x$; it follows that $\left(\sum X_i^2,\sum X_i\right)$ is a complete sufficient statistic.
Remark: The statistic $\left(\bar X, s^2\right)$ is a one to one function of $\left(\sum X_i,\sum X_i^2\right)$, so it must be complete and sufficient too. Any function of the latter statistic can be rewritten as a function of the former and vice versa.
FACT: A complete sufficient statistic is also minimal sufficient.
Theorem: [Lehmann-Scheffé] If $S$ is a complete sufficient statistic for some model and $h(S)$ is an unbiased estimate of some parameter $\phi(\theta)$ then $h(S)$ is the UMVUE of $\phi(\theta)$.
Proof: Suppose $T$ is another unbiased estimate of $\phi(\theta)$. According to Rao-Blackwell, $T$ is improved by $E(T\mid S)$, so if $h(S)$ is not UMVUE then there must exist another function $h^*(S)$ which is unbiased and whose variance is smaller than that of $h(S)$ for some value of $\theta$. But
$$E_\theta\!\left[h^*(S)-h(S)\right] = 0 \text{ for all } \theta,$$
so completeness of $S$ gives $h^*(S)=h(S)$ with probability 1, a contradiction.
Example: In the $N(\mu,\sigma^2)$ example the random variable $(n-1)s^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution. It follows that
$$E(s) = \sigma\,\sqrt{\frac{2}{n-1}}\;\frac{\Gamma(n/2)}{\Gamma((n-1)/2)} \equiv c_n\,\sigma,$$
so $s/c_n$ is an unbiased estimate of $\sigma$; being a function of the complete sufficient statistic, it is the UMVUE of $\sigma$.
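A quick check of this constant (a sketch, assuming numpy and scipy; $n$, $\mu$, $\sigma$ are illustrative): dividing $s$ by $c_n$ gives an estimate whose simulated mean is very close to $\sigma$.

```python
import numpy as np
from scipy.special import gammaln

rng = np.random.default_rng(4)
n, mu, sigma, reps = 5, 1.0, 2.0, 400_000

x = rng.normal(mu, sigma, size=(reps, n))
s = x.std(axis=1, ddof=1)                     # sample standard deviation

c_n = np.sqrt(2 / (n - 1)) * np.exp(gammaln(n / 2) - gammaln((n - 1) / 2))
print(s.mean(), c_n * sigma)                  # E(s) is about c_n * sigma, so s is biased
print((s / c_n).mean(), sigma)                # s / c_n is (essentially) unbiased for sigma
```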
Binomial($n,\theta$): the log odds is $\phi = \log\!\left[\theta/(1-\theta)\right]$. Since the expectation of any function of the data is a polynomial function of $\theta$, and since $\phi$ is not a polynomial function of $\theta$, there is no unbiased estimate of $\phi$.
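A symbolic sketch of the key step (assuming sympy; $n=3$ is illustrative): the expectation of any function of $X$ is a polynomial in $\theta$ of degree at most $n$, while the log odds is not a polynomial (it is unbounded as $\theta\to 0$ or $1$), so no choice of $h$ can make them equal.

```python
import sympy as sp

theta = sp.symbols('theta')
n = 3
h = sp.symbols(f'h0:{n + 1}')        # arbitrary values h(0), ..., h(n)

E_h = sp.expand(sum(h[k] * sp.binomial(n, k) * theta**k * (1 - theta)**(n - k)
                    for k in range(n + 1)))
print(sp.Poly(E_h, theta).degree())  # a polynomial in theta of degree <= n, for any h
```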