STAT 450
Lecture 28
Goals for today:
- Introduce concepts of Hypothesis Testing
- Define power, level, most powerful.
- State and prove the Neyman Pearson Lemma
Last time:
Exponential Families, Lehmann-Scheffé Theorem
If $X_1, \ldots, X_n$ are iid with density
$$f(x;\theta) = h(x)\exp\left\{\sum_{k=1}^p a_k(\theta) T_k(x) + c(\theta)\right\}$$
then the joint density of the data is
$$\prod_{i=1}^n h(x_i)\,\exp\left\{\sum_{k=1}^p a_k(\theta) \sum_{i=1}^n T_k(x_i) + n c(\theta)\right\}.$$
If the range of the function $(a_1(\theta),\ldots,a_p(\theta))$ (as $\theta$ varies over $\Theta$) contains a (hyper-) rectangle in $R^p$ then the statistic
$$\left(\sum_{i=1}^n T_1(X_i),\ldots,\sum_{i=1}^n T_p(X_i)\right)$$
is complete and sufficient.
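As a concrete illustration (a sketch in my own notation, using the normal family that serves as the running example below), write the $N(\mu,\sigma^2)$ density in this form:
$$f(x;\mu,\sigma^2) = \frac{1}{\sqrt{2\pi}}\exp\left\{\frac{\mu}{\sigma^2}\,x - \frac{1}{2\sigma^2}\,x^2 - \frac{\mu^2}{2\sigma^2} - \log\sigma\right\},$$
so that $a_1 = \mu/\sigma^2$, $a_2 = -1/(2\sigma^2)$, $T_1(x) = x$ and $T_2(x) = x^2$. As $(\mu,\sigma^2)$ ranges over $R\times(0,\infty)$ the point $(a_1,a_2)$ fills the open half-plane $\{a_2 < 0\}$, which certainly contains a rectangle in $R^2$, so $(\sum X_i, \sum X_i^2)$ is complete and sufficient.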
The Lehmann-Scheffé Theorem
Theorem: If $S$ is a complete sufficient statistic for some model and $h(S)$ is an unbiased estimate of some parameter $\phi(\theta)$, then $h(S)$ is the UMVUE of $\phi(\theta)$.
Example: In the $N(\mu,\sigma^2)$ example $(\sum X_i, \sum X_i^2)$ is complete and sufficient, so the UMVUEs of $\mu$ and $\sigma^2$ are $\bar X$ and $s^2$; the same argument gives the UMVUE of $\sigma^4$ as the appropriate multiple of $s^4$, namely $\frac{n-1}{n+1}s^4$.
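A quick Monte Carlo sanity check of these unbiasedness claims (a sketch of my own; the sample size, parameter values and seed are arbitrary choices, not from the notes):

import numpy as np

# Simulate many N(mu, sigma^2) samples of size n and average the estimates:
# xbar and s^2 should be centred at mu and sigma^2, and (n-1)/(n+1) s^4 at sigma^4.
rng = np.random.default_rng(450)
mu, sigma, n, reps = 1.0, 2.0, 10, 200_000
x = rng.normal(mu, sigma, size=(reps, n))
xbar = x.mean(axis=1)
s2 = x.var(axis=1, ddof=1)                    # the usual unbiased sample variance
print(xbar.mean(), s2.mean())                 # approximately 1.0 and 4.0
print(((n - 1) / (n + 1) * s2**2).mean())     # approximately sigma^4 = 16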
Criticism of Unbiasedness
1. The UMVUE can be inadmissible for squared error loss, meaning that there is a (biased, of course) estimate whose MSE is smaller for every parameter value. An example is the UMVUE of $\sigma^2$, which is $s^2$: the MSE of $\sum(X_i-\bar X)^2/(n+1)$ is smaller than that of $s^2$ for every parameter value (see the simulation sketch after this list). Another example is provided on the homework.
2. There are examples where unbiased estimation is impossible. The log odds in a Binomial model is $\theta = \log\{p/(1-p)\}$. Since the expectation of any function of the data is a polynomial function of $p$, and since $\log\{p/(1-p)\}$ is not a polynomial function of $p$, there is no unbiased estimate of $\theta$.
3. The UMVUE of $\sigma$ is not the square root of the UMVUE of $\sigma^2$. This method of estimation does not have the parameterization equivariance that maximum likelihood does.
4. Unbiasedness is irrelevant (unless you plan to average together many estimators). The property is an average over possible values of the estimate in which positive errors are allowed to cancel negative errors. An exception to this criticism is that if you plan to average a number of estimators to get a single estimator, then it is a problem if all the estimators have the same bias. In Assignment 5 you have the one-way layout example in which the MLE of the residual variance averages together many biased estimates and so is very badly biased. That assignment shows that the solution is not really to insist on unbiasedness but to consider an alternative to averaging for putting the individual estimates together.
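Here is the simulation sketch promised in item 1 (my own code; the sample size, parameter values and seed are arbitrary), comparing the MSE of the UMVUE $s^2$ with that of the biased estimate $\sum(X_i-\bar X)^2/(n+1)$:

import numpy as np

# Compare MSEs for estimating sigma^2 from n iid N(0, sigma^2) observations.
rng = np.random.default_rng(450)
n, reps = 10, 200_000
for sigma2 in (0.25, 1.0, 4.0):
    x = rng.normal(0.0, np.sqrt(sigma2), size=(reps, n))
    T = ((x - x.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)
    mse_umvue = ((T / (n - 1) - sigma2) ** 2).mean()     # MSE of s^2
    mse_biased = ((T / (n + 1) - sigma2) ** 2).mean()    # MSE of T / (n + 1)
    print(sigma2, mse_umvue, mse_biased)                 # the biased estimate wins each time

(In theory the two MSEs are $2\sigma^4/(n-1)$ and $2\sigma^4/(n+1)$.)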
Minimal Sufficiency
In any model the statistic $S(X) = X$ (the whole data set) is sufficient. In any iid model the vector of order statistics $(X_{(1)},\ldots,X_{(n)})$ is sufficient. In the $N(\mu,\sigma^2)$ model we then have three possible sufficient statistics:
1. $S_1 = (X_1,\ldots,X_n)$.
2. $S_2 = (X_{(1)},\ldots,X_{(n)})$.
3. $S_3 = (\sum X_i, \sum X_i^2)$.
Notice that I can calculate $S_3$ from the values of $S_1$ or $S_2$ but not vice versa, and that I can calculate $S_2$ from $S_1$ but not vice versa. It turns out that $S_3$ is a minimal sufficient statistic, meaning that it is a function of any other sufficient statistic. (You can't collapse the data set any more without losing information about $(\mu,\sigma^2)$.)
To recognize minimal sufficient statistics you look at the likelihood function:
Fact: If you fix some particular $\theta_0 \in \Theta$ then the log likelihood ratio function
$$\ell(\theta) - \ell(\theta_0)$$
is minimal sufficient. WARNING: the function is the statistic. The subtraction of $\ell(\theta_0)$ gets rid of those irrelevant constants in the log-likelihood. For instance, in the $N(\mu,1)$ example we have
$$\ell(\mu) = \mu\sum X_i - \frac{n\mu^2}{2} - \frac{1}{2}\sum X_i^2 - \frac{n}{2}\log(2\pi).$$
This depends on $\sum X_i^2$, which is not needed for the sufficient statistic. Take $\mu_0 = 0$ and get
$$\ell(\mu) - \ell(0) = \mu\sum X_i - \frac{n\mu^2}{2}.$$
This function of $\mu$ is minimal sufficient. Notice that from $\sum X_i$ you can compute this minimal sufficient statistic and vice versa. Thus $\sum X_i$ is also minimal sufficient.
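A small numerical check of this example (a sketch of my own; the data, sample size and seed are arbitrary): the log likelihood ratio computed from the raw data agrees with the formula that uses only $\sum X_i$.

import numpy as np
from scipy.stats import norm

# For N(mu, 1) data, ell(mu) - ell(0) computed from all of x should equal
# mu * sum(x) - n * mu^2 / 2, which depends on x only through sum(x).
rng = np.random.default_rng(450)
x = rng.normal(1.3, 1.0, size=25)
n, S = x.size, x.sum()
for mu in (-1.0, 0.5, 2.0):
    lr_full = norm.logpdf(x, loc=mu).sum() - norm.logpdf(x, loc=0.0).sum()
    lr_stat = mu * S - n * mu**2 / 2
    print(mu, lr_full, lr_stat)   # the two columns agree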
FACT: A complete sufficient statistic is also minimal
sufficient.
Hypothesis Testing
Hypothesis testing is a statistical problem where you must choose, on the basis of data $X$, between two alternatives. We formalize this as the problem of choosing between two hypotheses:
$$H_0: \theta \in \Theta_0 \quad\text{or}\quad H_1: \theta \in \Theta_1$$
where $\Theta_0$ and $\Theta_1$ are a partition of the model $\Theta$. That is,
$$\Theta_0 \cup \Theta_1 = \Theta \quad\text{and}\quad \Theta_0 \cap \Theta_1 = \emptyset.$$
A rule for making the required choice can be described in two ways:
1. In terms of the set
$$R = \{x : \text{we choose } H_1 \text{ if we observe } x\},$$
called the rejection or critical region of the test.
2. In terms of a function $\phi(x)$ which is equal to 1 for those $x$ for which we choose $H_1$ and 0 for those $x$ for which we choose $H_0$.
For technical reasons which will come up soon I prefer to use the second description. However, each $\phi$ corresponds to a unique rejection region
$$R_\phi = \{x : \phi(x) = 1\}.$$
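To make the correspondence concrete, here is a toy version in code (a sketch of my own; the cutoff 1.645 and the grid of x values are arbitrary):

import numpy as np

# A test stored as its indicator function phi; phi(x) = 1 means "choose H1".
def phi(x, cutoff=1.645):
    return (np.asarray(x) > cutoff).astype(int)

x_grid = np.linspace(-3, 3, 13)
print(phi(x_grid))                    # the 0/1 description of the test
print(x_grid[phi(x_grid) == 1])       # the corresponding rejection region: x > 1.645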
The Neyman Pearson approach to hypothesis testing which we
consider first treats the two hypotheses asymmetrically. The
hypothesis $H_0$ is referred to as the null hypothesis
(because traditionally it has been the hypothesis that some
treatment has no effect).
Definition: The power function of a test $\phi$ (or the corresponding critical region $R_\phi$) is
$$\pi(\theta) = P_\theta(X \in R_\phi) = E_\theta[\phi(X)].$$
We are interested here in optimality theory, that is, the problem of finding the best $\phi$. A good $\phi$ will evidently have $\pi(\theta)$ small for $\theta \in \Theta_0$ and large for $\theta \in \Theta_1$. There is generally a trade-off which can be made in many ways, however.
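For instance (an illustration of my own, not from the notes), for $X_1,\ldots,X_n$ iid $N(\mu,1)$ and the test that rejects when $\bar X > c$, the power function is $\pi(\mu) = P_\mu(\bar X > c) = 1 - \Phi(\sqrt{n}(c-\mu))$:

import numpy as np
from scipy.stats import norm

# Power function of the test "reject when Xbar > c" for n iid N(mu, 1) observations.
n, c = 25, 0.33
def power(mu):
    return 1 - norm.cdf(np.sqrt(n) * (c - mu))

for mu in (-0.5, 0.0, 0.33, 0.7, 1.0):
    print(mu, power(mu))   # small for mu <= 0, increasing toward 1 for larger mu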
Simple versus Simple testing
Finding a best test is easiest when the hypotheses are very
precise.
Definition: A hypothesis $H_i$ is simple if $\Theta_i$ contains only a single value $\theta_i$.
The simple versus simple testing problem arises when we test $\theta = \theta_0$ against $\theta = \theta_1$, so that $\Theta$ has only two points in it. This problem is of importance as a technical tool, not because it is a realistic situation.
Suppose that the model specifies that if $\theta = \theta_0$ then the density of $X$ is $f_0(x)$ and if $\theta = \theta_1$ then the density of $X$ is $f_1(x)$. How should we choose $\phi$?
To answer the question we begin by studying the problem of
minimizing the total error probability.
We define a Type I error as the error made when $\theta = \theta_0$ but we choose $H_1$, that is, $X \in R_\phi$. The other kind of error, when $\theta = \theta_1$ but we choose $H_0$, is called a Type II error. We define the level of a simple versus simple test to be
$$\alpha = P_{\theta_0}(\text{we make a Type I error})$$
or
$$\alpha = P_{\theta_0}(X \in R_\phi) = E_{\theta_0}[\phi(X)].$$
The other error probability is denoted $\beta$ and defined as
$$\beta = P_{\theta_1}(X \notin R_\phi) = E_{\theta_1}[1 - \phi(X)].$$
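As a concrete example (my own numbers, not from the notes), take $f_0$ the $N(0,1)$ density, $f_1$ the $N(2,1)$ density, and the test that rejects $H_0$ when $x > 1$:

from scipy.stats import norm

# alpha = P_{theta_0}(X in R_phi), beta = P_{theta_1}(X not in R_phi) for the
# test of f0 = N(0,1) versus f1 = N(2,1) that rejects H0 when x > 1.
cutoff = 1.0
alpha = 1 - norm.cdf(cutoff, loc=0, scale=1)
beta = norm.cdf(cutoff, loc=2, scale=1)
print(alpha, beta)   # both approximately 0.159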
Suppose we want to minimize $\alpha + \beta$, the total error probability. We want to minimize
$$E_{\theta_0}[\phi(X)] + E_{\theta_1}[1 - \phi(X)] = \int \left[\phi(x) f_0(x) + (1 - \phi(x)) f_1(x)\right] dx.$$
The problem is to choose, for each $x$, either the value 0 or the value 1, in such a way as to minimize the integral. But for each $x$ the quantity
$$\phi(x) f_0(x) + (1 - \phi(x)) f_1(x)$$
can be chosen either to be $f_0(x)$ or $f_1(x)$. To make it small we take $\phi(x) = 1$ if $f_1(x) > f_0(x)$ and $\phi(x) = 0$ if $f_1(x) < f_0(x)$. It makes no difference what we do for those $x$ for which $f_1(x) = f_0(x)$. Notice that we can divide both sides of these inequalities by $f_0(x)$ to rephrase the condition in terms of the likelihood ratio $f_1(x)/f_0(x)$.
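Continuing the $N(0,1)$ versus $N(2,1)$ illustration above (again my own example), $f_1(x) > f_0(x)$ exactly when $x > 1$, and among tests that reject when $x > c$ the total error probability is indeed smallest at $c = 1$:

from scipy.stats import norm

# alpha + beta for tests that reject when x > c; the likelihood ratio rule
# (phi = 1 exactly where f1 > f0, i.e. x > 1) gives the smallest total.
for c in (0.0, 0.5, 1.0, 1.5, 2.0):
    alpha = 1 - norm.cdf(c, loc=0)
    beta = norm.cdf(c, loc=2)
    print(c, alpha + beta)   # minimized at c = 1.0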
Theorem: For each fixed $\lambda$ the quantity $\beta + \lambda\alpha$ is minimized by any $\phi$ which has
$$\phi(x) = \begin{cases} 1 & f_1(x) > \lambda f_0(x) \\ 0 & f_1(x) < \lambda f_0(x). \end{cases}$$
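The same pointwise argument as above sketches why (my own one-line version of the calculation):
$$\beta + \lambda\alpha = \int \left[(1 - \phi(x)) f_1(x) + \lambda \phi(x) f_0(x)\right] dx = 1 + \int \phi(x)\left[\lambda f_0(x) - f_1(x)\right] dx,$$
and the last integral is made as small as possible by taking $\phi(x) = 1$ wherever $\lambda f_0(x) - f_1(x) < 0$, that is, wherever $f_1(x) > \lambda f_0(x)$, and $\phi(x) = 0$ wherever $f_1(x) < \lambda f_0(x)$.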
Richard Lockhart
1999-11-20