Hypothesis testing: a statistical problem where you must choose, on the basis of data $X$, between two alternatives. We formalize this as the problem of choosing between two hypotheses: $H_0: \theta \in \Theta_0$ or $H_1: \theta \in \Theta_1$, where $\Theta_0$ and $\Theta_1$ are a partition of the model $\{P_\theta;\ \theta \in \Theta\}$. That is, $\Theta_0 \cup \Theta_1 = \Theta$ and $\Theta_0 \cap \Theta_1 = \emptyset$.
A rule for making the required choice can be described in two ways:

1. in terms of a rejection (or critical) region $R$, the set of values of $X$ for which we choose $H_1$;
2. in terms of a test function $\phi(x)$, equal to 1 for those $x$ where we choose $H_1$ and 0 for those $x$ where we choose $H_0$.

For technical reasons which will come up soon I prefer to use the second description. However, each $\phi$ corresponds to a unique rejection region $R_\phi = \{x : \phi(x) = 1\}$.
The Neyman-Pearson approach treats the two hypotheses asymmetrically. The hypothesis $H_0$ is referred to as the null hypothesis (traditionally the hypothesis that some treatment has no effect).

Definition: The power function of a test $\phi$ (or of the corresponding critical region $R_\phi$) is

$$\pi(\theta) = P_\theta(X \in R_\phi) = E_\theta[\phi(X)].$$
We are interested in optimality theory, that is, the problem of finding the best $\phi$. A good $\phi$ will evidently have $\pi(\theta)$ small for $\theta \in \Theta_0$ and large for $\theta \in \Theta_1$. There is generally a trade-off, which can be made in many ways, however.
Finding a best test is easiest when the hypotheses are very precise.
Definition: A hypothesis $H_i$ is simple if $\Theta_i$ contains only a single value $\theta_i$.

The simple versus simple testing problem arises when we test $\theta = \theta_0$ against $\theta = \theta_1$, so that $\Theta = \{\theta_0, \theta_1\}$ has only two points in it. This problem is of importance as a technical tool, not because it is a realistic situation.
Suppose that the model specifies that if $\theta = \theta_0$ then the density of $X$ is $f_0(x)$ and if $\theta = \theta_1$ then the density of $X$ is $f_1(x)$. How should we choose $\phi$?
To answer the question we begin by studying the problem of minimizing the total error probability.

Type I error: the error made when $\theta = \theta_0$ but we choose $H_1$, that is, $X \in R_\phi$.

Type II error: the error made when $\theta = \theta_1$ but we choose $H_0$.

The level of a simple versus simple test is

$$\alpha = P_{\theta_0}(\text{we choose } H_1) = E_{\theta_0}[\phi(X)].$$

The other error probability, denoted $\beta$, is

$$\beta = P_{\theta_1}(\text{we choose } H_0) = E_{\theta_1}[1 - \phi(X)].$$

Minimize $\alpha + \beta$, the total error probability, given by

$$\alpha + \beta = \int \phi(x) f_0(x)\,dx + \int [1 - \phi(x)] f_1(x)\,dx = 1 + \int \phi(x)[f_0(x) - f_1(x)]\,dx.$$
Problem: choose, for each $x$, either the value 0 or the value 1, in such a way as to minimize the integral. But for each $x$ the quantity $\phi(x)[f_0(x) - f_1(x)]$ is minimized by taking $\phi(x) = 1$ when $f_1(x) > f_0(x)$ and $\phi(x) = 0$ when $f_1(x) < f_0(x)$.

Theorem: For each fixed $\lambda$ the quantity $\lambda\alpha + \beta$ is minimized by any $\phi$ which has

$$\phi(x) = \begin{cases} 1 & f_1(x) > \lambda f_0(x) \\ 0 & f_1(x) < \lambda f_0(x). \end{cases}$$
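Because the minimizer works pointwise, the theorem is easy to check by brute force in a small discrete model. Here is a minimal Python sketch, assuming the Binomial$(5, p)$ setup of the example below ($p_0 = 1/2$, $p_1 = 3/4$) and an illustrative choice $\lambda = 1$ that is not from the notes:

```python
from itertools import combinations
from math import comb

# Illustrative setup: X ~ Binomial(5, p), p0 = 1/2 vs p1 = 3/4, lambda = 1.
n, p0, p1, lam = 5, 0.5, 0.75, 1.0
f0 = [comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(n + 1)]
f1 = [comb(n, x) * p1**x * (1 - p1)**(n - x) for x in range(n + 1)]

def objective(region):
    alpha = sum(f0[x] for x in region)      # P0(X in R)
    beta = 1 - sum(f1[x] for x in region)   # P1(X not in R)
    return lam * alpha + beta

# Brute force over all 2^6 = 64 non-randomized rejection regions.
best = min(objective(r) for k in range(n + 2)
           for r in combinations(range(n + 1), k))

# The theorem's test rejects exactly when f1(x) > lambda * f0(x).
lr_region = [x for x in range(n + 1) if f1[x] > lam * f0[x]]
print(best, objective(lr_region))  # the two values agree
```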
The Neyman and Pearson approach is then to minimize $\beta$ subject to the constraint $\alpha \le \alpha_0$. Usually this is really equivalent to the constraint $\alpha = \alpha_0$ (because if you had $\alpha < \alpha_0$ you could make $R_\phi$ larger, keep $\alpha \le \alpha_0$, but make $\beta$ smaller). For discrete models, however, this may not be possible.
Example: Suppose $X$ is Binomial$(5, p)$ and $p$ is either $p_0 = 1/2$ or $p_1 = 3/4$. If $R$ is any critical region (so $R$ is a subset of $\{0, 1, 2, 3, 4, 5\}$) then

$$\alpha = P_{1/2}(X \in R) \quad\text{and}\quad \beta = 1 - P_{3/4}(X \in R).$$

Possible rejection regions with $\alpha \le 0.05$:
| Region | $\alpha$ | $\beta$ |
|---|---|---|
| $\emptyset$ | 0 | 1 |
| $\{5\}$ | 0.03125 | 0.7627 |
| $\{0\}$ | 0.03125 | 0.9990 |
So $R = \{5\}$ minimizes $\beta$ subject to $\alpha \le 0.05$.

Raise $\alpha_0$ slightly to 0.0625: the possible rejection regions are $\emptyset$, $\{5\}$, $\{0\}$ and $\{0, 5\}$. The first three have the same $\alpha$ and $\beta$ as before, while $\{0, 5\}$ has $\alpha = 0.0625$ and $\beta = 0.7617$. Thus $\{0, 5\}$ is optimal!
Problem: if all 5 trials are failures the ``optimal'' test chooses $p = 3/4$ rather than $p = 1/2$. But $p = 1/2$ makes 5 failures much more likely than does $p = 3/4$.
Problem: discreteness. Here's how we get around the problem. First we expand the set of possible values of $\phi$ to include numbers between 0 and 1. Values of $\phi(x)$ between 0 and 1 represent the chance that we choose $H_1$ given that we observe $X = x$; the idea is that we actually toss a (biased) coin to decide! This tactic will show us the kinds of rejection regions which are sensible. In practice we then restrict our attention to levels $\alpha_0$ for which the best $\phi$ is always either 0 or 1. In the binomial example we will insist that the value of $\alpha_0$ be either 0 or $P_{1/2}(X = 5) = 1/32$ or $P_{1/2}(X \ge 4) = 6/32$ or ...
Smaller example: take $X$ to be Binomial$(3, p)$, again testing $p = 1/2$ against $p = 3/4$, so there are 4 possible values of $X$ and $2^4 = 16$ possible rejection regions. Here is a table of the levels for each possible rejection region $R$:
| Region | Level |
|---|---|
| $\emptyset$ | 0 |
| $\{3\}$, $\{0\}$ | 1/8 |
| $\{0,3\}$ | 2/8 |
| $\{1\}$, $\{2\}$ | 3/8 |
| $\{0,1\}$, $\{0,2\}$, $\{1,3\}$, $\{2,3\}$ | 4/8 |
| $\{0,1,3\}$, $\{0,2,3\}$ | 5/8 |
| $\{1,2\}$ | 6/8 |
| $\{0,1,2\}$, $\{1,2,3\}$ | 7/8 |
| $\{0,1,2,3\}$ | 1 |
The best level 2/8 test has rejection region $R = \{0, 3\}$, with $\beta = 1 - [(1/4)^3 + (3/4)^3] = 36/64 = 0.5625$.

The best level 2/8 test using randomization rejects when $X = 3$ and, when $X = 2$, tosses a coin with $P(\text{heads}) = 1/3$, then rejects if you get heads. Its level is $1/8 + (1/3)(3/8) = 2/8$; its probability of Type II error is $1 - [27/64 + (1/3)(27/64)] = 0.4375$.
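A minimal Python sketch of this randomized test, under the Binomial$(3, 1/2)$ versus Binomial$(3, 3/4)$ setup assumed above:

```python
from math import comb

# X ~ Binomial(3, 1/2) under the null, Binomial(3, 3/4) under the alternative.
n, p0, p1, alpha0 = 3, 0.5, 0.75, 2 / 8
f0 = [comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(n + 1)]
f1 = [comb(n, x) * p1**x * (1 - p1)**(n - x) for x in range(n + 1)]

# The likelihood ratio is increasing in x, so reject outright at x = 3
# and toss the coin at x = 2.
gamma = (alpha0 - f0[3]) / f0[2]    # coin's probability of heads: 1/3
level = f0[3] + gamma * f0[2]       # = 2/8 exactly
beta = 1 - (f1[3] + gamma * f1[2])  # = 0.4375, better than the 0.5625 above
print(gamma, level, beta)
```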
Definition: A hypothesis test is a function $\phi(x)$ whose values are always in $[0, 1]$. If we observe $X = x$ then we choose $H_1$ with conditional probability $\phi(x)$. In this case we have

$$\pi(\theta) = E_\theta[\phi(X)],$$

with $\alpha = E_{\theta_0}[\phi(X)]$ and $\beta = 1 - E_{\theta_1}[\phi(X)]$. Note that a test using a rejection region $R$ is equivalent to $\phi(x) = 1(x \in R)$.
The Neyman-Pearson Lemma: In testing $f_0$ against $f_1$ the probability $\beta$ of a type II error is minimized, subject to $\alpha \le \alpha_0$, by the test function

$$\phi(x) = \begin{cases} 1 & f_1(x) > \lambda f_0(x) \\ \gamma & f_1(x) = \lambda f_0(x) \\ 0 & f_1(x) < \lambda f_0(x), \end{cases}$$

where $\lambda$ is the largest constant such that

$$P_0(f_1(X) \ge \lambda f_0(X)) \ge \alpha_0 \quad\text{and}\quad P_0(f_1(X) \le \lambda f_0(X)) \ge 1 - \alpha_0,$$

and where $\gamma$ is any number chosen so that

$$E_0[\phi(X)] = P_0(f_1(X) > \lambda f_0(X)) + \gamma P_0(f_1(X) = \lambda f_0(X)) = \alpha_0.$$
Example: Binomial$(n, p)$ with $p_0 = 1/2$ and $p_1 = 3/4$: the ratio $f_1/f_0$ is

$$\frac{f_1(x)}{f_0(x)} = \frac{(3/4)^x (1/4)^{n-x}}{(1/2)^n} = 3^x 2^{-n}.$$

Suppose we have $n = 5$ and $\alpha_0 = 0.05$. Then $\lambda$ must be one of the possible values of $3^x 2^{-5}$, namely $1/32, 3/32, 9/32, 27/32, 81/32, 243/32$. If we try $\lambda = 243/32$ then

$$P_{1/2}(3^X 2^{-5} \ge 243/32) = P_{1/2}(X = 5) = 1/32 < 0.05,$$

so this $\lambda$ is too large. Since

$$P_{1/2}(3^X 2^{-5} \ge 81/32) = P_{1/2}(X \ge 4) = 6/32 \ge 0.05,$$

we take $\lambda = 81/32$ and solve $P_{1/2}(X = 5) + \gamma P_{1/2}(X = 4) = 1/32 + \gamma(5/32) = 0.05$ for $\gamma$, giving $\gamma = 0.12$.
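A short Python sketch of this computation, following the lemma's recipe for the values above ($n = 5$, $p_0 = 1/2$, $p_1 = 3/4$, $\alpha_0 = 0.05$):

```python
from math import comb

n, alpha0 = 5, 0.05
f0 = [comb(n, x) * 0.5**n for x in range(n + 1)]  # Binomial(5, 1/2) pmf
lr = [3**x / 2**n for x in range(n + 1)]          # f1/f0 = 3^x 2^(-n)

# Largest lambda with P0(LR >= lambda) >= alpha0, scanning possible values.
lam = max(l for l in lr
          if sum(f0[x] for x in range(n + 1) if lr[x] >= l) >= alpha0)
p_gt = sum(f0[x] for x in range(n + 1) if lr[x] > lam)   # P0(LR > lambda)
p_eq = sum(f0[x] for x in range(n + 1) if lr[x] == lam)  # P0(LR = lambda)
gamma = (alpha0 - p_gt) / p_eq
print(lam, gamma)   # 81/32 = 2.53125 and 0.12
```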
NOTE: No-one ever uses this procedure. Instead the value of $\alpha_0$ used in discrete problems is chosen to be a possible value of the rejection probability when $\gamma = 0$ (or $\gamma = 1$). When the sample size is large you can come very close to any desired $\alpha_0$ with a non-randomized test.
If $\alpha_0 = 1/32$ then we can either take $\lambda$ to be 243/32 and $\gamma = 1$, or $\lambda = 81/32$ and $\gamma = 0$; both choices give the same test. However, our definition of $\lambda$ in the theorem makes $\lambda = 243/32$ and $\gamma = 1$.
When the theorem is used for continuous distributions it can be the case that the cdf of $f_1(X)/f_0(X)$ has a flat spot where it is equal to $1 - \alpha_0$. This is the point of the word ``largest'' in the theorem.
Example: If $X_1, \ldots, X_n$ are iid $N(\mu, 1)$ and we have $\mu_0 = 0$ and $\mu_1 > 0$ then the likelihood ratio is

$$\frac{f_1(X_1, \ldots, X_n)}{f_0(X_1, \ldots, X_n)} = \exp\left\{\mu_1 \sum X_i - n\mu_1^2/2\right\},$$

which exceeds $\lambda$ if and only if $\sum X_i > [\log(\lambda) + n\mu_1^2/2]/\mu_1$. Under $H_0$, $\sum X_i \sim N(0, n)$, so

$$P_0\left(\sum X_i > \frac{\log(\lambda) + n\mu_1^2/2}{\mu_1}\right) = 1 - \Phi\left(\frac{\log(\lambda) + n\mu_1^2/2}{n^{1/2}\mu_1}\right).$$
The rejection region looks complicated: reject if a complicated statistic is larger than $\lambda$, which has a complicated formula. But in calculating $\alpha$ we re-expressed the rejection region in terms of the statistic $\sum X_i$, whose null distribution we know.
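The closed form can be checked by simulation. A minimal sketch, with illustrative values $n = 10$, $\mu_1 = 0.5$, $\lambda = 2$ that are not from the notes:

```python
import numpy as np
from scipy.stats import norm

n, mu1, lam = 10, 0.5, 2.0
cutoff = (np.log(lam) + n * mu1**2 / 2) / mu1   # reject if sum(X) > cutoff

# Closed form from above: under H0, sum(X) ~ N(0, n).
alpha = 1 - norm.cdf(cutoff / np.sqrt(n))

# Monte Carlo check under H0.
rng = np.random.default_rng(0)
sums = rng.normal(0.0, 1.0, size=(100_000, n)).sum(axis=1)
print(alpha, (sums > cutoff).mean())   # the two numbers should be close
```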
Proof of the Neyman-Pearson lemma: Given a test $\phi$ with level strictly less than $\alpha_0$ we can define the test

$$\phi^*(x) = (1 - \epsilon)\phi(x) + \epsilon,$$

whose level $(1 - \epsilon)\alpha + \epsilon$ is larger and whose type II error probability $(1 - \epsilon)\beta$ is smaller. Choosing $\epsilon$ so that the level is exactly $\alpha_0$, we may therefore assume the best test has level exactly $\alpha_0$.
Suppose you want to minimize $f(x)$ subject to $g(x) = 0$. Consider first the function

$$h_\lambda(x) = f(x) + \lambda g(x).$$

Notice that to find the minimizer $x_\lambda$ of $h_\lambda$ you set the usual partial derivatives equal to 0; then to find the special value $\lambda^*$ you add in the condition $g(x_{\lambda^*}) = 0$.
For each $\lambda$ we have seen that $\phi_\lambda$ minimizes $\lambda\alpha + \beta$, where $\phi_\lambda = 1(f_1(x) > \lambda f_0(x))$.

As $\lambda$ increases the level of $\phi_\lambda$ decreases from 1 when $\lambda = 0$ to 0 when $\lambda = \infty$. There is thus a value $\lambda_0$ where for $\lambda > \lambda_0$ the level is less than $\alpha_0$ while for $\lambda < \lambda_0$ the level is at least $\alpha_0$.
Temporarily let $\delta = P_0(f_1(X) = \lambda_0 f_0(X))$. If $\delta = 0$ define $\phi = \phi_{\lambda_0}$. If $\delta > 0$ define

$$\phi(x) = \begin{cases} 1 & f_1(x) > \lambda_0 f_0(x) \\ \gamma & f_1(x) = \lambda_0 f_0(x) \\ 0 & f_1(x) < \lambda_0 f_0(x), \end{cases}$$

where $\gamma = [\alpha_0 - P_0(f_1(X) > \lambda_0 f_0(X))]/\delta$.

Now $\phi$ has level $\alpha_0$ and, according to the theorem above, minimizes $\lambda_0\alpha + \beta$. Suppose $\phi^*$ is some other test with level $\alpha^* \le \alpha_0$. Then

$$\lambda_0\alpha_\phi + \beta_\phi \le \lambda_0\alpha_{\phi^*} + \beta_{\phi^*},$$

which rearranges to

$$\beta_{\phi^*} \ge \beta_\phi + \lambda_0(\alpha_\phi - \alpha_{\phi^*}).$$

Since $\alpha_{\phi^*} \le \alpha_0 = \alpha_\phi$ the second term is non-negative, so $\beta_{\phi^*} \ge \beta_\phi$, which proves the lemma.
Example application of NP: Binomial$(n, p)$. To test $p = p_0$ versus $p = p_1$ for a $p_1 > p_0$, the NP test is of the form

$$\phi(x) = 1(x > k) + \gamma 1(x = k),$$

where $k$ and $\gamma$ are chosen so that $P_{p_0}(X > k) + \gamma P_{p_0}(X = k) = \alpha_0$.
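Finding $k$ and $\gamma$ is a two-line computation. A sketch with illustrative numbers $n = 20$, $p_0 = 0.5$, $\alpha_0 = 0.05$ that are not from the notes:

```python
from math import comb

n, p0, alpha0 = 20, 0.5, 0.05
pmf = [comb(n, x) * p0**x * (1 - p0)**(n - x) for x in range(n + 1)]

def tail(k):                      # P0(X > k)
    return sum(pmf[k + 1:])

# Smallest k with P0(X > k) <= alpha0; randomize at X = k.
k = min(k for k in range(n + 1) if tail(k) <= alpha0)
gamma = (alpha0 - tail(k)) / pmf[k]
print(k, gamma)                   # k = 14, gamma roughly 0.79
```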
Application of the NP lemma: In the $N(\mu, 1)$ model consider $\Theta_0 = \{0\}$ and $\Theta_1 = (0, \infty)$, or $\Theta_0 = (-\infty, 0]$. The UMP level $\alpha_0$ test of $H_0: \mu \in \Theta_0$ against $H_1: \mu > 0$ is

$$\phi(X_1, \ldots, X_n) = 1\left(n^{1/2}\bar{X} \ge z_{\alpha_0}\right).$$

Proof: For either choice of $\Theta_0$ this test has level $\alpha_0$ because for $\mu \le 0$ we have

$$P_\mu\left(n^{1/2}\bar{X} \ge z_{\alpha_0}\right) = P\left(N(0, 1) \ge z_{\alpha_0} - n^{1/2}\mu\right) \le P\left(N(0, 1) \ge z_{\alpha_0}\right) = \alpha_0.$$
This phenomenon is somewhat general. What happened was this. For any $\mu_1 > 0$ the likelihood ratio $f_1/f_0$ is an increasing function of $\sum X_i$. The rejection region of the NP test is thus always a region of the form $\sum X_i > k$. The value of the constant $k$ is determined by the requirement that the test have level $\alpha_0$, and this depends on $\mu_0$, not on $\mu_1$. So the most powerful level $\alpha_0$ test is the same for every $\mu_1 > 0$, which is what makes it uniformly most powerful.
Definition: The family $\{f_\theta;\ \theta \in \Theta \subset \mathbb{R}\}$ has monotone likelihood ratio with respect to a statistic $T(X)$ if for each $\theta_1 > \theta_0$ the likelihood ratio $f_{\theta_1}(X)/f_{\theta_0}(X)$ is a monotone increasing function of $T(X)$.
Theorem: For a monotone likelihood ratio family the Uniformly Most Powerful level $\alpha_0$ test of $\theta \le \theta_0$ (or of $\theta = \theta_0$) against the alternative $\theta > \theta_0$ is

$$\phi(x) = \begin{cases} 1 & T(x) > t_{\alpha_0} \\ \gamma & T(x) = t_{\alpha_0} \\ 0 & T(x) < t_{\alpha_0}, \end{cases}$$

where $t_{\alpha_0}$ and $\gamma$ solve $P_{\theta_0}(T(X) > t_{\alpha_0}) + \gamma P_{\theta_0}(T(X) = t_{\alpha_0}) = \alpha_0$.
The typical family where this works is a one parameter exponential family. Beyond such families, however, there is usually no UMP test.
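For instance, a Poisson$(\theta)$ sample is a one parameter exponential family with MLR in $T = \sum X_i$, since the likelihood ratio for $\theta_1 > \theta_0$ is $(\theta_1/\theta_0)^{\sum x_i} e^{-n(\theta_1 - \theta_0)}$, increasing in $\sum x_i$. A sketch of the theorem's test, with illustrative numbers $n = 10$, $\theta_0 = 1$, $\alpha_0 = 0.05$ that are not from the notes:

```python
from scipy.stats import poisson

n, theta0, alpha0 = 10, 1.0, 0.05
mu0 = n * theta0                  # under H0, T = sum(X) ~ Poisson(n*theta0)

# Smallest t with P0(T > t) <= alpha0; randomize at T = t.
t = int(poisson.ppf(1 - alpha0, mu0))
gamma = (alpha0 - poisson.sf(t, mu0)) / poisson.pmf(t, mu0)
print(t, gamma)                   # reject if T > 15; toss a coin at T = 15
```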
Example: test $\mu = \mu_0$ against the two sided alternative $\mu \ne \mu_0$. There is no UMP level $\alpha_0$ test. If there were, its power at any $\mu > \mu_0$ would have to be as high as that of the one sided level $\alpha_0$ test, and so its rejection region would have to be the same as that test's, rejecting for large positive values of $n^{1/2}(\bar{X} - \mu_0)$. But it also has to have power as good as the one sided test for the alternative $\mu < \mu_0$ and so would have to reject for large negative values of $n^{1/2}(\bar{X} - \mu_0)$. This would make its level too large.
Favourite test: the usual 2 sided test rejects for large values of $|n^{1/2}(\bar{X} - \mu_0)|$. This test maximizes power subject to two constraints: first, it has level $\alpha_0$; second, its power is minimized at $\mu = \mu_0$. The second condition means the test is unbiased: its power on the alternative is larger than on the null.
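The second constraint is easy to see numerically. A sketch of the two sided test's power function, with illustrative values $n = 25$, $\mu_0 = 0$, $\alpha_0 = 0.05$ that are not from the notes:

```python
import numpy as np
from scipy.stats import norm

# The two-sided test rejects when |sqrt(n)(xbar - mu0)| > z_{alpha0/2}.
n, mu0, alpha0 = 25, 0.0, 0.05
z = norm.ppf(1 - alpha0 / 2)

def power(mu):
    shift = np.sqrt(n) * (mu - mu0)   # sqrt(n)(xbar - mu0) ~ N(shift, 1)
    return norm.sf(z - shift) + norm.cdf(-z - shift)

for mu in [-0.4, -0.2, 0.0, 0.2, 0.4]:
    print(mu, round(float(power(mu)), 4))  # minimum alpha0 at mu = mu0
```

The printed values are symmetric about $\mu_0$ and bottom out at exactly $\alpha_0$ there, which is the unbiasedness property described above.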