For general composite hypotheses optimality theory is not usually successful in producing an optimal test. Instead we look for heuristics to guide our choices. The simplest approach is to consider the likelihood ratio, comparing the likelihood maximized over the alternative with the likelihood maximized over the null; this leads to the log-likelihood ratio statistic
$$\lambda = 2\log\left[\frac{\sup_{\theta\in\Theta_1} L(\theta)}{\sup_{\theta\in\Theta_0} L(\theta)}\right] = 2\left[\sup_{\theta\in\Theta_1}\ell(\theta) - \sup_{\theta\in\Theta_0}\ell(\theta)\right].$$
Example 1: $N(\mu,1)$: test $\mu\le 0$ against $\mu>0$. (Remember the UMP test.) The log-likelihood is
$$\ell(\mu) = -\frac{n(\bar X-\mu)^2}{2},$$
up to an additive constant not involving $\mu$.
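As a quick numerical illustration (a sketch in Python; the data, sample size, and seed below are arbitrary choices, not part of the notes), we can maximize this log-likelihood separately over the null and the alternative and form the log-likelihood ratio statistic:

```python
# Maximize the N(mu, 1) log-likelihood separately over the null (mu <= 0)
# and the alternative (mu > 0), then form the log-likelihood ratio statistic.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
n = 25
x = rng.normal(loc=0.3, scale=1.0, size=n)  # illustrative data; alternative true
xbar = x.mean()

def neg_loglik(mu):
    # ell(mu) = -n (xbar - mu)^2 / 2, dropping terms that do not involve mu
    return 0.5 * n * (xbar - mu) ** 2

# Maximize over the null mu <= 0 and over the closure of the alternative
# (the sup over the open set mu > 0 equals the max over [0, 10]).
null_fit = minimize_scalar(neg_loglik, bounds=(-10.0, 0.0), method="bounded")
alt_fit = minimize_scalar(neg_loglik, bounds=(0.0, 10.0), method="bounded")

lam = 2 * (null_fit.fun - alt_fit.fun)  # 2 [ sup_alt ell - sup_null ell ]
print(f"xbar = {xbar:.3f}, lambda = {lam:.3f}")
# When xbar > 0 this reproduces lambda = n * xbar**2 (up to optimizer tolerance).
print(f"n * xbar^2 = {n * xbar**2:.3f}")
```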
Example 2: In the $N(\mu,1)$ problem suppose we make the null $\mu=0$. Then the null mle is simply $\hat\mu_0=0$, while the maximum of the log-likelihood over the alternative $\mu\ne 0$ occurs at the global mle $\hat\mu=\bar X$. This gives
$$\lambda = 2\left[\ell(\bar X)-\ell(0)\right] = n\bar X^2.$$
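A minimal sketch of Example 2 on simulated data (all settings below are my own illustrative choices): compute $\lambda = n\bar X^2$ and compare it to a $\chi^2_1$ critical value, anticipating the large-sample theory developed below.

```python
# Example 2 numerically: lambda = n * xbar^2 versus a chi-squared(1) cutoff.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(1)
n = 50
x = rng.normal(loc=0.0, scale=1.0, size=n)   # data generated under the null

lam = n * x.mean() ** 2                      # lambda = 2[ell(xbar) - ell(0)]
crit = chi2.ppf(0.95, df=1)                  # level 0.05 critical value
# Here the chi-squared(1) law is exact: sqrt(n) * xbar ~ N(0, 1) under H0.
print(f"lambda = {lam:.3f}, chi2(1) 0.95 quantile = {crit:.3f}")
print("reject H0" if lam > crit else "do not reject H0")
```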
Example 3: For the $N(\mu,\sigma^2)$ problem testing $\mu=\mu_0$ against $\mu\ne\mu_0$ we must find two estimates of $\theta=(\mu,\sigma)$. The maximum of the likelihood over the alternative occurs at the global mle $(\hat\mu,\hat\sigma)$, where $\hat\mu=\bar X$ and $\hat\sigma^2=\sum(X_i-\bar X)^2/n$. We find
$$\ell(\hat\mu,\hat\sigma) = -n\log\hat\sigma - \frac n2.$$
Maximize $\ell(\mu_0,\sigma)$ over the null hypothesis: the maximizing value of $\sigma$ is $\hat\sigma_0$, where $\hat\sigma_0^2=\sum(X_i-\mu_0)^2/n$, so that
$$\ell(\mu_0,\hat\sigma_0) = -n\log\hat\sigma_0 - \frac n2.$$
Recall that $\hat\sigma_0^2 = \hat\sigma^2 + (\bar X-\mu_0)^2$, which gives
$$\lambda = 2\left[\ell(\hat\mu,\hat\sigma)-\ell(\mu_0,\hat\sigma_0)\right] = n\log\left(\frac{\hat\sigma_0^2}{\hat\sigma^2}\right) = n\log\left(1+\frac{t^2}{n-1}\right),$$
where $t=\sqrt n(\bar X-\mu_0)/s$ is the usual $t$ statistic and $s^2=\sum(X_i-\bar X)^2/(n-1)$. Notice that if $n$ is large we have
$$\lambda \approx t^2,$$
because $\log(1+x)\approx x$ for small $x$.
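The approximation $\lambda\approx t^2$ is easy to see numerically. The following sketch (simulation settings are again my own choices) computes both quantities for one large sample:

```python
# Example 3 numerically: lambda = n log(1 + t^2/(n-1)) is close to t^2.
import numpy as np

rng = np.random.default_rng(2)
n, mu0 = 200, 1.0
x = rng.normal(loc=mu0, scale=2.0, size=n)   # data generated under the null

xbar = x.mean()
s = x.std(ddof=1)                            # s^2 = sum (x - xbar)^2 / (n - 1)
t = np.sqrt(n) * (xbar - mu0) / s            # usual one-sample t statistic

lam = n * np.log(1 + t**2 / (n - 1))
print(f"t^2 = {t**2:.4f}, lambda = {lam:.4f}")  # nearly equal for large n
```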
This is a general phenomenon when the null hypothesis being tested is of the form $\phi=\phi_0$. Here is the general theory. Suppose that the vector of $p$ parameters $\theta$ can be partitioned into $\theta=(\phi,\gamma)$ with $\phi$ a vector of $q$ parameters and $\gamma$ a vector of $p-q$ parameters. To test $\phi=\phi_0$ we find two mles of $\theta$. First, the global mle $\hat\theta=(\hat\phi,\hat\gamma)$ maximizes the likelihood over $\Theta$ (we may maximize over all of $\Theta$ rather than just the alternative because typically the probability that $\hat\phi$ is exactly $\phi_0$ is 0).

Now we maximize the likelihood over the null hypothesis; that is, we find $\hat\theta_0=(\phi_0,\hat\gamma_0)$, where $\hat\gamma_0$ is chosen to maximize $\gamma\mapsto\ell(\phi_0,\gamma)$.

Now suppose that the true value of $\theta$ is $\theta_0=(\phi_0,\gamma_0)$ (so that the null hypothesis is true). The score function is a vector of length $p$ and can be partitioned as $U=(U_\phi,U_\gamma)$. The Fisher information matrix can be partitioned as
$$\mathcal I = \begin{pmatrix}\mathcal I_{\phi\phi} & \mathcal I_{\phi\gamma}\\ \mathcal I_{\gamma\phi} & \mathcal I_{\gamma\gamma}\end{pmatrix}.$$
According to our large sample theory for the mle we have
$$\sqrt n\,(\hat\theta-\theta_0)\Rightarrow MVN(0,\mathcal I^{-1})\qquad\text{and}\qquad \sqrt n\,(\hat\gamma_0-\gamma_0)\Rightarrow MVN(0,\mathcal I_{\gamma\gamma}^{-1}).$$
Theorem: If the null hypothesis is true, the log-likelihood ratio statistic
$$\lambda = 2\left[\ell(\hat\theta)-\ell(\hat\theta_0)\right]$$
converges in distribution to a $\chi^2_q$ random variable.
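A Monte Carlo sanity check of the theorem in the setting of Example 3, where $q=1$ (a sketch; all simulation settings are illustrative): under the null, the rejection rate of the $\chi^2_1$ cutoff should be close to the nominal level.

```python
# Monte Carlo check: in the N(mu, sigma^2) model with H0: mu = 0 and sigma
# unknown, q = 1, so lambda = n log(1 + t^2/(n-1)) should be approximately
# chi-squared(1) under the null.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
n, reps = 100, 20000
x = rng.normal(loc=0.0, scale=3.0, size=(reps, n))   # null is true

xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)
t2 = n * xbar**2 / s**2
lam = n * np.log1p(t2 / (n - 1))

# Compare the empirical tail probability at the chi-squared(1) 0.95 quantile.
crit = chi2.ppf(0.95, df=1)
print(f"P(lambda > {crit:.3f}) ~= {(lam > crit).mean():.4f} (target 0.05)")
```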
Aside:

Theorem: Suppose $X\sim MVN(0,\Sigma)$ with $\Sigma$ non-singular and $Q$ is a symmetric matrix. If $Q\Sigma Q=Q$ then $X^TQX$ has a $\chi^2$ distribution with df $\nu=\operatorname{tr}(Q\Sigma)$.
Proof: We have $\Sigma=AA^T$, where $A=\Sigma^{1/2}$, and $Z=A^{-1}X$ is standard multivariate normal. So
$$X^TQX = Z^TA^TQAZ.$$
Let $P=A^TQA$. Since $\Sigma=AA^T$, the condition in the theorem is
$$P^2 = A^TQ\Sigma QA = A^TQA = P.$$
$P$ is symmetric, so $P=U\Lambda U^T$, where $\Lambda$ is the diagonal matrix containing the eigenvalues $\lambda_1,\dots,\lambda_p$ of $P$ and $U$ is an orthogonal matrix whose columns are the corresponding orthonormal eigenvectors. So rewrite
$$X^TQX = Z^TU\Lambda U^TZ = W^T\Lambda W = \sum_{i=1}^p \lambda_iW_i^2,$$
where $W=U^TZ$ is again standard multivariate normal. We have established that the general distribution of any quadratic form $X^TQX$ is a linear combination of $\chi^2_1$ variables.

Now go back to the condition $P^2=P$. If $\lambda$ is an eigenvalue of $P$ and $v\ne 0$ is a corresponding eigenvector, then $P^2v=\lambda Pv=\lambda^2v$ but also $P^2v=Pv=\lambda v$. Thus $\lambda^2=\lambda$. It follows that either $\lambda=0$ or $\lambda=1$. This means that the weights in the linear combination are all 1 or 0 and that $X^TQX$ has a $\chi^2$ distribution with degrees of freedom, $\nu$, equal to the number of $\lambda_i$ which are equal to 1. This is the same as the sum of the $\lambda_i$, so
$$\nu = \operatorname{tr}(\Lambda) = \operatorname{tr}(P) = \operatorname{tr}(A^TQA) = \operatorname{tr}(QAA^T) = \operatorname{tr}(Q\Sigma).$$
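The theorem is easy to check numerically. In the sketch below, the construction $Q=B(B^T\Sigma B)^{-1}B^T$ is one convenient way (my own choice, not from the notes) to manufacture a symmetric $Q$ with $Q\Sigma Q=Q$; the dimensions and seed are arbitrary.

```python
# Numeric check of the quadratic-form theorem: with Q = B (B' Sigma B)^{-1} B',
# Q is symmetric, Q Sigma Q = Q, and tr(Q Sigma) = rank(B), so X'QX should
# be chi-squared with that many degrees of freedom.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(4)
p, k, reps = 6, 3, 20000

A = rng.normal(size=(p, p))
Sigma = A @ A.T + p * np.eye(p)              # a non-singular covariance
B = rng.normal(size=(p, k))                  # any rank-k matrix
Q = B @ np.linalg.inv(B.T @ Sigma @ B) @ B.T

assert np.allclose(Q @ Sigma @ Q, Q)         # the condition of the theorem
print(f"tr(Q Sigma) = {np.trace(Q @ Sigma):.6f} (should be {k})")

X = rng.multivariate_normal(np.zeros(p), Sigma, size=reps)
qf = np.einsum("ij,jk,ik->i", X, Q, X)       # X' Q X for each draw
crit = chi2.ppf(0.95, df=k)
print(f"P(X'QX > {crit:.3f}) ~= {(qf > crit).mean():.4f} (target 0.05)")
```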
In the application $\Sigma$ is $\mathcal I$, the Fisher information (the limiting variance of the score vector $U/\sqrt n$), and
$$Q = \mathcal I^{-1} - \begin{pmatrix}0 & 0\\ 0 & \mathcal I_{\gamma\gamma}^{-1}\end{pmatrix},$$
where the zero blocks conform to the partition $\theta=(\phi,\gamma)$. Expanding the two maximized log-likelihoods about $\theta_0$ gives $\lambda \approx (U/\sqrt n)^TQ(U/\sqrt n)$, and one checks that $Q\mathcal IQ=Q$ and $\operatorname{tr}(Q\mathcal I)=p-(p-q)=q$, so the theorem gives the $\chi^2_q$ limit for $\lambda$.
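Finally, a small check (with an arbitrary, randomly generated "information" matrix standing in for $\mathcal I$) that this $Q$ satisfies the condition of the theorem and has $\operatorname{tr}(Q\mathcal I)=q$:

```python
# With Sigma = I (the information) and Q = I^{-1} - diag(0, I_gg^{-1}),
# verify Q Sigma Q = Q and tr(Q Sigma) = q.
import numpy as np

rng = np.random.default_rng(5)
p, q = 5, 2                                  # q interest, p - q nuisance params

A = rng.normal(size=(p, p))
info = A @ A.T + p * np.eye(p)               # positive definite "Fisher information"
info_gg = info[q:, q:]                       # the gamma-gamma block

E = np.zeros((p, p))
E[q:, q:] = np.linalg.inv(info_gg)           # the block matrix diag(0, I_gg^{-1})
Q = np.linalg.inv(info) - E

print(np.allclose(Q @ info @ Q, Q))          # True: condition of the theorem
print(np.round(np.trace(Q @ info), 8))       # q = 2 degrees of freedom
```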