
STAT 801: Mathematical Statistics

Likelihood Ratio tests

For general composite hypotheses optimality theory does not usually produce an optimal test. Instead we look for heuristics to guide our choices. The simplest approach is to consider the likelihood ratio

$\displaystyle \frac{f_{\theta_1}(X)}{f_{\theta_0}(X)}
$

and to choose values of $ \theta_1 \in \Theta_1$ and $ \theta_0 \in \Theta_0$ which are reasonable estimates of $ \theta$ assuming, respectively, that the alternative or the null hypothesis is true. The simplest method is to take each $ \theta_i$ to be a maximum likelihood estimate, but maximized only over $ \Theta_i$.
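
In practice the two restricted maximizations can be carried out numerically. Here is a minimal sketch (not part of the original notes), assuming Python with numpy and scipy; the data, bounds and seed are purely illustrative, using the $ N(\mu,1)$ one-sided problem of Example 1 below.

    import numpy as np
    from scipy.optimize import minimize_scalar

    rng = np.random.default_rng(1)
    x = rng.normal(loc=0.3, scale=1.0, size=25)   # illustrative data

    def neg_loglik(mu):
        # N(mu,1) negative log likelihood, up to an additive constant
        return 0.5 * np.sum((x - mu) ** 2)

    # maximize over the null Theta_0 = {mu <= 0} and the alternative Theta_1 = {mu > 0}
    fit0 = minimize_scalar(neg_loglik, bounds=(-10.0, 0.0), method="bounded")
    fit1 = minimize_scalar(neg_loglik, bounds=(0.0, 10.0), method="bounded")

    # log of the ratio f_{theta_1 hat}(X) / f_{theta_0 hat}(X); large values favour the alternative
    print(fit1.x, fit0.x, fit0.fun - fit1.fun)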

Example 1: $ N(\mu,1)$: test $ \mu \le 0$ against $ \mu>0$. (Recall that this problem has a UMP test.) The log likelihood, up to an additive constant not depending on $ \mu$, is

$\displaystyle -n(\bar{X}-\mu)^2/2
$

If $ \bar{X} >0$ then the maximum of $ \ell$ over $ \Theta_1=\{\mu>0\}$ is at $ \bar{X}$; if $ \bar{X} \le 0$ the supremum over $ \Theta_1$ is attained at the boundary point 0. Thus $ \hat\mu_1$, which maximizes $ \ell(\mu)$ subject to $ \mu>0$, is $ \bar{X}$ if $ \bar{X} >0$ and 0 if $ \bar{X} \le 0$. Similarly, $ \hat\mu_0$ is $ \bar{X}$ if $ \bar{X} \le 0$ and 0 if $ \bar{X} >0$. Hence

$\displaystyle \frac{f_{\hat\mu_1}(X)}{f_{\hat\mu_0}(X)}= \exp\{\ell(\hat\mu_1) - \ell(\hat\mu_0)\}
$

which simplifies to

$\displaystyle \exp\{n\bar{X}\vert\bar{X}\vert/2\}
$

This is a monotone increasing function of $ \bar{X}$, so the rejection region will be of the form $ \bar{X} > K$. To get level $ \alpha$ we reject if $ n^{1/2} \bar{X} > z_\alpha$. Notice that an equivalent, simpler statistic is twice the log likelihood ratio:

$\displaystyle \lambda \equiv 2\log\left(\frac{f_{\hat\mu_1}(X)}{f_{\hat\mu_0}(X)}\right) = n\bar{X}\vert\bar{X}\vert
$
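
As a quick sketch of how this test might be coded (not from the notes; Python with numpy and scipy assumed, and the function name, sample and seed are illustrative):

    import numpy as np
    from scipy.stats import norm

    def lr_test_one_sided(x, alpha=0.05):
        # Example 1: N(mu,1), H0: mu <= 0 vs H1: mu > 0
        n = len(x)
        xbar = np.mean(x)
        lam = n * xbar * abs(xbar)                        # twice the log likelihood ratio
        reject = np.sqrt(n) * xbar > norm.ppf(1 - alpha)  # z_alpha is the upper alpha point
        return lam, reject

    rng = np.random.default_rng(2)
    print(lr_test_one_sided(rng.normal(0.5, 1.0, size=30)))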

Example 2: In the $ N(\mu,1)$ problem suppose instead that the null hypothesis is $ \mu=0$. Then $ \hat\mu_0$ is simply 0, while the maximum of the log-likelihood over the alternative $ \mu \neq 0$ occurs at $ \bar{X}$. This gives

$\displaystyle \lambda = n\bar{X}^2
$

which, under the null hypothesis, has a $ \chi^2_1$ distribution (since $ n^{1/2}\bar{X}$ is then standard normal). This leads to the rejection region $ \lambda > (z_{\alpha/2})^2$, which is the usual UMPU test.
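
A small simulation confirms the claim (again a sketch, not part of the notes; Python with numpy and scipy assumed, all constants illustrative): under the null, $ \lambda = n\bar{X}^2$ has the $ \chi^2_1$ distribution and the rejection rule has level $ \alpha$.

    import numpy as np
    from scipy.stats import norm, chi2

    rng = np.random.default_rng(3)
    n, reps, alpha = 20, 100_000, 0.05
    xbar = rng.normal(0.0, 1.0, size=(reps, n)).mean(axis=1)  # data generated under H0: mu = 0
    lam = n * xbar ** 2

    # lambda > (z_{alpha/2})^2 is the same event as exceeding the chi^2_1 upper alpha point
    print(np.mean(lam > norm.ppf(1 - alpha / 2) ** 2))        # close to 0.05
    print(np.mean(lam > chi2.ppf(1 - alpha, df=1)))           # identical cutoff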

Example 3: For the $ N(\mu,\sigma^2)$ problem testing $ \mu=0$ against $ \mu \neq 0$ we must find two estimates of $ (\mu,\sigma^2)$. The maximum of the likelihood over the alternative occurs at the global mle $ (\bar{X}, \hat\sigma^2)$, where $ \hat\sigma^2 = \sum(X_i-\bar{X})^2/n$. We find

$\displaystyle \ell(\hat\mu,\hat\sigma^2) = -n/2 - n \log(\hat\sigma)
$

Next we maximize $ \ell$ over the null hypothesis. Recall that

$\displaystyle \ell(\mu,\sigma) = -\frac{1}{2\sigma^2} \sum (X_i-\mu)^2 -n\log(\sigma)
$

On the null hypothesis $ \mu=0$, so we find $ \hat\sigma_0$ by maximizing

$\displaystyle \ell(0,\sigma) = -\frac{1}{2\sigma^2} \sum X_i^2 -n\log(\sigma)
$

This leads to

$\displaystyle \hat\sigma_0^2 = \sum X_i^2/n
$

and

$\displaystyle \ell(0,\hat\sigma_0) = -n/2 -n\log(\hat\sigma_0)
$

This gives

$\displaystyle \lambda =-n\log(\hat\sigma^2/\hat\sigma_0^2)
$

Since

$\displaystyle \frac{\hat\sigma^2}{\hat\sigma_0^2} = \frac{\sum (X_i-\bar{X})^2}{\sum (X_i-\bar{X})^2 + n\bar{X}^2}
$

we can write

$\displaystyle \lambda = n \log(1+t^2/(n-1))
$

where

$\displaystyle t = \frac{n^{1/2} \bar{X}}{s}
$

is the usual $ t$ statistic. The likelihood ratio test rejects for large values of $ \vert t\vert$, which is the usual two-sided $ t$ test.
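
A numerical check of this identity (a sketch, not from the notes; Python with numpy assumed, data illustrative):

    import numpy as np

    rng = np.random.default_rng(4)
    x = rng.normal(0.2, 2.0, size=15)           # illustrative sample
    n = len(x)

    sig2_hat = np.mean((x - x.mean()) ** 2)     # unrestricted mle of sigma^2
    sig2_0 = np.mean(x ** 2)                    # mle of sigma^2 when mu = 0
    lam_mle = -n * np.log(sig2_hat / sig2_0)    # lambda = -n log(sigma_hat^2 / sigma_0_hat^2)

    t = np.sqrt(n) * x.mean() / x.std(ddof=1)   # usual t statistic
    lam_t = n * np.log1p(t ** 2 / (n - 1))      # n log(1 + t^2/(n-1))

    print(lam_mle, lam_t)                       # agree up to rounding error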

Notice that if $ n$ is large we have

$\displaystyle \lambda = n\log\left(1+t^2/(n-1)\right) \approx n\left[t^2/(n-1) + O_p(n^{-2})\right] \approx t^2 \, .
$

Since the $ t$ statistic is approximately standard normal if $ n$ is large we see that

$\displaystyle \lambda = 2[\ell(\hat\theta_1) - \ell(\hat\theta_0)]
$

has nearly a $ \chi^2_1$ distribution.
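
A simulation under the null shows the $ \chi^2_1$ approximation at work (a sketch, not from the notes; Python with numpy and scipy assumed, constants illustrative):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(5)
    n, reps = 50, 50_000
    x = rng.normal(0.0, 3.0, size=(reps, n))    # null true: mu = 0, sigma unknown
    t = np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)
    lam = n * np.log1p(t ** 2 / (n - 1))

    # proportion of lambda values beyond the chi^2_1 upper 5% point; close to 0.05
    print(np.mean(lam > chi2.ppf(0.95, df=1)))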

This is a general phenomenon when the null hypothesis being tested is of the form $ \phi=\phi_0$. Here is the general theory. Suppose that the vector of $ p+q$ parameters $ \theta$ can be partitioned into $ \theta=(\phi,\gamma)$ with $ \phi$ a vector of $ p$ parameters and $ \gamma$ a vector of $ q$ parameters. To test $ \phi=\phi_0$ we find two mles of $ \theta$. First, the global mle $ \hat\theta = (\hat\phi,\hat\gamma)$ maximizes the likelihood over the whole parameter space; this is also the maximum over $ \Theta_1=\{\theta:\phi\neq\phi_0\}$ because typically the probability that $ \hat\phi$ is exactly $ \phi_0$ is 0.

Now we maximize the likelihood over the null hypothesis, that is we find $ \hat\theta_0 = (\phi_0,\hat\gamma_0)$ to maximize

$\displaystyle \ell(\phi_0,\gamma)
$

The log-likelihood ratio statistic is

$\displaystyle 2[\ell(\hat\theta)-\ell(\hat\theta_0)]
$
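
In models where neither maximization is available in closed form, both can be done numerically. Here is a generic sketch (not part of the notes; Python with numpy and scipy assumed, and the function name and the illustrative $ N(\mu,\sigma^2)$ use are my own):

    import numpy as np
    from scipy.optimize import minimize

    def lr_statistic(negloglik, phi0, phi_init, gamma_init):
        # 2[l(theta_hat) - l(theta_hat_0)]: maximize over (phi, gamma) jointly,
        # then over gamma alone with phi held fixed at phi0
        p = len(phi_init)
        full = minimize(lambda th: negloglik(th[:p], th[p:]),
                        np.concatenate([phi_init, gamma_init]))
        null = minimize(lambda g: negloglik(phi0, g), gamma_init)
        return 2.0 * (null.fun - full.fun)

    # illustration: N(mu, sigma^2), test phi = mu = 0 with gamma = log(sigma) as nuisance
    rng = np.random.default_rng(6)
    x = rng.normal(0.4, 1.5, size=40)
    nll = lambda phi, gamma: (0.5 * np.sum((x - phi[0]) ** 2) * np.exp(-2.0 * gamma[0])
                              + len(x) * gamma[0])
    print(lr_statistic(nll, phi0=np.array([0.0]),
                       phi_init=np.array([0.0]), gamma_init=np.array([0.0])))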

Now suppose that the true value of $ \theta$ is $ (\phi_0,\gamma_0)$ (so that the null hypothesis is true). The score function is a vector of length $ p+q$ and can be partitioned as $ U=(U_\phi,U_\gamma)$. The Fisher information matrix can be partitioned as

$\displaystyle \left[\begin{array}{cc}
I_{\phi\phi} & I_{\phi\gamma} \\
I_{\gamma\phi} & I_{\gamma\gamma}
\end{array}\right] \, .
$

According to our large sample theory for the mle we have

$\displaystyle \hat\theta \approx \theta + I^{-1} U
$

and

$\displaystyle \hat\gamma_0 \approx \gamma_0 + I_{\gamma\gamma}^{-1} U_\gamma
$

If you carry out a two-term Taylor expansion of both $ \ell(\hat\theta)$ and $ \ell(\hat\theta_0)$ around the true value $ \theta_0=(\phi_0,\gamma_0)$ you get

$\displaystyle \ell(\hat\theta) \approx \ell(\theta_0) + U^t I^{-1}U + \frac{1}{2} U^tI^{-1} V(\theta_0) I^{-1} U
$

where $ V$ is the matrix of second derivatives of $ \ell$. Remember that $ V \approx -I$, so the quadratic term is approximately $ -\frac{1}{2}U^t I^{-1} U$, and you get

$\displaystyle 2[\ell(\hat\theta) - \ell(\theta_0)] \approx U^t I^{-1}U \, .
$

A similar expansion for $ \hat\theta_0$, in which only the $ \gamma$ component varies because $ \phi$ is held fixed at $ \phi_0$, gives

$\displaystyle 2[\ell(\hat\theta_0) -\ell(\theta_0)] \approx U_\gamma^t I_{\gamma\gamma}^{-1}
U_\gamma \, .
$

If you subtract these you find that

$\displaystyle 2[\ell(\hat\theta)-\ell(\hat\theta_0)] \approx U^t I^{-1}U - U_\gamma^t I_{\gamma\gamma}^{-1} U_\gamma
$

which can be written in the form

$\displaystyle U^tMU
$

for a suitable symmetric matrix $ M$. It is now possible to use the general theory of the distribution of $ X^t M X$ where $ X$ is $ MVN(0,\Sigma)$ to demonstrate that

Theorem: The log-likelihood ratio statistic

$\displaystyle \lambda = 2[\ell(\hat\theta) - \ell(\hat\theta_0)]
$

has, under the null hypothesis, approximately a $ \chi_p^2$ distribution.
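
To see the theorem in action, here is a simulation sketch (not an example from the notes; Python with numpy and scipy assumed, and the model and constants are my own choice): in the $ N(\mu,\sigma^2)$ model take $ \phi=\sigma^2$ with null value 1 and $ \gamma=\mu$ as the nuisance parameter, so $ p=1$; the closed-form mles give $ \lambda = n(\hat\sigma^2 - 1 - \log\hat\sigma^2)$.

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(7)
    n, reps = 40, 50_000
    x = rng.normal(2.0, 1.0, size=(reps, n))        # null true: sigma^2 = 1, mu a nuisance

    sig2_hat = np.mean((x - x.mean(axis=1, keepdims=True)) ** 2, axis=1)
    lam = n * (sig2_hat - 1.0 - np.log(sig2_hat))   # 2[l(theta_hat) - l(theta_hat_0)]

    # proportion beyond the chi^2_1 upper 5% point; close to 0.05, as the theorem predicts
    print(np.mean(lam > chi2.ppf(0.95, df=1)))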

Aside:

Theorem: Suppose $ X\sim MVN(0,\Sigma)$ with $ \Sigma$ non-singular and $ M$ is a symmetric matrix. If $ \Sigma M \Sigma M \Sigma = \Sigma M \Sigma$ then $ X^t M X$ has a $ \chi^2_\nu$ distribution with df $ \nu=trace(M\Sigma)$.

Proof: We have $ X=AZ$ where $ AA^t = \Sigma$ and $ Z$ is standard multivariate normal. So $ X^t M X = Z^t A^t M A Z$. Let $ Q=A^t M A$. Since $ AA^t = \Sigma$, the condition in the theorem is

$\displaystyle AQQA^t = AQA^t
$

Since $ \Sigma$ is non-singular, so is $ A$. Multiplying by $ A^{-1}$ on the left and $ (A^t)^{-1}$ on the right gives $ QQ=Q$.

$ Q$ is symmetric so $ Q=P\Lambda P^t$ where $ \Lambda$ is a diagonal matrix containing the eigenvalues of $ Q$ and $ P$ is an orthogonal matrix whose columns are the corresponding orthonormal eigenvectors. So we rewrite

$\displaystyle Z^t Q Z = (P^t Z)^t \Lambda (P^t Z)\, .
$

$ W = P^t Z$ is $ MVN(0, P^t P) = MVN(0,I)$; i.e., $ W$ is standard multivariate normal. Now

$\displaystyle W^t \Lambda W =\sum \lambda_i W_i^2
$

We have established that the general distribution of any quadratic form $ X^t M X$ is a linear combination of $ \chi^2$ variables. Now go back to the condition $ QQ=Q$. If $ \lambda$ is an eigenvalue of $ Q$ and $ v\neq 0$ is a corresponding eigenvector, then $ QQv = Q(\lambda v) = \lambda Qv = \lambda^2 v$, but also $ QQv =Qv = \lambda v$. Thus $ \lambda(1-\lambda ) v=0$, so either $ \lambda=0$ or $ \lambda=1$. This means that the weights in the linear combination are all 1 or 0 and that $ X^t M X$ has a $ \chi^2$ distribution with degrees of freedom, $ \nu$, equal to the number of $ \lambda_i$ which are equal to 1. This is the same as the sum of the $ \lambda_i$, so

$\displaystyle \nu = trace(\Lambda)
$

But

$\displaystyle trace(M\Sigma) = trace(MAA^t) = trace(A^t M A) = trace(Q) = trace(P\Lambda P^t) = trace(\Lambda P^t P) = trace(\Lambda) \, .
$
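
A numerical check of this aside (a sketch, not from the notes; Python with numpy and scipy assumed, and the particular $ \Sigma$ and $ M$ are arbitrary choices satisfying the condition):

    import numpy as np
    from scipy.stats import chi2

    rng = np.random.default_rng(8)
    d, k, reps = 4, 2, 100_000

    B = rng.normal(size=(d, d))
    Sigma = B @ B.T + d * np.eye(d)                # a positive definite covariance
    A = np.linalg.cholesky(Sigma)                  # Sigma = A A^t
    Q = np.diag([1.0] * k + [0.0] * (d - k))       # idempotent Q, so the condition holds
    M = np.linalg.inv(A).T @ Q @ np.linalg.inv(A)  # symmetric M with Sigma M Sigma M Sigma = Sigma M Sigma

    X = rng.multivariate_normal(np.zeros(d), Sigma, size=reps)
    quad = np.einsum("ij,jk,ik->i", X, M, X)       # X^t M X for each replication

    print(np.trace(M @ Sigma))                     # trace(M Sigma) = k
    print(np.mean(quad > chi2.ppf(0.95, df=k)))    # close to 0.05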

In the application, $ \Sigma$ is $ {\cal I}$, the Fisher information, and $ M={\cal I}^{-1} - J$ where

$\displaystyle J= \left[\begin{array}{cc}
0 & 0 \\  0 & I_{\gamma\gamma}^{-1}
\end{array}\right]
$

It is easy to check that $ M\Sigma = {\cal I}^{-1}{\cal I} - J{\cal I}$ becomes

$\displaystyle \left[\begin{array}{cc}
I & 0 \\
-I_{\gamma\gamma}^{-1}I_{\gamma\phi} & 0
\end{array}\right]
$

where $ I$ is a $ p\times p$ identity matrix. It follows that $ M\Sigma M\Sigma= M\Sigma$ and that $ trace(M\Sigma) = p$.
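
The block calculation can be verified numerically (a sketch, not from the notes; Python with numpy assumed, with an arbitrary positive definite matrix standing in for the Fisher information):

    import numpy as np

    rng = np.random.default_rng(9)
    p, q = 2, 3
    B = rng.normal(size=(p + q, p + q))
    info = B @ B.T + (p + q) * np.eye(p + q)    # stands in for the Fisher information

    J = np.zeros((p + q, p + q))
    J[p:, p:] = np.linalg.inv(info[p:, p:])     # I_{gamma gamma}^{-1} in the lower right block

    M = np.linalg.inv(info) - J
    MS = M @ info                               # M Sigma, with Sigma the Fisher information

    print(np.allclose(MS @ MS, MS))             # True: M Sigma M Sigma = M Sigma
    print(np.trace(MS))                         # equals p (here 2)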




Richard Lockhart
2001-01-03