

STAT 450

Lecture 34

Goals for today:

Generalized likelihood ratio statistic:

\begin{displaymath}\Lambda = \frac{L(\hat\theta_1)}{L(\hat\theta_0)}
\end{displaymath}

where $\hat\theta_i$ maximizes $L$ over $\Theta_i$. The log likelihood ratio statistic is

\begin{displaymath}\lambda \equiv 2\log(\Lambda)
\end{displaymath}

If $\theta=(\phi,\gamma)$ and $H_o: \phi=\phi_0$, $H_1: \phi \neq \phi_0$ then $\hat\theta_1=\hat\theta$, the unrestricted MLE, and

\begin{displaymath}\lambda = 2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}

Example: $N(\mu,\sigma^2)$ with $H_o: \mu=0$; here $\phi=\mu$, $\gamma=\sigma$. Then

\begin{displaymath}\lambda= n\log(1+t^2/(n-1))
\end{displaymath}

where

\begin{displaymath}t = \frac{n^{1/2} \bar{X}}{s}
\end{displaymath}

is the usual t statistic. The likelihood ratio test thus rejects for large values of $|t|$, which gives the usual test.
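
Here is a minimal numerical sketch in Python (assuming numpy and scipy are available; the data below are simulated, not from the notes) which computes $\lambda$ and t for one sample and confirms that the two cutoffs give the same decision.

\begin{verbatim}
# Sketch: lambda = n*log(1 + t^2/(n-1)) for one simulated N(mu, sigma^2) sample.
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(450)
n = 25
x = rng.normal(loc=0.0, scale=2.0, size=n)   # data generated under H_0: mu = 0

xbar = x.mean()
s = x.std(ddof=1)                            # usual sample standard deviation
t_stat = np.sqrt(n) * xbar / s
lam = n * np.log(1 + t_stat**2 / (n - 1))

# lambda is an increasing function of |t|, so rejecting for large lambda
# is the same as rejecting for large |t|.
alpha = 0.05
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 1)
lam_crit = n * np.log(1 + t_crit**2 / (n - 1))
print(t_stat, lam)
print(abs(t_stat) > t_crit, lam > lam_crit)  # the two decisions agree
\end{verbatim}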

Notice: if n is large

\begin{displaymath}\lambda = n\log(1+t^2/(n-1)) \approx n[t^2/(n-1) +O(n^{-2})] \approx t^2 \, .
\end{displaymath}

Since the t statistic is approximately N(0,1) when n is large,

\begin{displaymath}\lambda = 2[\ell(\hat\theta_1) - \ell(\hat\theta_0)]
\end{displaymath}

has nearly a $\chi^2_1$ distribution.

General phenomenon: null hypothesis of the form $\phi=\phi_0$.

Suppose the true value of $\theta$ is $(\phi_0,\gamma_0)$ (so $H_o$ is true). Suppose $p=\dim(\phi)$. Two term Taylor expansions of $\ell(\hat\theta)$ and $\ell(\hat\theta_0)$ around $\theta_0$ can be used to prove

Theorem: The log-likelihood ratio statistic

\begin{displaymath}\lambda = 2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}

has, under the null hypothesis, approximately a $\chi_p^2$ distribution.
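
As a quick check of the theorem in the simplest case ($p=1$, the normal-mean example above), here is a minimal Monte Carlo sketch, assuming Python with numpy and scipy; the simulation settings are made up for illustration.

\begin{verbatim}
# Sketch: under H_0: mu = 0, lambda = n*log(1 + t^2/(n-1)) should be
# approximately chi^2_1, so the rejection rate at the chi^2_1 cutoff
# should be close to alpha.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, nsim = 50, 20000
x = rng.normal(size=(nsim, n))               # each row is one sample under H_0
t_stat = np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)
lam = n * np.log(1 + t_stat**2 / (n - 1))

cutoff = chi2.ppf(0.95, df=1)                # about 3.84
print((lam > cutoff).mean())                 # should be close to 0.05
\end{verbatim}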

Details: suppose that the true value of $\theta$ is $(\phi_0,\gamma_0)$ (so that the null hypothesis is true). The score function is a vector of length $p+q$, where $q=\dim(\gamma)$, and can be partitioned as $U=(U_\phi,U_\gamma)$. The Fisher information matrix can be partitioned as

\begin{displaymath}\left[\begin{array}{cc}
I_{\phi\phi} & I_{\phi\gamma}
\\
I_{\gamma\phi} & I_{\gamma\gamma}
\end{array}\right] \, .
\end{displaymath}

According to our large sample theory for the mle we have

\begin{displaymath}\hat\theta \approx \theta + I^{-1} U
\end{displaymath}

and

\begin{displaymath}\hat\gamma_0 \approx \gamma_0 + I_{\gamma\gamma}^{-1} U_\gamma
\end{displaymath}

If you carry out a two term Taylor expansion of both $\ell(\hat\theta)$ and $\ell(\hat\theta_0)$ around $\theta_0$ you get

\begin{displaymath}\ell(\hat\theta) \approx \ell(\theta_0) + U^t I^{-1}U + \frac{1}{2}
U^tI^{-1} V(\theta_0) I^{-1} U
\end{displaymath}

where V is the second derivative matrix of $\ell$. Remember that $V \approx -I$ and you get

\begin{displaymath}2[\ell(\hat\theta) - \ell(\theta_0)] \approx U^t I^{-1}U \, .
\end{displaymath}

A similar expansion for $\hat\theta_0$ gives

\begin{displaymath}2[\ell(\hat\theta_0) -\ell(\theta_0)] \approx U_\gamma^t I_{\gamma\gamma}^{-1}
U_\gamma \, .
\end{displaymath}

If you subtract these you find that

\begin{displaymath}2[\ell(\hat\theta) - \ell(\hat\theta_0)]
\end{displaymath}

can be written in the approximate form

\begin{displaymath}U^t M U
\end{displaymath}

for a suitable matrix M. It is now possible to use the general theory of the distribution of $X^t M X$ where X is $MVN(0,\Sigma)$ to demonstrate that

Theorem: The log-likelihood ratio statistic

\begin{displaymath}\lambda = 2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}

has, under the null hypothesis, approximately a $\chi_p^2$ distribution.

Aside:

Theorem: Suppose that $X\sim MVN(0,\Sigma)$ with $\Sigma$ non-singular and M a symmetric matrix. If $\Sigma M \Sigma M \Sigma = \Sigma M \Sigma$ then $X^t M X$ has a $\chi^2$ distribution with degrees of freedom $\nu=trace(M\Sigma)$.

Proof: We have $X=AZ$ where $AA^t = \Sigma$ and Z is standard multivariate normal. So $X^t M X = Z^t A^t M A Z$. Let $Q=A^t M A$. Since $AA^t = \Sigma$ the condition in the theorem is actually

\begin{displaymath}A QQ A^t = AQA^t \, .
\end{displaymath}

Since $\Sigma$ is non-singular so is A. Multiply by $A^{-1}$ on the left and $(A^t)^{-1}$ on the right to discover $QQ=Q$.

The matrix Q is symmetric and so can be written in the form $P\Lambda P^t$ where $\Lambda$ is a diagonal matrix containing the eigenvalues of Q and P is an orthogonal matrix whose columns are the corresponding orthonormal eigenvectors. It follows that we can rewrite

\begin{displaymath}Z^t Q Z = (P^t Z)^t \Lambda (P^t Z)
\end{displaymath}

The variable $W = P^t Z$ is multivariate normal with mean 0 and variance covariance matrix $P^t P = I$; that is, W is standard multivariate normal. Now

\begin{displaymath}W^t \Lambda W =\sum \lambda_i W_i^2
\end{displaymath}

We have established that the general distribution of any quadratic form $X^t M X$ is a linear combination of $\chi^2$ variables. Now go back to the condition QQ=Q. If $\lambda$ is an eigenvalue of Q and $v\neq 0$ is a corresponding eigenvector then $QQv = Q(\lambda v) = \lambda Qv = \lambda^2 v$ but also $QQv =Qv = \lambda v$. Thus $\lambda(1-\lambda ) v=0$. It follows that either $\lambda=0$ or $\lambda=1$. This means that the weights in the linear combination are all 1 or 0 and that $X^t M X$ has a $\chi^2$ distribution with degrees of freedom, $\nu$, equal to the number of $\lambda_i$ which are equal to 1. This is the same as the sum of the $\lambda_i$ so

\begin{displaymath}\nu = trace(\Lambda)
\end{displaymath}

But
\begin{align*}trace(M\Sigma)& = trace(MAA^t)
\\
&= trace(A^t M A)
\\
& = trace(Q)
\\
& = trace(P\Lambda P^t)
\\
& = trace(\Lambda P^t P)
\\
& = trace(\Lambda)
\end{align*}

In the application $\Sigma$ is ${\cal I}$, the Fisher information, and $M={\cal I}^{-1} - J$ where

\begin{displaymath}J= \left[\begin{array}{cc}
0 & 0 \\ 0 & I_{\gamma\gamma}^{-1}
\end{array}\right]
\end{displaymath}

It is easy to check that

\begin{displaymath}M\Sigma = \left[\begin{array}{cc}
I & 0 \\ -I_{\gamma\gamma}^{-1}I_{\gamma\phi} & 0
\end{array}\right]
\end{displaymath}

where I is a $p\times p$ identity matrix. This matrix is idempotent, so $M\Sigma M\Sigma= M\Sigma$ (and hence $\Sigma M\Sigma M\Sigma = \Sigma M\Sigma$), and $trace(M\Sigma) = p$.
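
A minimal numerical sketch (Python with numpy assumed; the partitioned Fisher information below is made up) checking that $M\Sigma$ is idempotent and has trace p.

\begin{verbatim}
# Sketch: with M = Sigma^{-1} - J and Sigma the Fisher information,
# M*Sigma is idempotent and trace(M*Sigma) = p.
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 3
B = rng.normal(size=(p + q, p + q))
info = B @ B.T + (p + q) * np.eye(p + q)     # made-up positive definite information

J = np.zeros((p + q, p + q))
J[p:, p:] = np.linalg.inv(info[p:, p:])      # I_{gamma gamma}^{-1} in lower right block
M = np.linalg.inv(info) - J

MS = M @ info                                # M * Sigma
print(np.allclose(MS @ MS, MS))              # True: Sigma M Sigma M Sigma = Sigma M Sigma
print(np.trace(MS))                          # equals p (here 2) up to rounding
\end{verbatim}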

Confidence Sets

A level $\beta$ confidence set for a parameter $\phi(\theta)$ is a random subset C of the set of possible values of $\phi$ such that for each $\theta$ we have

\begin{displaymath}P_\theta(\phi(\theta) \in C) \ge \beta
\end{displaymath}

Confidence sets are very closely connected with hypothesis tests:

Suppose C is a level $\beta=1-\alpha$ confidence set for $\phi$. To test $\phi=\phi_0$ we consider the test which rejects if $\phi_0\not\in C$. This test has level $\alpha$. Conversely, suppose that for each $\phi_0$ we have available a level $\alpha$ test of $\phi=\phi_0$ whose rejection region is, say, $R_{\phi_0}$. Then if we define $C=\{\phi_0: \phi=\phi_0 \mbox{ is not rejected}\}$ we get a level $1-\alpha$ confidence set for $\phi$. The usual t test gives rise in this way to the usual t confidence intervals

\begin{displaymath}\bar{X} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}
\end{displaymath}

which you know well.
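
A minimal sketch of this interval in Python (numpy and scipy assumed; the data are simulated).

\begin{verbatim}
# Sketch: the usual level 1 - alpha t interval for mu.
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=30)  # simulated data
n = len(x)
alpha = 0.05

xbar = x.mean()
s = x.std(ddof=1)
margin = t_dist.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
print(xbar - margin, xbar + margin)          # level 1 - alpha interval for mu
\end{verbatim}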

Confidence sets from Pivots

Definition: A pivot (or pivotal quantity) is a function $g(\theta,X)$ whose distribution is the same for all $\theta$. (As usual the $\theta$ in the pivot is the same $\theta$ as the one being used to calculate the distribution of $g(\theta,X)$.)

Pivots can be used to generate confidence sets as follows. Pick a set A in the space of possible values for g. Let $\beta=P_\theta(g(\theta,X) \in A)$; since g is pivotal $\beta$ is the same for all $\theta$. Now given a data set X solve the relation

\begin{displaymath}g(\theta,X) \in A
\end{displaymath}

to get

\begin{displaymath}\theta \in C(X,A) \, .
\end{displaymath}

Example: The quantity

\begin{displaymath}(n-1) s^2/\sigma^2
\end{displaymath}

is a pivot in the $N(\mu,\sigma^2)$ model. It has a $\chi_{n-1}^2$ distribution. Given $\beta=1-\alpha$, consider the two points $\chi_{n-1,1-\alpha/2}^2$ and $\chi_{n-1,\alpha/2}^2$. Then

\begin{displaymath}P(\chi_{n-1,1-\alpha/2}^2 \le (n-1) s^2/\sigma^2 \le \chi_{n-1,\alpha/2}^2) = \beta
\end{displaymath}

for all $\mu,\sigma$. We can solve this relation to get

\begin{displaymath}P( \frac{(n-1)^{1/2} s}{ \chi_{n-1,\alpha/2}} \le \sigma \le \frac{(n-1)^{1/2} s}{
\chi_{n-1,1-\alpha/2}}) = \beta
\end{displaymath}

so that the interval from $(n-1)^{1/2} s/\chi_{n-1,\alpha/2}$ to $(n-1)^{1/2} s/\chi_{n-1,1-\alpha/2}$ is a level $1-\alpha$ confidence interval.
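
A minimal sketch of this interval in Python (numpy and scipy assumed; the data are simulated). Note that $\chi_{n-1,c}$ in these notes is an upper-tail point, which corresponds to the square root of chi2.ppf(1 - c, n - 1) below.

\begin{verbatim}
# Sketch: equal-tailed level 1 - alpha interval for sigma from the
# pivot (n-1)s^2/sigma^2 ~ chi^2_{n-1}.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=2.0, size=20)  # simulated data, true sigma = 2
n = len(x)
alpha = 0.05

s = x.std(ddof=1)
lower = np.sqrt(n - 1) * s / np.sqrt(chi2.ppf(1 - alpha / 2, df=n - 1))
upper = np.sqrt(n - 1) * s / np.sqrt(chi2.ppf(alpha / 2, df=n - 1))
print(lower, upper)                          # level 1 - alpha interval for sigma
\end{verbatim}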

In the same model we also have

\begin{displaymath}P(\chi_{n-1,1-\alpha}^2 \le (n-1) s^2/\sigma^2 ) = \beta
\end{displaymath}

which can be solved to get

\begin{displaymath}P(\sigma \le \frac{(n-1)^{1/2} s}{
\chi_{n-1,1-\alpha}}) = \beta
\end{displaymath}

This gives a level $1-\alpha$ interval $(0,(n-1)^{1/2} s/\chi_{n-1,1-\alpha})$. The right hand end of this interval is usually called a confidence upper bound.

In general the interval from $(n-1)^{1/2} s/\chi_{n-1,\alpha_1}$ to $(n-1)^{1/2} s/\chi_{n-1,1-\alpha_2}$ has level $\beta = 1 -\alpha_1-\alpha_2$. For a fixed value of $\beta$ we can minimize the length of the resulting interval numerically, as in the sketch below. This sort of optimization is rarely used.
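
For completeness, here is a sketch of that optimization (Python with scipy assumed; n and s are made-up values).

\begin{verbatim}
# Sketch: choose alpha_1 + alpha_2 = alpha to minimize the length of the
# interval from sqrt(n-1)*s/chi_{n-1,alpha_1} to sqrt(n-1)*s/chi_{n-1,1-alpha_2},
# writing the upper-tail points chi_{n-1,c} as sqrt(chi2.ppf(1 - c, n - 1)).
import numpy as np
from scipy.stats import chi2
from scipy.optimize import minimize_scalar

n, s, alpha = 20, 2.0, 0.05                  # made-up sample size and sample SD
df = n - 1
c = np.sqrt(df) * s

def length(alpha1):
    alpha2 = alpha - alpha1
    upper = c / np.sqrt(chi2.ppf(alpha2, df))      # endpoint using chi_{n-1,1-alpha_2}
    lower = c / np.sqrt(chi2.ppf(1 - alpha1, df))  # endpoint using chi_{n-1,alpha_1}
    return upper - lower

res = minimize_scalar(length, bounds=(1e-6, alpha - 1e-6), method="bounded")
print(res.x, alpha - res.x, res.fun)         # optimal alpha_1, alpha_2 and length
\end{verbatim}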

Likelihood ratio intervals

The quantity

\begin{displaymath}\lambda(\theta) = 2[\ell(\hat\theta) -\ell(\hat\theta_0)]
\end{displaymath}

is an approximate pivot. The notation is as in the hypothesis testing problem $H_o: \phi=\phi_0$. To get a CI for $\phi$ you compute $\hat\psi(\phi)$ by maximizing $L(\phi,\psi)$ over the nuisance parameter $\psi$ (called $\gamma$ above), holding $\phi$ fixed. Then $\hat\theta_0$ will be $(\phi,\hat\psi(\phi))$. You get a level $1-\alpha$ confidence interval by taking

\begin{displaymath}\{\phi: 2[\ell(\hat\phi,\hat\psi) - \ell( \phi,\hat\psi(\phi))] \le \chi^2_{p,\alpha}\}
\end{displaymath}

(This might not be an interval.)

Example: Sample from the exponential density $\lambda e^{-\lambda x}$. Now $\phi=\lambda$ and there is no $\psi$.

\begin{displaymath}\ell(\lambda) = -\lambda\sum X_i+n\log(\lambda)
\end{displaymath}

is maximized at $\hat\lambda = 1/\bar{X}$. Then

\begin{displaymath}2[\ell(\hat\lambda)-\ell(\lambda)] =2[n\lambda\bar{X} -n\log(\lambda) -n\log(\bar{X}) -n]
\end{displaymath}

Then we get a CI for $\lambda$ by taking all the $\lambda$ for which this quantity is less than say $(1.96)^2 = 3.84=\chi^2_{1,0.05}$.
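
A minimal sketch that finds the endpoints of this interval numerically (Python with numpy and scipy assumed; the data are simulated). Since $2[\ell(\hat\lambda)-\ell(\lambda)] = 2n[\lambda\bar{X} - \log(\lambda\bar{X}) - 1]$ is zero at $\hat\lambda=1/\bar{X}$ and increases as $\lambda$ moves away from $\hat\lambda$, the endpoints are roots of this function minus the cutoff.

\begin{verbatim}
# Sketch: likelihood ratio confidence interval for the exponential rate lambda.
import numpy as np
from scipy.stats import chi2
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x = rng.exponential(scale=1 / 2.5, size=40)  # simulated data, true rate 2.5
n, xbar = len(x), x.mean()
lamhat = 1 / xbar                            # the MLE

cutoff = chi2.ppf(0.95, df=1)                # about 3.84

def excess(lam):
    return 2 * n * (lam * xbar - np.log(lam * xbar) - 1) - cutoff

low = brentq(excess, 1e-6 * lamhat, lamhat)  # root below the MLE
high = brentq(excess, lamhat, 50 * lamhat)   # root above the MLE
print(lamhat, (low, high))                   # approximate level 0.95 interval
\end{verbatim}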





Richard Lockhart
1999-12-01