

STAT 450

Lecture 34

Goals for today:

Generalized likelihood ratio statistic:

\begin{displaymath}\Lambda = \frac{L(\hat\theta_1)}{L(\hat\theta_0)}
\end{displaymath}

where $\hat\theta_i$ maximizes $L$ over $\Theta_i$. The log likelihood ratio statistic is

\begin{displaymath}\lambda \equiv 2\log(\Lambda)
\end{displaymath}

If $\theta=(\phi,\gamma)$ and $H_o: \phi=\phi_0$, $H_1: \phi \neq \phi_0$ then $\hat\theta_1=\hat\theta$, the unrestricted MLE, and

\begin{displaymath}\lambda = 2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}

Example: $N(\mu,\sigma^2)$ with $H_o: \mu=0$; here $\phi=\mu$, $\gamma=\sigma$. Then

\begin{displaymath}\lambda= n\log(1+t^2/(n-1))
\end{displaymath}

where

\begin{displaymath}t = \frac{n^{1/2} \bar{X}}{s}
\end{displaymath}

is the usual t statistic. The likelihood ratio test thus rejects for large values of $|t|$, which gives the usual test.
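
Here is a minimal numerical sketch in Python (assuming numpy and scipy are available; the data below are simulated, not from the notes) which computes $\lambda$ and t for one sample and confirms that the two cutoffs give the same decision.

\begin{verbatim}
# Sketch: lambda = n*log(1 + t^2/(n-1)) for one simulated N(mu, sigma^2) sample.
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(450)
n = 25
x = rng.normal(loc=0.0, scale=2.0, size=n)   # data generated under H_0: mu = 0

xbar = x.mean()
s = x.std(ddof=1)                            # usual sample standard deviation
t_stat = np.sqrt(n) * xbar / s
lam = n * np.log(1 + t_stat**2 / (n - 1))

# lambda is an increasing function of |t|, so rejecting for large lambda
# is the same as rejecting for large |t|.
alpha = 0.05
t_crit = t_dist.ppf(1 - alpha / 2, df=n - 1)
lam_crit = n * np.log(1 + t_crit**2 / (n - 1))
print(t_stat, lam)
print(abs(t_stat) > t_crit, lam > lam_crit)  # the two decisions agree
\end{verbatim}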

Notice: if n is large

\begin{displaymath}\lambda = n\log(1+t^2/(n-1)) \approx n[t^2/(n-1) +O(n^{-2})] \approx t^2 \, .
\end{displaymath}

Since the t statistic is approximately N(0,1) when n is large,

\begin{displaymath}\lambda = 2[\ell(\hat\theta_1) - \ell(\hat\theta_0)]
\end{displaymath}

has nearly a $\chi^2_1$ distribution.

General phenomenon: null hypothesis of the form $\phi=\phi_0$.

Suppose the true value of $\theta$ is $(\phi_0,\gamma_0)$ (so $H_o$ is true). Suppose $p=\dim(\phi)$. Two term Taylor expansions of $\ell(\hat\theta)$ and $\ell(\hat\theta_0)$ around $\theta_0$ can be used to prove

Theorem: The log-likelihood ratio statistic

\begin{displaymath}\lambda = 2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}

has, under the null hypothesis, approximately a $\chi_p^2$ distribution.
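
As a quick check of the theorem in the simplest case ($p=1$, the normal-mean example above), here is a minimal Monte Carlo sketch, assuming Python with numpy and scipy; the simulation settings are made up for illustration.

\begin{verbatim}
# Sketch: under H_0: mu = 0, lambda = n*log(1 + t^2/(n-1)) should be
# approximately chi^2_1, so the rejection rate at the chi^2_1 cutoff
# should be close to alpha.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(0)
n, nsim = 50, 20000
x = rng.normal(size=(nsim, n))               # each row is one sample under H_0
t_stat = np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)
lam = n * np.log(1 + t_stat**2 / (n - 1))

cutoff = chi2.ppf(0.95, df=1)                # about 3.84
print((lam > cutoff).mean())                 # should be close to 0.05
\end{verbatim}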

Details: suppose that the true value of $\theta$ is $(\phi_0,\gamma_0)$ (so that the null hypothesis is true). The score function is a vector of length $p+q$, where $q=\dim(\gamma)$, and can be partitioned as $U=(U_\phi,U_\gamma)$. The Fisher information matrix can be partitioned as

\begin{displaymath}\left[\begin{array}{cc}
I_{\phi\phi} & I_{\phi\gamma}
\\
I_{\gamma\phi} & I_{\gamma\gamma}
\end{array}\right] \, .
\end{displaymath}

According to our large sample theory for the mle we have

\begin{displaymath}\hat\theta \approx \theta + I^{-1} U
\end{displaymath}

and

\begin{displaymath}\hat\gamma_0 \approx \gamma_0 + I_{\gamma\gamma}^{-1} U_\gamma
\end{displaymath}

If you carry out a two term Taylor expansion of both $\ell(\hat\theta)$ and $\ell(\hat\theta_0)$ around $\theta_0$ you get

\begin{displaymath}\ell(\hat\theta) \approx \ell(\theta_0) + U^t I^{-1}U + \frac{1}{2}
U^tI^{-1} V(\theta_0) I^{-1} U
\end{displaymath}

where V is the second derivative matrix of $\ell$. Remember that $V \approx -I$ and you get

\begin{displaymath}2[\ell(\hat\theta) - \ell(\theta_0)] \approx U^t I^{-1}U \, .
\end{displaymath}

A similar expansion for $\hat\theta_0$ gives

\begin{displaymath}2[\ell(\hat\theta_0) -\ell(\theta_0)] \approx U_\gamma^t I_{\gamma\gamma}^{-1}
U_\gamma \, .
\end{displaymath}

If you subtract these you find that

\begin{displaymath}2[\ell(\hat\theta) - \ell(\hat\theta_0)]
\end{displaymath}

can be written in the approximate form

\begin{displaymath}U^t M U
\end{displaymath}

for a suitable matrix M. It is now possible to use the general theory of the distribution of $X^t M X$ where X is $MVN(0,\Sigma)$ to demonstrate that

Theorem: The log-likelihood ratio statistic

\begin{displaymath}\lambda = 2[\ell(\hat\theta)-\ell(\hat\theta_0)]
\end{displaymath}

has, under the null hypothesis, approximately a $\chi_p^2$ distribution.

Aside:

Theorem: Suppose that $X\sim MVN(0,\Sigma)$ with $\Sigma$ non-singular and M a symmetric matrix. If $\Sigma M \Sigma M \Sigma = \Sigma M \Sigma$ then $X^t M X$ has a $\chi^2$ distribution with degrees of freedom $\nu=trace(M\Sigma)$.

Proof: We have $X=AZ$ where $AA^t = \Sigma$ and Z is standard multivariate normal. So $X^t M X = Z^t A^t M A Z$. Let $Q=A^t M A$. Since $AA^t = \Sigma$ the condition in the theorem is actually

\begin{displaymath}A QQ A^t = AQA^t \, .
\end{displaymath}

Since $\Sigma$ is non-singular so is A. Multiply by $A^{-1}$ on the left and $(A^t)^{-1}$ on the right to discover $QQ=Q$.

The matrix Q is symmetric and so can be written in the form $P\Lambda P^t$ where $\Lambda$ is a diagonal matrix containing the eigenvalues of Q and P is an orthogonal matrix whose columns are the corresponding orthonormal eigenvectors. It follows that we can rewrite

\begin{displaymath}Z^t Q Z = (P^t Z)^t \Lambda (P^t Z)
\end{displaymath}

The variable $W = P^t Z$ is multivariate normal with mean 0 and variance covariance matrix $P^t P = I$; that is, W is standard multivariate normal. Now

\begin{displaymath}W^t \Lambda W =\sum \lambda_i W_i^2
\end{displaymath}

We have established that the general distribution of any quadratic form $X^t M X$ is a linear combination of $\chi^2$ variables. Now go back to the condition QQ=Q. If $\lambda$ is an eigenvalue of Q and $v\neq 0$ is a corresponding eigenvector then $QQv = Q(\lambda v) = \lambda Qv = \lambda^2 v$ but also $QQv =Qv = \lambda v$. Thus $\lambda(1-\lambda ) v=0$. It follows that either $\lambda=0$ or $\lambda=1$. This means that the weights in the linear combination are all 1 or 0 and that $X^t M X$ has a $\chi^2$ distribution with degrees of freedom, $\nu$, equal to the number of $\lambda_i$ which are equal to 1. This is the same as the sum of the $\lambda_i$ so

\begin{displaymath}\nu = trace(\Lambda)
\end{displaymath}

But
\begin{align*}trace(M\Sigma)& = trace(MAA^t)
\\
&= trace(A^t M A)
\\
& = trace(Q)
\\
& = trace(P\Lambda P^t)
\\
& = trace(\Lambda P^t P)
\\
& = trace(\Lambda)
\end{align*}

In the application $\Sigma$ is ${\cal I}$, the Fisher information, and $M={\cal I}^{-1} - J$ where

\begin{displaymath}J= \left[\begin{array}{cc}
0 & 0 \\ 0 & I_{\gamma\gamma}^{-1}
\end{array}\right]
\end{displaymath}

It is easy to check that

\begin{displaymath}M\Sigma = \left[\begin{array}{cc}
I & 0 \\ -I_{\gamma\gamma}^{-1}I_{\gamma\phi} & 0
\end{array}\right]
\end{displaymath}

where I is a $p\times p$ identity matrix. This matrix is idempotent, so $M\Sigma M\Sigma= M\Sigma$ (and hence $\Sigma M\Sigma M\Sigma = \Sigma M\Sigma$), and $trace(M\Sigma) = p$.
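
A minimal numerical sketch (Python with numpy assumed; the partitioned Fisher information below is made up) checking that $M\Sigma$ is idempotent and has trace p.

\begin{verbatim}
# Sketch: with M = Sigma^{-1} - J and Sigma the Fisher information,
# M*Sigma is idempotent and trace(M*Sigma) = p.
import numpy as np

rng = np.random.default_rng(1)
p, q = 2, 3
B = rng.normal(size=(p + q, p + q))
info = B @ B.T + (p + q) * np.eye(p + q)     # made-up positive definite information

J = np.zeros((p + q, p + q))
J[p:, p:] = np.linalg.inv(info[p:, p:])      # I_{gamma gamma}^{-1} in lower right block
M = np.linalg.inv(info) - J

MS = M @ info                                # M * Sigma
print(np.allclose(MS @ MS, MS))              # True: Sigma M Sigma M Sigma = Sigma M Sigma
print(np.trace(MS))                          # equals p (here 2) up to rounding
\end{verbatim}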

Confidence Sets

A level $\beta$ confidence set for a parameter $\phi(\theta)$ is a random subset C of the set of possible values of $\phi$ such that for each $\theta$ we have

\begin{displaymath}P_\theta(\phi(\theta) \in C) \ge \beta
\end{displaymath}

Confidence sets are very closely connected with hypothesis tests:

Suppose C is a level $\beta=1-\alpha$ confidence set for $\phi$. To test $\phi=\phi_0$ we consider the test which rejects if $\phi_0\not\in C$. This test has level $\alpha$. Conversely, suppose that for each $\phi_0$ we have available a level $\alpha$ test of $\phi=\phi_0$ whose rejection region is, say, $R_{\phi_0}$. Then if we define $C=\{\phi_0: \phi=\phi_0 \mbox{ is not rejected}\}$ we get a level $1-\alpha$ confidence set for $\phi$. The usual t test gives rise in this way to the usual t confidence intervals

\begin{displaymath}\bar{X} \pm t_{n-1,\alpha/2} \frac{s}{\sqrt{n}}
\end{displaymath}

which you know well.
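
A minimal sketch of this interval in Python (numpy and scipy assumed; the data are simulated).

\begin{verbatim}
# Sketch: the usual level 1 - alpha t interval for mu.
import numpy as np
from scipy.stats import t as t_dist

rng = np.random.default_rng(2)
x = rng.normal(loc=1.0, scale=2.0, size=30)  # simulated data
n = len(x)
alpha = 0.05

xbar = x.mean()
s = x.std(ddof=1)
margin = t_dist.ppf(1 - alpha / 2, df=n - 1) * s / np.sqrt(n)
print(xbar - margin, xbar + margin)          # level 1 - alpha interval for mu
\end{verbatim}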

Confidence sets from Pivots

Definition: A pivot (or pivotal quantity) is a function $g(\theta,X)$ whose distribution is the same for all $\theta$. (As usual the $\theta$ in the pivot is the same $\theta$ as the one being used to calculate the distribution of $g(\theta,X)$.)

Pivots can be used to generate confidence sets as follows. Pick a set A in the space of possible values for g. Let $\beta=P_\theta(g(\theta,X) \in A)$; since g is pivotal $\beta$ is the same for all $\theta$. Now given a data set X solve the relation

\begin{displaymath}g(\theta,X) \in A
\end{displaymath}

to get

\begin{displaymath}\theta \in C(X,A) \, .
\end{displaymath}

Example: The quantity

\begin{displaymath}(n-1) s^2/\sigma^2
\end{displaymath}

is a pivot in the $N(\mu,\sigma^2)$ model. It has a $\chi_{n-1}^2$ distribution. Given $\beta=1-\alpha$, consider the two points $\chi_{n-1,1-\alpha/2}^2$ and $\chi_{n-1,\alpha/2}^2$. Then

\begin{displaymath}P(\chi_{n-1,1-\alpha/2}^2 \le (n-1) s^2/\sigma^2 \le \chi_{n-1,\alpha/2}^2) = \beta
\end{displaymath}

for all $\mu,\sigma$. We can solve this relation to get

\begin{displaymath}P( \frac{(n-1)^{1/2} s}{ \chi_{n-1,\alpha/2}} \le \sigma \le \frac{(n-1)^{1/2} s}{
\chi_{n-1,1-\alpha/2}}) = \beta
\end{displaymath}

so that the interval from $(n-1)^{1/2} s/\chi_{n-1,\alpha/2}$ to $(n-1)^{1/2} s/\chi_{n-1,1-\alpha/2}$ is a level $1-\alpha$ confidence interval.
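
A minimal sketch of this interval in Python (numpy and scipy assumed; the data are simulated). Note that $\chi_{n-1,c}$ in these notes is an upper-tail point, which corresponds to the square root of chi2.ppf(1 - c, n - 1) below.

\begin{verbatim}
# Sketch: equal-tailed level 1 - alpha interval for sigma from the
# pivot (n-1)s^2/sigma^2 ~ chi^2_{n-1}.
import numpy as np
from scipy.stats import chi2

rng = np.random.default_rng(3)
x = rng.normal(loc=0.0, scale=2.0, size=20)  # simulated data, true sigma = 2
n = len(x)
alpha = 0.05

s = x.std(ddof=1)
lower = np.sqrt(n - 1) * s / np.sqrt(chi2.ppf(1 - alpha / 2, df=n - 1))
upper = np.sqrt(n - 1) * s / np.sqrt(chi2.ppf(alpha / 2, df=n - 1))
print(lower, upper)                          # level 1 - alpha interval for sigma
\end{verbatim}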

In the same model we also have

\begin{displaymath}P(\chi_{n-1,1-\alpha}^2 \le (n-1) s^2/\sigma^2 ) = \beta
\end{displaymath}

which can be solved to get

\begin{displaymath}P(\sigma \le \frac{(n-1)^{1/2} s}{
\chi_{n-1,1-\alpha}}) = \beta
\end{displaymath}

This gives a level $1-\alpha$ interval $(0,(n-1)^{1/2} s/\chi_{n-1,1-\alpha})$. The right hand end of this interval is usually called a confidence upper bound.

In general the interval from $(n-1)^{1/2} s/\chi_{n-1,\alpha_1}$ to $(n-1)^{1/2} s/\chi_{n-1,1-\alpha_2}$ has level $\beta = 1 -\alpha_1-\alpha_2$. For a fixed value of $\beta$ we can minimize the length of the resulting interval numerically, as in the sketch below. This sort of optimization is rarely used.
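
For completeness, here is a sketch of that optimization (Python with scipy assumed; n and s are made-up values).

\begin{verbatim}
# Sketch: choose alpha_1 + alpha_2 = alpha to minimize the length of the
# interval from sqrt(n-1)*s/chi_{n-1,alpha_1} to sqrt(n-1)*s/chi_{n-1,1-alpha_2},
# writing the upper-tail points chi_{n-1,c} as sqrt(chi2.ppf(1 - c, n - 1)).
import numpy as np
from scipy.stats import chi2
from scipy.optimize import minimize_scalar

n, s, alpha = 20, 2.0, 0.05                  # made-up sample size and sample SD
df = n - 1
c = np.sqrt(df) * s

def length(alpha1):
    alpha2 = alpha - alpha1
    upper = c / np.sqrt(chi2.ppf(alpha2, df))      # endpoint using chi_{n-1,1-alpha_2}
    lower = c / np.sqrt(chi2.ppf(1 - alpha1, df))  # endpoint using chi_{n-1,alpha_1}
    return upper - lower

res = minimize_scalar(length, bounds=(1e-6, alpha - 1e-6), method="bounded")
print(res.x, alpha - res.x, res.fun)         # optimal alpha_1, alpha_2 and length
\end{verbatim}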

Likelihood ratio intervals

The quantity

\begin{displaymath}\lambda(\theta) = 2[\ell(\hat\theta) -\ell(\hat\theta_0)]
\end{displaymath}

is an approximate pivot. The notation is as in the hypothesis testing problem $H_o: \phi=\phi_0$. To get a CI for $\phi$ you compute $\hat\psi(\phi)$ by maximizing $L(\phi,\psi)$ over the nuisance parameter $\psi$ (called $\gamma$ above), holding $\phi$ fixed. Then $\hat\theta_0$ will be $(\phi,\hat\psi(\phi))$. You get a level $1-\alpha$ confidence interval by taking

\begin{displaymath}\{\phi: 2[\ell(\hat\phi,\hat\psi) - \ell( \phi,\hat\psi(\phi))] \le \chi^2_{p,\alpha}\}
\end{displaymath}

(This might not be an interval.)

Example: Sample from the exponential density $\lambda e^{-\lambda x}$. Now $\phi=\lambda$ and there is no $\psi$.

\begin{displaymath}\ell(\lambda) = -\lambda\sum X_i+n\log(\lambda)
\end{displaymath}

is maximized at $\hat\lambda = 1/\bar{X}$. Then

\begin{displaymath}2[\ell(\hat\lambda)-\ell(\lambda)] =2[n\lambda\bar{X} -n\log(\lambda) -n\log(\bar{X}) -n]
\end{displaymath}

Then we get a CI for $\lambda$ by taking all the $\lambda$ for which this quantity is less than say $(1.96)^2 = 3.84=\chi^2_{1,0.05}$.
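
A minimal sketch that finds the endpoints of this interval numerically (Python with numpy and scipy assumed; the data are simulated). Since $2[\ell(\hat\lambda)-\ell(\lambda)] = 2n[\lambda\bar{X} - \log(\lambda\bar{X}) - 1]$ is zero at $\hat\lambda=1/\bar{X}$ and increases as $\lambda$ moves away from $\hat\lambda$, the endpoints are roots of this function minus the cutoff.

\begin{verbatim}
# Sketch: likelihood ratio confidence interval for the exponential rate lambda.
import numpy as np
from scipy.stats import chi2
from scipy.optimize import brentq

rng = np.random.default_rng(4)
x = rng.exponential(scale=1 / 2.5, size=40)  # simulated data, true rate 2.5
n, xbar = len(x), x.mean()
lamhat = 1 / xbar                            # the MLE

cutoff = chi2.ppf(0.95, df=1)                # about 3.84

def excess(lam):
    return 2 * n * (lam * xbar - np.log(lam * xbar) - 1) - cutoff

low = brentq(excess, 1e-6 * lamhat, lamhat)  # root below the MLE
high = brentq(excess, lamhat, 50 * lamhat)   # root above the MLE
print(lamhat, (low, high))                   # approximate level 0.95 interval
\end{verbatim}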





Richard Lockhart
1999-12-01