
STAT 450

Lecture 19


Reading for Today's Lecture:

Goals of Today's Lecture:

Today's notes

So far: we have shown that for $X_1,\ldots,X_n$ an iid sample from $N(\mu,\sigma^2)$ we have

\begin{displaymath}B \equiv \text{Var}(U(X_1,\ldots,X_n;\mu,\sigma))
= \left[\begin{array}{cc}
\frac{n}{\sigma^2} & 0 \\ 0 & \frac{2n}{\sigma^2}
\end{array}\right]
\end{displaymath}
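For reference, the score whose variance is computed above has components (writing $\theta=(\mu,\sigma)$)

\begin{displaymath}U(\mu,\sigma) = \left[\begin{array}{c}
\frac{\sum(X_i-\mu)}{\sigma^2} \\
\frac{\sum(X_i-\mu)^2}{\sigma^3} - \frac{n}{\sigma}
\end{array}\right]
\end{displaymath}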

Next: compute $A=-\text{E}(\partial U/ \partial\theta)$. First
\begin{align*}\frac{\partial U(\mu,\sigma)}{\partial \theta}
& = \left[\begin{array}{cc}
-\frac{n}{\sigma^2} & -\frac{2\sum(X_i-\mu)}{\sigma^3} \\
-\frac{2\sum(X_i-\mu)}{\sigma^3} & -\frac{3\sum(X_i-\mu)^2}{\sigma^4} + \frac{n}{\sigma^2}
\end{array}\right]
\end{align*}
Take expected values to get

\begin{displaymath}A = \left[\begin{array}{cc}
\frac{n}{\sigma^2} & 0 \\ 0 & \frac{2n}{\sigma^2}
\end{array}\right] = B
\end{displaymath}
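The expectations used here are

\begin{displaymath}\text{E}\left(\sum(X_i-\mu)\right) = 0 \qquad\text{and}\qquad
\text{E}\left(\sum(X_i-\mu)^2\right) = n\sigma^2
\end{displaymath}

so the off-diagonal entries vanish and the lower right entry of $A$ is $3n/\sigma^2 - n/\sigma^2 = 2n/\sigma^2$.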

Definition: The Fisher Information is

\begin{displaymath}{\cal I}_n(\theta)=-E_{\theta}(U^\prime(\theta)).
\end{displaymath}
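For a vector parameter the same definition is read entrywise:

\begin{displaymath}\left[{\cal I}_n(\theta)\right]_{jk}
= -E_{\theta}\left(\frac{\partial U_j(\theta)}{\partial\theta_k}\right)
\end{displaymath}

so in the normal example ${\cal I}_n(\theta)$ is just the matrix $A$ computed above.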

General properties of likelihoods

In general if $f(x,\theta)$ denotes the joint density of all the data then

\begin{displaymath}1 = \int f(x;\theta) dx
\end{displaymath}

for all $\theta$.

This is an identity in $\theta$, so we can differentiate both sides with respect to $\theta$ to get
\begin{align*}0 & = \frac{\partial}{\partial\theta} \int f(x;\theta) dx
\\
& = \int \frac{\partial f(x;\theta)}{\partial\theta} dx
\\
& = \int \frac{\frac{\partial}{\partial\theta} f(x;\theta)}{f(x;\theta)}
f(x;\theta) dx
\\
& = \text{E}_\theta(U(\theta))
\end{align*}

The calculation above interchanges the order of differentiation and integration, which is not always valid. In the irregular examples on your homework this interchange fails; it does work in the normal example I did, and that is the usual situation.
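As a quick check in a concrete case, the exponential rate model of Example A below has $U(\lambda) = n/\lambda - \sum X_i$ and $\text{E}_\lambda(X_i) = 1/\lambda$, so

\begin{displaymath}\text{E}_\lambda(U(\lambda)) = \frac{n}{\lambda} - \frac{n}{\lambda} = 0
\end{displaymath}

as the identity asserts.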

Differentiate the same identity again and get
\begin{align*}0 & = \frac{\partial}{\partial\theta}
\int\frac{\frac{\partial}{\partial\theta} f(x;\theta)}{f(x;\theta)}
f(x;\theta) dx
\\
& = \int \frac{\partial}{\partial\theta}\left(
\frac{\frac{\partial}{\partial\theta} f(x;\theta)}{f(x;\theta)}\right)
f(x;\theta) dx
+ \int \left(\frac{\frac{\partial}{\partial\theta}
f(x;\theta)}{f(x;\theta)}\right)^2 f(x;\theta) dx
\end{align*}

This gives a so-called Bartlett identity

\begin{displaymath}\text{E}_\theta ( U(\theta)^2) = \text{Var}_\theta(U(\theta))
= - \text{E}_\theta(\frac{\partial U}{\partial\theta})
\end{displaymath}
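The first equality holds because the previous identity gives $\text{E}_\theta(U(\theta))=0$, so that

\begin{displaymath}\text{Var}_\theta(U(\theta)) = \text{E}_\theta(U(\theta)^2)
- \left[\text{E}_\theta(U(\theta))\right]^2 = \text{E}_\theta(U(\theta)^2)
\end{displaymath}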

In the notation of the previous example and the general theory, it is always the case for likelihoods in ``regular'' models that $A=B={\cal I}_n(\theta)$, so that $A^{-1}BA^{-1} = {\cal I}_n^{-1}(\theta)$.

Asymptotic normality

In the normal example we had an iid sampling model; the Fisher information based on $n$ observations is

\begin{displaymath}{\cal I}_n(\theta) = n \left[\begin{array}{cc} \sigma^{-2} & 0 \\ 0 & 2\sigma^{-2}
\end{array}\right]
\end{displaymath}

Notice that ${\cal I}_n(\theta) = n {\cal I}_1(\theta)$. We refer to ${\cal I}_1(\theta)$ as the information in one observation. In any iid sampling problem we will have ${\cal I}_n(\theta) = n {\cal I}_1(\theta)$.
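The reason is that the log-likelihood of an iid sample is the sum $\ell(\theta) = \sum_i \log f(X_i;\theta)$ of $n$ identically distributed terms, so

\begin{displaymath}{\cal I}_n(\theta)
= \sum_{i=1}^n \left[- E_{\theta}\left(\frac{\partial^2 \log f(X_i;\theta)}{\partial\theta^2}\right)\right]
= n\, {\cal I}_1(\theta)
\end{displaymath}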

This leads to the following theorem.

Theorem: In iid sampling

\begin{displaymath}\sqrt{n}(\hat\theta - \theta) \Rightarrow N(0,{\cal I}_1^{-1}(\theta))
\end{displaymath}

Examples, then uses

A: Exponential rate $\lambda$. We have $X_1,\ldots,X_n$ iid with density $f(x,\lambda) = \lambda e^{-\lambda x}
1(x>0)$. We find
\begin{align*}L(\lambda) & = \lambda^n e^{-\lambda \sum X_i}
\\
\ell(\lambda) & = n\log\lambda - \lambda \sum X_i
\\
U(\lambda) & = \frac{n}{\lambda} - \sum X_i
\\
\hat\lambda & = \frac{1}{\bar X}
\\
U^\prime(\lambda) & = -\frac{n}{\lambda^2}
\\
{\cal I}_n(\lambda) & = \frac{n}{\lambda^2}
\\
\sqrt{n}(\hat\lambda -\lambda) & \approx N(0,\lambda^2)
\\
\frac{
\sqrt{n}(\hat\lambda -\lambda)}{\lambda} & \approx N(0,1)
\end{align*}
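A minimal simulation sketch of the last approximation, assuming numpy is available (the rate $\lambda=2$, sample size $n=200$ and number of replications are arbitrary choices):

\begin{verbatim}
# Sketch: check the normal approximation for the exponential-rate MLE.
# Assumes numpy is available; lambda = 2, n = 200, reps = 5000 are arbitrary.
import numpy as np

rng = np.random.default_rng(0)
lam, n, reps = 2.0, 200, 5000

# Each replication: draw an iid exponential(rate = lam) sample, compute
# lambda-hat = 1 / sample mean, then the standardized quantity.
samples = rng.exponential(scale=1.0 / lam, size=(reps, n))
lam_hat = 1.0 / samples.mean(axis=1)
z = np.sqrt(n) * (lam_hat - lam) / lam

print(z.mean(), z.std())   # should be near 0 and 1, respectively
\end{verbatim}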

B: Exponential mean $\mu$. We have $X_1,\ldots,X_n$ iid with density $f(x,\mu) = \frac{1}{\mu} e^{-x/\mu }
1(x>0)$. We find
\begin{align*}L(\mu) & = \frac{1}{\mu^n} e^{- \sum X_i/ \mu}
\\
\ell(\mu) & = -n\log\mu - \frac{\sum X_i}{\mu}
\\
U(\mu) & = -\frac{n}{\mu} + \frac{\sum X_i}{\mu^2}
\\
\hat\mu & = \bar X
\\
{\cal I}_n(\mu) & = \frac{n}{\mu^2}
\\
\sqrt{n}(\hat\mu -\mu) & \approx N(0,\mu^2)
\\
\frac{
\sqrt{n}(\hat\mu -\mu)}{\mu} & \approx N(0,1)
\end{align*}
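The two examples are consistent: $\mu = 1/\lambda$ and $\hat\mu = 1/\hat\lambda = \bar X$, and the delta method applied to the limit in Example A gives the same conclusion:

\begin{displaymath}\sqrt{n}(\hat\mu - \mu)
= \sqrt{n}\left(\frac{1}{\hat\lambda} - \frac{1}{\lambda}\right)
\approx N\left(0, \frac{\lambda^2}{\lambda^4}\right) = N(0,\mu^2)
\end{displaymath}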

C: Cauchy with location parameter $\theta$. We have $X_1,\ldots,X_n$ iid with density $f(x,\theta) = \frac{1}{\pi[1+(x-\theta)^2]} $. We find
\begin{align*}L(\theta) & = \frac{1}{\pi^n\prod[1+(X_i-\theta)^2]}
\\
\ell(\theta) & = -n\log\pi - \sum \log[1+(X_i-\theta)^2]
\\
U(\theta) & = \sum \frac{2(X_i-\theta)}{1+(X_i-\theta)^2}
\\
{\cal I}_n(\theta) & = \frac{n}{2}
\\
\hat\theta - \theta & \sim N(0,2/n)
\\
\sqrt{n/2}(\hat\theta -\theta) & \sim N(0,1)
\end{align*}
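Here the score equation $U(\theta)=0$ has no closed-form solution, so $\hat\theta$ must be computed numerically. A minimal sketch, assuming numpy and scipy are available (the value $\theta=1$ and the sample size are arbitrary, and the search is restricted to a window around the sample median):

\begin{verbatim}
# Sketch: numerical MLE for the Cauchy location parameter.
# Assumes numpy and scipy; theta = 1 and n = 100 are arbitrary choices.
import numpy as np
from scipy.optimize import minimize_scalar

rng = np.random.default_rng(0)
theta_true, n = 1.0, 100
x = theta_true + rng.standard_cauchy(n)

# Negative log-likelihood, up to the constant n*log(pi).
def neg_loglik(theta):
    return np.sum(np.log1p((x - theta) ** 2))

# Search near the sample median, a natural preliminary estimate of location.
med = np.median(x)
fit = minimize_scalar(neg_loglik, bounds=(med - 5.0, med + 5.0),
                      method="bounded")
theta_hat = fit.x

# Approximate standard error from I_n(theta) = n/2.
print(theta_hat, np.sqrt(2.0 / n))
\end{verbatim}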

D: Uniform $[0,\theta]$. We have $X_1,\ldots,X_n$ iid with density $f(x,\theta) = \frac{1}{\theta}1(0 \le x \le \theta) $. We find
\begin{align*}L(\theta) & = \frac{1}{\theta^n} 1(\theta \ge \max X_i)
\\
\hat\theta & = \max(X_1,\ldots,X_n)
\\
\bullet & \ \text{$\hat\theta$ is not asymptotically normal}
\\
\bullet & \ \text{Family is {\bf irregular}}
\end{align*}

This family has the feature that the support of the density, namely $\{x:f(x;\theta) > 0 \}$, depends on $\theta$. In such families it is common for the standard mle theory to fail.
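In fact the limit law of $\hat\theta$ can be computed directly: for $0 < x < n$,

\begin{displaymath}P\left(n(\theta - \hat\theta)/\theta \le x\right)
= 1 - P\left(\max X_i < \theta(1 - x/n)\right)
= 1 - (1 - x/n)^n \to 1 - e^{-x}
\end{displaymath}

so $n(\theta - \hat\theta)/\theta$ has a limiting standard exponential distribution; the error in $\hat\theta$ is of order $1/n$, not $1/\sqrt{n}$.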

Uses and extensions

Confidence Intervals: We can base confidence intervals on one of several forms. For this section I will assume that $\theta$ is a scalar (one dimensional) parameter and use ${}^\prime$ to denote a derivative with respect to the parameter. There are 3 standard versions of the normal approximation:


\begin{align*}\sqrt{{\cal I}_n(\theta)}(\hat\theta - \theta) & \Rightarrow N(0,1)
\\
\sqrt{{\cal I}_n(\hat\theta)}(\hat\theta - \theta) & \Rightarrow N(0,1)
\\
\sqrt{-U^\prime(\hat\theta)}(\hat\theta - \theta) & \Rightarrow N(0,1)
\end{align*}
Each of these quantities may be used to derive confidence intervals for $\theta$ by collecting all values of $\theta$ for which the absolute value of the quantity is no larger than a critical point such as $z_{\alpha/2}$.

The second and third quantities are of the form

\begin{displaymath}\frac{\hat\theta -\theta}{\hat\sigma_{\hat\theta}}
\end{displaymath}

If such a quantity is standard normal then

\begin{displaymath}P(\left\vert\frac{\hat\theta -\theta}{\hat\sigma_{\hat\theta}}\right\vert \le
z_{\alpha/2}) \approx 1-\alpha
\end{displaymath}

so

\begin{displaymath}P(\hat\theta - \hat\sigma_{\hat\theta} z_{\alpha/2} \le \theta \le
\hat\theta + \hat\sigma_{\hat\theta} z_{\alpha/2}) \approx 1-\alpha
\end{displaymath}

This means that

\begin{displaymath}\hat\theta \pm \hat\sigma_{\hat\theta} z_{\alpha/2}
\end{displaymath}

is an approximate level $1-\alpha$ confidence interval for $\theta$.
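A minimal sketch of this interval for the exponential mean model of Example B, where $\hat\mu = \bar X$ and the observed information gives $\hat\sigma_{\hat\mu} = \hat\mu/\sqrt{n}$ (assumes numpy and scipy; the mean $\mu=3$, sample size and level are arbitrary choices):

\begin{verbatim}
# Sketch: approximate 95% Wald interval for the exponential mean (Example B).
# Assumes numpy and scipy; mu = 3, n = 50, alpha = 0.05 are arbitrary choices.
import numpy as np
from scipy.stats import norm

rng = np.random.default_rng(0)
mu_true, n, alpha = 3.0, 50, 0.05

x = rng.exponential(scale=mu_true, size=n)
mu_hat = x.mean()                    # MLE of the mean
se_hat = mu_hat / np.sqrt(n)         # 1 / sqrt(observed information)
z = norm.ppf(1 - alpha / 2)          # critical point z_{alpha/2}

print(mu_hat - z * se_hat, mu_hat + z * se_hat)
\end{verbatim}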





Richard Lockhart
1999-10-26