
STAT 870 Lecture 7

Consistency of MLE

Cauchy problem: for each vector $(x_1,\ldots,x_n)$ we define $h_n(x_1,\ldots,x_n)$ to be that root of

\begin{displaymath}\ell^\prime(\theta\vert x_1,\ldots,x_n)
= \sum\frac{2(x_i-\theta)}{1+(x_i-\theta)^2} =0
\end{displaymath}

which is closest to $g_n(x_1,\ldots,x_n)$ (here $g_n$ is the Borel function used in defining the median above). If $g_n(x_1,\ldots,x_n)$ is exactly midway between two roots which are tied for closest, we define $h_n$ to be the root smaller than $g_n$.
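For reference, $\ell^\prime$ here is the derivative of the Cauchy log likelihood

\begin{displaymath}\ell(\theta\vert x_1,\ldots,x_n)
= -\sum \log\{1+(x_i-\theta)^2\} - n\log\pi \, .
\end{displaymath}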

It is possible to prove that this defines a Borel function from ${\Bbb R}^n$ to ${\Bbb R}$. Now define

\begin{displaymath}\hat\theta_n = h_n(X_1,\ldots,X_n)
\end{displaymath}

I claim that if $X_1,X_2,\ldots$ are iid Cauchy(0) then

 \begin{displaymath}
\hat\theta_n \to 0
\end{displaymath} (1)

almost surely.
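Before turning to the proof, here is a minimal numerical sketch (assuming NumPy and SciPy; the function name and the bracketing scheme are illustrative only) of how such an estimate can be computed. It solves $\ell^\prime(\theta)=0$ on a bracket expanded around the sample median, which locates a root near the median, though not necessarily the closest one as required by the formal definition of $h_n$.

\begin{verbatim}
import numpy as np
from scipy.optimize import brentq

def theta_hat(x, width=1.0):
    # Root of the Cauchy score near the sample median (illustrative sketch).
    med = np.median(x)
    def score(t):
        return np.sum(2.0 * (x - t) / (1.0 + (x - t) ** 2))
    lo, hi = med - width, med + width
    # Expand the bracket until the score changes sign across it; the score
    # is positive far to the left and negative far to the right.
    while score(lo) * score(hi) > 0:
        lo -= width
        hi += width
    return brentq(score, lo, hi)

x = np.random.standard_cauchy(1000)   # a Cauchy(0) sample
print(theta_hat(x))                   # close to 0 for large n
\end{verbatim}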

To prove this, fix $\epsilon > 0$; we prove that $P(C_\epsilon)=1$ where

\begin{displaymath}C_\epsilon= \bigcup_{N=1}^\infty \bigcap_{n=N}^\infty
\{\vert\hat\theta_n\vert \le \epsilon\}
\end{displaymath}

Strategy: find an event $D_\epsilon$ inside $C_\epsilon$ with $P(D_\epsilon)=1$. To define this new event note that if

1.
$\ell^\prime$ has a unique root over $[-3\epsilon,3\epsilon]$, and

2.
that root is actually in $[-\epsilon,\epsilon]$ and

3.
$\vert\tilde\theta_n\vert \le \epsilon$ (here $\tilde\theta_n = g_n(X_1,\ldots,X_n)$ is the sample median),

then the root of $\ell^\prime$ closest to $\tilde\theta_n$ is the root in points 1 and 2, and so

\begin{displaymath}\vert\hat\theta_n\vert \le \epsilon
\end{displaymath}

Define $ D^{(1)}_\epsilon $ to be the event that there is an N such that for all $n \ge N$ and all $\vert\theta\vert\le 3\epsilon$ we have

\begin{displaymath}\ell^{\prime\prime}(\theta\vert X_1,\ldots,X_n) < 0
\end{displaymath}

Define $ D^{(2)}_\epsilon$ to be the event that there is an N such that for all $n \ge N$

\begin{displaymath}\ell^{\prime}(\epsilon\vert X_1,\ldots,X_n) < 0
\end{displaymath}

and

\begin{displaymath}\ell^{\prime}(-\epsilon\vert X_1,\ldots,X_n) > 0
\end{displaymath}

(On $D^{(1)}_\epsilon\cap D^{(2)}_\epsilon$ the intermediate value theorem and the strict monotonicity of $\ell^\prime$ on $[-3\epsilon,3\epsilon]$ give points 1 and 2 above.)

Finally define $ D^{(3)}_\epsilon$ to be the event that there is an N such that for all $n \ge N$

\begin{displaymath}\vert\tilde\theta_n\vert \le \epsilon
\end{displaymath}

Already shown: $P(D^{(3)}_\epsilon)=1$.

Next show $P( D^{(2)}_\epsilon)=1$. Note that

\begin{displaymath}U_k(\epsilon) = \frac{2(X_k-\epsilon)}{1+(X_k-\epsilon)^2}
\end{displaymath}

and

\begin{displaymath}U_k(-\epsilon) = \frac{2(X_k+\epsilon)}{1+(X_k+\epsilon)^2}
\end{displaymath}

Then

\begin{displaymath}\frac{1}{n}\ell^{\prime}(\epsilon\vert X_1,\ldots,X_n)
=\overline{U_n(\epsilon)}
\end{displaymath}

and

\begin{displaymath}\frac{1}{n}\ell^{\prime}(-\epsilon\vert X_1,\ldots,X_n)
=\overline{U_n(-\epsilon)}
\end{displaymath}

Thus $ D^{(2)}_\epsilon$ is the event that there is an N such that for all $n \ge N$

\begin{displaymath}\overline{U_n(\epsilon)} < 0 \quad \text{and}
\quad \overline{U_n(-\epsilon)} > 0
\end{displaymath}

$\overline{U_n(-\epsilon)}$ and $\overline{U_n(\epsilon)}$ are averages of iid variates. Apply the SLLN and show
\begin{align*}{\rm E}(U_k(\epsilon)) & < 0
\\
{\rm E}(U_k(-\epsilon)) & > 0 \, .
\end{align*}
In fact
\begin{align*}{\rm E}(U_k(\epsilon)) & = \frac{-2\epsilon}{\epsilon^2+4} < 0
\\
{\rm E}(U_k(-\epsilon)) & = \frac{2\epsilon}{\epsilon^2+4} > 0 \, .
\end{align*}

Defect: the argument is not easy to generalize, since it relies on an exact computation of a moment.

More general tactic: use Jensen's inequality.

If $\ell$ is smaller at $-\epsilon$ and at $\epsilon$ than it is at 0 then there must be a critical point in $[-\epsilon,\epsilon]$, that is, a root of $\ell^\prime$: the maximum of $\ell$ over $[-\epsilon,\epsilon]$ is then attained at an interior point, where $\ell^\prime$ vanishes.

Define

\begin{displaymath}L_i(\theta) = \log(1+X_i^2)-\log(1+(X_i-\theta)^2) \, .
\end{displaymath}

Note that $\ell(\theta)-\ell(0) = \sum L_i(\theta)$.
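Equivalently, writing $f(x\vert\theta)$ for the Cauchy($\theta$) density, $L_i(\theta)$ is a log likelihood ratio (the factors of $\pi$ cancel):

\begin{displaymath}L_i(\theta) = \log\frac{f(X_i\vert\theta)}{f(X_i\vert 0)} \, ,
\qquad f(x\vert\theta) = \frac{1}{\pi\{1+(x-\theta)^2\}} \, .
\end{displaymath}

This is the form to which Jensen's inequality is applied below.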

$ D^{(4)}_\epsilon $ is the event $\exists N$ such that $\forall n \ge N$

\begin{displaymath}\{ \sum L_i(\epsilon) < 0, \sum L_i(-\epsilon) <0\} \, .
\end{displaymath}

Define

\begin{displaymath}\mu(\epsilon) = {\rm E}(L_i(\epsilon))
\end{displaymath}

The SLLN shows $P(D^{(5)}_\epsilon)=1$ where $D^{(5)}_\epsilon$ is the event

\begin{displaymath}\{ \sum L_i(\epsilon) / n \to \mu(\epsilon),
\sum L_i(-\epsilon)/n \to \mu(-\epsilon)\}
\end{displaymath}

Claim: $\mu(\epsilon)< 0$ for all $\epsilon\neq 0$.

If so then (since an average which converges to a strictly negative limit must be negative for all large $n$)

\begin{displaymath}D^{(5)}_\epsilon \subset D^{(4)}_\epsilon
\end{displaymath}

and so $P(D^{(4)}_\epsilon )=1$.

To prove the claim we apply Jensen's inequality:

Proposition 1 (Jensen)   Suppose Y is a random variable and $\phi$ is a function which is convex on an interval (a,b) with $P(a<Y<b)=1$. Assume ${\rm E}(\vert Y\vert)< \infty$. Then

\begin{displaymath}\phi({\rm E}(Y)) \le {\rm E}(\phi(Y))
\end{displaymath}

If $\phi$ is strictly convex then the inequality is strict unless ${\rm Var}(Y)=0$.

Jargon: $\phi$ is convex if for each x,y and $\lambda\in (0,1)$

\begin{displaymath}\phi(\lambda x +(1-\lambda)y) \le \lambda \phi(x) +(1-\lambda)\phi(y)
\end{displaymath}

We call $\phi$ strictly convex if the inequality is strict whenever $x \neq y$.

If $\phi$ is twice differentiable and $\phi^{\prime\prime} \ge 0$ then $\phi$ is convex; if $\phi^{\prime\prime} > 0$ then $\phi$ is strictly convex.
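For example, the function used in the next step satisfies this criterion:

\begin{displaymath}\phi(x) = -\log(x) \, , \qquad
\phi^{\prime\prime}(x) = \frac{1}{x^2} > 0 \quad\text{for } x > 0 \, ,
\end{displaymath}

so $\phi$ is strictly convex on $(0,\infty)$.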

Apply Jensen's inequality with $\phi(x) = - \log(x)$ to $Y=g(X)/f(X)$, where X has density f and g is some other density (so that Y is not degenerate at a constant). Then

\begin{displaymath}{\rm E}\{-\log(Y)\} >
-\log\{{\rm E}(Y)\}
\end{displaymath}

But the latter is

\begin{displaymath}-\log\left\{\int \frac{g(x)}{f(x)} f(x)\, dx\right\} = -\log(1) = 0
\end{displaymath}

Technically: interval (a,b) is $(0,\infty)$. The assumption

\begin{displaymath}P(0 < Y < \infty)=1
\end{displaymath}

deserves some discussion. If f(x)=0 for some places where g(x) is not 0 then

\begin{displaymath}{\rm E}\left\{\frac{g(X)}{f(X)}\right\} = \int g(x)1(f(x)>0)\, dx
\end{displaymath}

which might be less than 1. This just makes the inequality stronger, however.

The other technical detail is that g(x) might be 0 at some places where f(x) is not 0. This might mean P(Y=0) > 0. On the event Y=0 we will agree to take $-\log(Y)=\infty$ and conclude

\begin{displaymath}{\rm E}\{-\log(Y)\}=\infty
\end{displaymath}

In any case we find

\begin{displaymath}{\rm E}\{-\log(Y)\} > 0
\end{displaymath}

or

\begin{displaymath}{\rm E}[\log\{g(X)\} - \log\{f(X)\}] < 0 \, .
\end{displaymath}

Applied to our Cauchy problem, with $g = f(\cdot\vert\theta)$ and $f = f(\cdot\vert 0)$, this shows $\mu(\theta) < 0$ for all $\theta\neq 0$. Hence $P(D^{(4)}_\epsilon )=1$.

Finally we consider $ D^{(1)}_\epsilon $. Up to now we have been able to make do with an arbitrary $\epsilon$. In this case, however, the result holds only for small $\epsilon > 0$. First consider

\begin{displaymath}\frac{1}{n} \ell^{\prime\prime}(0\vert X_1,\ldots,X_n)
\end{displaymath}

According to the strong law of large numbers this converges almost surely to

\begin{displaymath}{\rm E}\left\{\frac{2(X^2-1)}{(1+X^2)^2}\right\} =-\frac{1}{2} < 0
\end{displaymath}
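This value is minus the Fisher information of the Cauchy location family; it follows from the standard integrals $\int (1+x^2)^{-3}\, dx = 3\pi/8$ and $\int x^2(1+x^2)^{-3}\, dx = \pi/8$:
\begin{align*}{\rm E}\left\{\frac{2(X^2-1)}{(1+X^2)^2}\right\}
& = \frac{2}{\pi}\int \frac{x^2-1}{(1+x^2)^3}\, dx
\\
& = \frac{2}{\pi}\left(\frac{\pi}{8} - \frac{3\pi}{8}\right) = -\frac{1}{2} \, .
\end{align*}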

Now you can check that

\begin{displaymath}\left\vert \frac{1}{n} \ell^{\prime\prime\prime}(\theta\vert X_1,\ldots,X_n)
\right\vert < 4 \, .
\end{displaymath}
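Indeed, a direct computation shows that each term of $\ell^{\prime\prime\prime}(\theta\vert X_1,\ldots,X_n)$ is

\begin{displaymath}\frac{4(X_i-\theta)\{(X_i-\theta)^2-3\}}{\{1+(X_i-\theta)^2\}^3} \, .
\end{displaymath}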

(In fact each term in $\ell^{\prime\prime\prime}$ may be shown to be bounded by $3/2+\sqrt{2}$.) As a result

\begin{displaymath}\frac{1}{n}\left\vert \ell^{\prime\prime}(\theta\vert X_1,\ldots,X_n)
- \ell^{\prime\prime}(0\vert X_1,\ldots,X_n) \right\vert \le 4\vert\theta\vert
\end{displaymath}

Pick $\epsilon > 0$ so that $4\epsilon < 1/2$. If B is the event

\begin{displaymath}\frac{1}{n} \ell^{\prime\prime}(0\vert X_1,\ldots,X_n)\to -\frac{1}{2}
\end{displaymath}

and $\omega$ is in B then there is an N such that for $n \ge N$ we have

\begin{displaymath}\frac{1}{n} \ell^{\prime\prime}(\theta\vert X_1,\ldots,X_n) < 0
\end{displaymath}

for all $\vert\theta\vert < \epsilon$.

This proves that for all $\epsilon$ with $3\epsilon < 1/8$

\begin{displaymath}P(D^{(1)}_\epsilon)=1 \, .
\end{displaymath}

We have now shown that for $\epsilon < 1/24$

\begin{displaymath}P(D^{(1)}_\epsilon\cap D^{(2)}_\epsilon\cap D^{(3)}_\epsilon)=1
\end{displaymath}

For $\omega$ in this event there is an N such that

\begin{displaymath}\vert\hat\theta_n\vert \le \epsilon
\end{displaymath}

for all $n \ge N$. This establishes the result.

General Case

Consider a parametric family

\begin{displaymath}\{f(x\vert\theta); a<\theta < b\}
\end{displaymath}

Let $\theta_o$ be the true value of $\theta$.

Let $A_\epsilon$ be the event: $\exists N$ such that $\forall n \ge N$ $\ell$ has a local maximum on the interval $[\theta_o-\epsilon,\theta_o+\epsilon]$.

We have proved quite generally that

\begin{displaymath}P(A_\epsilon) = 1
\end{displaymath} (2)

Add the assumption

\begin{displaymath}\ell\text{ has a continuous derivative.}
\end{displaymath} (A)

Let $B_\epsilon$ be the event that there is an N such that for all $n \ge N$ there is a critical point of $\ell$ in $(\theta_o-\epsilon,\theta_o+\epsilon)$ which is a local maximum of $\ell$. We have proved

\begin{displaymath}P(B_\epsilon) = 1
\end{displaymath} (3)

The event $B = \cap_\epsilon B_\epsilon$ then has probability 1 (a countable intersection over $\epsilon = 1/k$, $k=1,2,\ldots$, suffices since $B_\epsilon \subset B_\delta$ for $\epsilon < \delta$). On this event there is a sequence of roots of the likelihood equations which is consistent.

Remaining problem: prove, under general conditions, that there is probably only one root near $\theta_o$.

Consider the event that $\ell^\prime$ is monotone on $[\theta_o-\epsilon,\theta_o+\epsilon]$. The previous proof was based on showing that the next derivative, $\ell^{\prime\prime}$, was negative at $\theta_o$ and did not change much over a small enough interval.

The behaviour at $\theta_o$ is essentially the behaviour of the average of the second derivative terms $V_i(\theta_o) = L_i^{\prime\prime}(\theta_o)$,

\begin{displaymath}\frac{1}{n}\sum V_i(\theta_o)
\end{displaymath}

which converges almost surely to

\begin{displaymath}{\rm E}(L_1^{\prime\prime}(\theta_o))
\end{displaymath}

I claim this is negative for regular families.

Begin with

\begin{displaymath}1 = \int f(x,\theta) dx
\end{displaymath}

Differentiating with respect to $\theta$ gives
\begin{align*}0 & =\frac{d}{d\theta} \int f(x,\theta) dx
\\
& = \lim_{\epsilon\to 0} \int \frac{f(x,\theta+\epsilon)-f(x,\theta)}{\epsilon}
dx
\end{align*}

In order to take the limit inside the integral sign we must prove that for any sequence $\epsilon_n\to 0$
\begin{multline*}\lim \int
\frac{f(x,\theta+\epsilon_n)-f(x,\theta)}{\epsilon_n}
dx
\\
= \int \frac{\partial}{\partial\theta}f(x,\theta) dx
\end{multline*}
Normally we apply the dominated convergence theorem. If f is continuously differentiable with respect to $\theta$ then, by the mean value theorem, the difference quotient is exactly

\begin{displaymath}\frac{\partial}{\partial\theta}f(x,\theta^*_n)
\end{displaymath}

where $\theta^*_n$ lies between $\theta$ and $\theta+\epsilon_n$ and depends on both n and x.

One tactic: compute

\begin{displaymath}M(x,\epsilon) =\sup\{\vert\frac{\partial}{\partial\theta}f(x,\theta)\vert;
\vert\theta-\theta_o\vert \le \epsilon\}
\end{displaymath}

and show that

\begin{displaymath}\int M(x,\epsilon) dx < \infty
\end{displaymath}

to apply dominated convergence.
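For instance, in the Cauchy location family above this is easy: since $2\vert u\vert \le 1+u^2$,

\begin{displaymath}\left\vert\frac{\partial}{\partial\theta}f(x,\theta)\right\vert
= \frac{2\vert x-\theta\vert}{\pi\{1+(x-\theta)^2\}^2}
\le \frac{1}{\pi\{1+(x-\theta)^2\}} \, ,
\end{displaymath}

so that $M(x,\epsilon) \le 1/\pi$ for every x while, for $\vert x-\theta_o\vert > \epsilon$, $M(x,\epsilon) \le 1/[\pi\{1+(\vert x-\theta_o\vert-\epsilon)^2\}]$; the resulting bound on $M(x,\epsilon)$ is integrable.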

Assuming the dominated convergence theorem applies, and evaluating at $\theta=\theta_o$:
\begin{align*}0 & = \int \frac{\partial}{\partial\theta}f(x,\theta)\, dx
\\
& = \int \frac{\partial \log f(x,\theta)}{\partial\theta}\, f(x,\theta)\, dx
\\
& = {\rm E}(U_k(\theta_o))
\end{align*}
Differentiating again, and again passing the limit through the integral, gives

\begin{displaymath}{\rm E}(U_k^2(\theta_o)) = -{\rm E}(V_k(\theta_o))
\end{displaymath}
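In detail: differentiating the identity $\int \{\partial \log f(x,\theta)/\partial\theta\}\, f(x,\theta)\, dx = 0$ under the integral sign gives, at $\theta=\theta_o$,

\begin{displaymath}0 = \int \frac{\partial^2 \log f(x,\theta)}{\partial\theta^2}\, f(x,\theta)\, dx
+ \int \left\{\frac{\partial \log f(x,\theta)}{\partial\theta}\right\}^2 f(x,\theta)\, dx
= {\rm E}(V_k(\theta_o)) + {\rm E}(U_k^2(\theta_o)) \, .
\end{displaymath}

Since ${\rm E}(U_k^2(\theta_o)) > 0$ (the score is not degenerate), this forces ${\rm E}(V_k(\theta_o)) < 0$.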

This shows that

\begin{displaymath}\frac{1}{n} \ell^{\prime\prime}(\theta_o)\to {\rm E}(V_k(\theta_o)) < 0
\end{displaymath}

almost surely.

Next we consider

\begin{displaymath}\frac{1}{n}\left\{ \ell^{\prime\prime}(\theta) -
\ell^{\prime\prime}(\theta_o)\right\}
\end{displaymath}

For a three times continuously differentiable $\ell$, the mean value theorem gives a $\theta_n^*$ (which is random but between $\theta_o$ and $\theta$) such that this difference is

\begin{displaymath}\frac{\theta-\theta_o}{n} \sum W_i(\theta_n^*)
\end{displaymath}

where $W_i = L_i^{\prime\prime\prime}$ denotes the third derivative of the $i$th log likelihood term. Define

\begin{displaymath}M_i(\epsilon) = \sup\{\vert W_i(\theta)\vert: \vert\theta-\theta_o\vert\le \epsilon\}
\end{displaymath}

The $M_i$ are iid. If for some $\epsilon > 0$ the $M_i$ are integrable then the SLLN shows

\begin{displaymath}\limsup \left\vert\frac{1}{n} \left\{\ell^{\prime\prime}(\theta\vert X_1,\ldots,X_n)
- \ell^{\prime\prime}(\theta_o\vert X_1,\ldots,X_n)\right\}\right\vert \le \epsilon\, {\rm E}(M_1(\epsilon))
\end{displaymath}

almost surely, uniformly in $\vert\theta-\theta_o\vert \le \epsilon$. The right hand side of the inequality can be made arbitrarily small by choosing $\epsilon$ small enough.

Pick $\epsilon$ so small that this bound is strictly smaller than

\begin{displaymath}I(\theta_o) \equiv - {\rm E}(V_k(\theta_o)) \, .
\end{displaymath}

Then

\begin{displaymath}P( E_\epsilon) = 1
\end{displaymath}

where $E_\epsilon$ is the event that there is an N such that for all $n \ge N$, $\ell^\prime$ is monotone decreasing on $[\theta_o-\epsilon,\theta_o+\epsilon]$. On this event $\ell^\prime$ eventually has at most one root in that interval, which settles the uniqueness question raised above.


Richard Lockhart
2000-10-03