
STAT 870 Lecture 6

Consistency of MLE

Suppose $X_1,X_2,\ldots$ are iid with density $f(x,\theta_o)$ where

\begin{displaymath}f(\cdot,\theta);\theta\in\Theta \subset {\Bbb R}
\end{displaymath}

is a family of densities.

Under what conditions is the MLE of $\theta$ almost surely consistent?

Goal: find conditions under which

\begin{displaymath}P(\hat\theta_n \to \theta_o)=1
\end{displaymath}

where $\hat\theta_n$ is the MLE.

General technical problems:

Example: Cauchy$(\theta)$ density is

\begin{displaymath}f(x,\theta) = \frac{1}{\pi\left\{1+(x-\theta)^2\right\}}
\end{displaymath}

For a sample $X_1,\ldots,X_n$ the likelihood is

\begin{displaymath}\frac{1}{\pi^n \prod_1^n\{1+(X_i-\theta)^2\}}
\end{displaymath}

We ``define'' $\hat\theta_n$ to be the value of $\theta$ which maximizes this function of $\theta$.

This is supposed to define $\hat\theta$ as a function of $X_1,\ldots,X_n$.

Underlying supposition: for each $x_1,\ldots,x_n$ there exists a unique $\hat\theta(x_1,\ldots,x_n)$ which maximizes the likelihood. If this were so, we would have defined a function from ${\Bbb R}^n$ to ${\Bbb R}$. A useful tool is the log-likelihood:

\begin{displaymath}\ell(\theta\vert x_1,\ldots,x_n) = -n\log(\pi) - \sum_1^n \log(1+(x_i-\theta)^2)
\end{displaymath}
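As a numerical aside, the maximization can be carried out directly on a computer. Here is a minimal Python sketch, assuming NumPy and SciPy are available; the sample values are hypothetical:

\begin{verbatim}
import numpy as np
from scipy.optimize import minimize_scalar

def cauchy_loglik(theta, x):
    # ell(theta | x) = -n log(pi) - sum_i log(1 + (x_i - theta)^2)
    return -len(x) * np.log(np.pi) - np.sum(np.log1p((x - theta) ** 2))

x = np.array([-1.3, 0.2, 0.8, 1.1, 4.1])  # hypothetical sample

# Any critical point lies in [min(x), max(x)]: above max(x) every
# term of ell' is negative, below min(x) every term is positive.
res = minimize_scalar(lambda t: -cauchy_loglik(t, x),
                      bounds=(x.min(), x.max()), method="bounded")
print(res.x)  # a local -- possibly not global -- maximizer
\end{verbatim}

The caveat in the last comment is the point of what follows: a search of this kind returns some maximizer, and nothing so far guarantees there is only one.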

Problems:

1.
Is there, for every $(x_1,\ldots,x_n)$ a $\theta$ which maximizes $\ell$?

2.
If so is the $\theta$ unique?

3.
If so is $\hat\theta_n(x_1,\ldots,x_n)$ a Borel function of $x_1,\ldots,x_n$?

Question 1: For the Cauchy density there is always a maximizer. Fix $(x_1,\ldots,x_n)$. As $\theta\to\pm\infty$ it is easy to check that

\begin{displaymath}\ell(\theta\vert x_1,\ldots,x_n) \to -\infty
\end{displaymath}

There is then an M such that

\begin{displaymath}\sup\{\ell(\theta\vert x_1,\ldots,x_n) ; \vert\theta\vert >M\}
\le \sup\{\ell(\theta\vert x_1,\ldots,x_n) ; \vert\theta\vert \le M\}
\end{displaymath}
Now the function

\begin{displaymath}\theta\mapsto\ell(\theta\vert x_1,\ldots,x_n)
\end{displaymath}

is continuous so that it attains its maximum over $[-M,M]$. This shows the existence of at least one maximizing $\theta$ for any set of $x$ values.

Question 2: take $n=2$ and $x_1=x=-x_2$:

\begin{displaymath}\ell(\theta\vert x,-x)
\end{displaymath}

is an even function of $\theta$. Its derivative is

\begin{displaymath}\ell^\prime = \frac{2(x-\theta)}{1+(x-\theta)^2}
-\frac{2(x+\theta)}{1+(x+\theta)^2}
\end{displaymath}

At $\theta=0$ this is 0, so $\theta=0$ is a critical point of $\ell$. The second derivative at 0 is

\begin{displaymath}\ell^{\prime\prime}(0) = \frac{4(x^2-1)}{(1+x^2)^2}
\end{displaymath}

If $\vert x\vert <1$ this is negative, so 0 is a local maximum, but if $\vert x\vert > 1$ it is a local minimum. In the latter case, since $\ell$ is even, there must be two maxima, one on either side of 0. Note that putting the two terms in $\ell^\prime$ over a common denominator gives a numerator which is a multiple of

\begin{displaymath}\theta(\theta^2-(x^2-1))
\end{displaymath}

Notice there are exactly three real roots, namely 0 and $\pm\sqrt{x^2-1}$, if $x^2>1$.
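These algebraic facts are easy to verify symbolically; a quick check, assuming SymPy is available:

\begin{verbatim}
import sympy as sp

theta, x = sp.symbols("theta x", real=True)

# log-likelihood for the sample (x, -x), additive constants dropped
ell = -sp.log(1 + (x - theta) ** 2) - sp.log(1 + (x + theta) ** 2)

num, _ = sp.fraction(sp.together(sp.diff(ell, theta)))
print(sp.factor(num))
# a multiple of theta*(theta**2 - (x**2 - 1))

print(sp.simplify(sp.diff(ell, theta, 2).subs(theta, 0)))
# equals 4*(x**2 - 1)/(1 + x**2)**2
\end{verbatim}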

Summary: defining $\hat\theta$ to be the maximizer of $\ell$ does not actually define a function.

Alternative strategies:

1: You might pick one of the maximizing $\theta$ values in an unequivocal way:

\begin{displaymath}\hat\theta = \inf\{\theta: \ell(\theta) = \sup\ell\}
\end{displaymath}

(The set of such $\theta$ is non-empty and bounded, so such a $\hat\theta$ exists and is finite; by continuity of $\ell$ it satisfies $\ell(\hat\theta)=\sup\ell$.)
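On a computer this tie-breaking rule can be mimicked on a finite grid. A rough Python sketch (the grid range, resolution, and tolerance are arbitrary choices):

\begin{verbatim}
import numpy as np

def smallest_grid_maximizer(ell, grid, tol=1e-9):
    # Discrete stand-in for inf{theta : ell(theta) = sup ell}:
    # among grid points within tol of the maximum (tol guards
    # against floating-point ties), return the smallest.
    vals = np.array([ell(t) for t in grid])
    return grid[vals >= vals.max() - tol].min()

# Bimodal two-point example with x = 2: maxima near +/- sqrt(3);
# the rule picks the left one, about -1.732.
ell = lambda t: -np.log1p((2.0 - t) ** 2) - np.log1p((2.0 + t) ** 2)
print(smallest_grid_maximizer(ell, np.linspace(-5, 5, 100001)))
\end{verbatim}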

2: You might try defining $\hat\theta$ to be a suitably chosen critical point of $\ell$.

3: You might try to prove that
\begin{multline*}P({\rm card}(\{\theta: \ell(\theta\vert X_1,\ldots,X_n)
\\ = \sup_\phi\ell(\phi\vert X_1,\ldots,X_n)\})=1)=1
\end{multline*}
In other words it might be true that the set of $\theta$ where $\ell$ achieves its maximum is almost surely a singleton when the $x_i$ form an actual sample.

I am going to follow strategy 2 since this is the one which works most generally.

For $x=(x_1,\ldots,x_n)$ we define the order statistics

\begin{displaymath}x_{(1)} \le \cdots\le x_{(n)}
\end{displaymath}

to be the entries in $x$ sorted into non-decreasing order. If $n=2m-1$ is odd, set

\begin{displaymath}g_n(x_1,\ldots,x_n) = x_{(m)}
\end{displaymath}

If $n=2m$, set

\begin{displaymath}g_n(x_1,\ldots,x_n) = (x_{(m)}+x_{(m+1)})/2.
\end{displaymath}

Now define

\begin{displaymath}\tilde\theta_n = g_n(X_1,\ldots,X_n)
\end{displaymath}
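Thus $\tilde\theta_n$ is just the sample median. A short Python version of $g_n$, equivalent to numpy.median:

\begin{verbatim}
import numpy as np

def g_n(x):
    # Sort to get x_((1)) <= ... <= x_((n)); return x_((m)) when
    # n = 2m - 1 and (x_((m)) + x_((m+1)))/2 when n = 2m.
    x = np.sort(np.asarray(x, dtype=float))
    n = len(x)
    if n % 2 == 1:
        return x[n // 2]
    return (x[n // 2 - 1] + x[n // 2]) / 2.0

print(g_n([3.0, -1.0, 2.0]))        # 2.0
print(g_n([3.0, -1.0, 2.0, 10.0]))  # 2.5
\end{verbatim}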

Lemma 1   If $X_1,\ldots,X_n$ are iid from a distribution F with the properties:

1.
F(0)=1/2.

2.
For each $\epsilon>0$

\begin{displaymath}F(-\epsilon) < 1/2 < F(\epsilon)
\end{displaymath}

Then $\tilde\theta_n$ converges almost surely to 0.
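Lemma 1 can be illustrated (though certainly not proved) by simulation: the standard Cauchy distribution has median 0 and a strictly increasing CDF, so it satisfies both hypotheses. A sketch along a single sample path, assuming NumPy; the seed is arbitrary:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(100000)  # one path of iid standard Cauchy

# The sample median settles near 0 as n grows -- note that the
# sample mean could not be used here, since Cauchy has no mean.
for n in (10, 100, 1000, 10000, 100000):
    print(n, np.median(x[:n]))
\end{verbatim}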

Remark: Part of the assertion of the lemma is that

\begin{displaymath}A\equiv \{\omega: \tilde\theta_n\to 0\}
\end{displaymath}

is an event.

Proof: We begin by formalizing an argument we have used several times.

Lemma 2   Suppose $Y_n$ is a sequence of random variables. Then $Y_n \to 0$ almost surely is equivalent to $P(C_\epsilon)=1$ for each $\epsilon>0$ where

\begin{displaymath}C_\epsilon=\bigcup_{N=1}^\infty \bigcap_{n=N}^\infty \{\vert Y_n\vert\le \epsilon\}
\end{displaymath}

Fix $\epsilon>0$. For each x the rvs $Y_1,Y_2,\ldots$ defined by $Y_k=1(X_k \le x)-F(x)$ are iid with mean 0. According to the SLLN there is a null set $N_x$ such that for all $\omega\not\in N_x$ we have

\begin{displaymath}\frac{1}{n} \sum_1^n Y_k \to 0 \, .
\end{displaymath}

Let $N=N_{\epsilon}\cup N_{-\epsilon}$. Then N is a null set. If $\omega\not\in N$ then

\begin{displaymath}\frac{1}{n}\sum_1^n 1(X_k \le \epsilon) \to F(\epsilon) > 1/2
\end{displaymath}

and

\begin{displaymath}\frac{1}{n}\sum_1^n 1(X_k \le -\epsilon) \to F(-\epsilon) < 1/2
\end{displaymath}

For any such $\omega$ there is an M such that for all $n\ge M$ the number of $X_i$ exceeding $\epsilon$ is less than n/2 and the number of $X_i$ less than or equal to $-\epsilon$ is less than n/2. Thus for such $\omega$, and $n\ge M$,

\begin{displaymath}-\epsilon \le \tilde\theta_n \le \epsilon
\end{displaymath}

In other words $C_\epsilon^c \subset N$, so $P(C_\epsilon)=1$. Since $\epsilon>0$ was arbitrary, Lemma 2 shows that $\tilde\theta_n \to 0$ almost surely. $\bullet$
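The engine of this proof is just the SLLN applied to the indicators $1(X_k\le \pm\epsilon)$, which is easy to watch numerically. A quick check with standard Cauchy data, assuming NumPy; seed and sample size are arbitrary:

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(1)
x = rng.standard_cauchy(200000)
eps = 0.5

# Standard Cauchy CDF: F(t) = 1/2 + arctan(t)/pi
F = lambda t: 0.5 + np.arctan(t) / np.pi

print(np.mean(x <= eps), F(eps))    # both about 0.648 > 1/2
print(np.mean(x <= -eps), F(-eps))  # both about 0.352 < 1/2
\end{verbatim}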


Richard Lockhart
2000-10-03