STAT 450 Lecture 16

Reading for Today's Lecture:

Goals of Today's Lecture:

Today's notes

Maximum Likelihood Estimation

To find an MLE we maximize L. This is a typical function maximization problem which we approach by setting the gradient of L equal to 0 and then checking to see that the root is a maximum, not a minimum or saddle point.

The Binomial Distribution

If X has a Binomial $(n,\theta)$ distribution then
\begin{align*}L(\theta) & =
\left(
\begin{array}{c} n \\ X \end{array}\right)
\theta^X (1-\theta)^{n-X}
\\
\ell(\theta) & = \log \left(
\begin{array}{c} n \\ X \end{array}\right)
+ X \log(\theta) + (n-X)\log(1-\theta)
\\
U(\theta) & = \frac{X}{\theta} - \frac{n-X}{1-\theta}
\end{align*}
The function L is 0 at $\theta=0$ and at $\theta=1$ unless X=0 or X=n, so for $1 \le X \le n-1$ the MLE must be found by setting U=0, which gives

\begin{displaymath}\hat\theta = \frac{X}{n}
\end{displaymath}

For X=n the log-likelihood has derivative

\begin{displaymath}U(\theta) = \frac{n}{\theta} > 0
\end{displaymath}

for all $\theta$, so the likelihood is an increasing function of $\theta$, maximized at $\hat\theta=1=X/n$. Similarly, when X=0 the maximum is at $\hat\theta=0=X/n$.
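To confirm that the root $\hat\theta=X/n$ is a maximum rather than a minimum or saddle point (the check mentioned at the start of the lecture), note that the derivative of the score is

\begin{displaymath}U^\prime(\theta) = -\frac{X}{\theta^2} - \frac{n-X}{(1-\theta)^2} < 0
\end{displaymath}

for all $0 < \theta < 1$, so $\ell$ is concave and the root of U=0 is the unique maximum.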

If $X_1,\ldots,X_n$ are independent, each with density $f(x,\theta)$, then the log-likelihood is of the form

\begin{displaymath}\ell(\theta )= \sum \log(f(X_i,\theta))
\end{displaymath}

The score function is

\begin{displaymath}U(\theta) = \sum \frac{\partial \log f}{\partial\theta}(X_i,\theta)
\end{displaymath}

The mle $\hat\theta$ maximizes $\ell$. If the maximum occurs in the interior of the parameter space and the log-likelihood is continuously differentiable, then $\hat\theta$ solves the likelihood equations

\begin{displaymath}U(\theta) = 0
\end{displaymath}
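In the examples below the likelihood equations can sometimes be solved in closed form, but in general they must be solved numerically. As a rough illustration (not part of the original notes; the data values are made up), here is a minimal Python sketch that maximizes the Binomial log-likelihood from above by generic numerical optimization; the maximizer should agree with the closed-form answer X/n.

\begin{verbatim}
# Minimal sketch (not from the notes): solve the likelihood equations
# numerically by maximizing the Binomial log-likelihood.
from scipy.optimize import minimize_scalar
from scipy.stats import binom

n, X = 20, 7   # hypothetical data: X successes in n trials

def neg_log_lik(theta):
    # negative log-likelihood; minimizing it maximizes ell(theta)
    return -binom.logpmf(X, n, theta)

fit = minimize_scalar(neg_log_lik, bounds=(1e-6, 1 - 1e-6), method="bounded")
print(fit.x, X / n)   # both approximately 0.35
\end{verbatim}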

Examples

N$(\mu,\sigma^2)$

There is a unique root of the likelihood equations. It is a global maximum.
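Filling in the computation (standard, but worth recording): up to an additive constant the log-likelihood is

\begin{displaymath}\ell(\mu,\sigma) = -n\log(\sigma) - \frac{\sum(X_i-\mu)^2}{2\sigma^2}
\end{displaymath}

and setting the two components of the score equal to 0 gives $\hat\mu = \bar{X}$ and $\hat\sigma^2 = \sum(X_i-\bar{X})^2/n$.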

[Remark: Suppose we had called $\tau=\sigma^2$ the parameter. The score function would still have two components with the first component being the same as before but now the second component is

\begin{displaymath}\frac{\partial}{\partial\tau} \ell
= \frac{\sum(X_i-\mu)^2}{2\tau^2} -\frac{n}{2\tau}
\end{displaymath}

Setting the new likelihood equations equal to 0 still gives

\begin{displaymath}\hat\tau = \hat\sigma^2
\end{displaymath}

This is a general invariance (or equivariance) principle. If $\phi=g(\theta)$ is some reparametrization of a model (a one-to-one relabelling of the parameter values) then $\hat\phi= g(\hat\theta)$. We will see that this does not apply to other estimators.]

Cauchy: location $\theta$

There is at least one root of the likelihood equations, but often there are several more. One of the roots is a global maximum; the others, if they exist, may be local minima or maxima.
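For the Cauchy location family $f(x;\theta) = 1/[\pi\{1+(x-\theta)^2\}]$ the score is $U(\theta) = \sum 2(X_i-\theta)/\{1+(X_i-\theta)^2\}$. A small numerical sketch (simulated data, not from the notes) that counts roots by looking for sign changes of U on a grid:

\begin{verbatim}
# Sketch: count roots of the Cauchy location score function by scanning
# for sign changes on a fine grid (small samples often give several).
import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_cauchy(5)            # simulated sample, location 0

def score(theta):
    d = x[:, None] - theta            # residuals X_i - theta, one column per grid point
    return (2 * d / (1 + d**2)).sum(axis=0)

grid = np.linspace(x.min() - 5, x.max() + 5, 4000)
u = score(grid)
roots = np.where(np.diff(np.sign(u)) != 0)[0]
print(len(roots), grid[roots])        # each sign change brackets a root of U
\end{verbatim}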

Binomial($n,\theta$)

If X=0 or X=n there is no root of the likelihood equations; in this case the likelihood is monotone. For other values of X there is a unique root, which is the global maximum. In every case the global maximum is at $\hat\theta = X/n$, even when X=0 or X=n.

The 2 parameter exponential

The density is

\begin{displaymath}f(x;\alpha,\beta) = \frac{1}{\beta} e^{-(x-\alpha)/\beta} 1(x>\alpha)
\end{displaymath}

The resulting log-likelihood is $-\infty$ for $\alpha > \min\{X_1,\ldots,X_n\}$ and otherwise is

\begin{displaymath}\ell(\alpha,\beta) = -n\log(\beta) -\sum(X_i-\alpha)/\beta
\end{displaymath}

As a function of $\alpha$ this is increasing until $\alpha$ reaches

\begin{displaymath}\hat\alpha = X_{(1)} = \min\{X_1,\ldots,X_n\}
\end{displaymath}

which gives the mle of $\alpha$. Now plug in this value $\hat\alpha$ for $\alpha$ and get the so-called profile likelihood for $\beta$:

\begin{displaymath}\ell_{\mbox{profile}}(\beta) = -n\log(\beta) -\sum(X_i-X_{(1)})/\beta
\end{displaymath}

Take the $\beta$ derivative and set it equal to 0 to get

\begin{displaymath}\hat\beta =\sum(X_i-X_{(1)})/n
\end{displaymath}

Notice that the mle $\hat\theta=(\hat\alpha,\hat\beta)$ does not solve the likelihood equations; we had to look at the edge of the possible parameter space. The parameter $\alpha$ is called a support or truncation parameter. ML methods behave oddly in problems with such parameters.
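A quick numerical check of these closed-form estimates (a sketch with simulated data; not part of the notes):

\begin{verbatim}
# Sketch: the two-parameter exponential MLEs, alpha-hat = X_(1) and
# beta-hat = mean(X_i - X_(1)), computed on simulated data.
import numpy as np

rng = np.random.default_rng(1)
alpha_true, beta_true, n = 2.0, 3.0, 500
x = alpha_true + rng.exponential(beta_true, size=n)

alpha_hat = x.min()                 # the support/truncation parameter estimate
beta_hat = (x - alpha_hat).mean()   # maximizer of the profile likelihood
print(alpha_hat, beta_hat)          # close to (2.0, 3.0) for large n
\end{verbatim}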

Three parameter Weibull

The density in question is

\begin{displaymath}f(x;\alpha,\beta,\gamma) =
\frac{\gamma}{\beta} \left(\frac{x-\alpha}{\beta}\right)^{\gamma-1}
\exp[-\{(x-\alpha)/\beta\}^\gamma]1(x>\alpha)
\end{displaymath}

There are 3 derivatives to take to solve the likelihood equations. Setting the $\beta$ derivative equal to 0 gives the equation

\begin{displaymath}\hat\beta(\alpha,\gamma) = \left[\sum (X_i-\alpha)^\gamma/n\right]^{1/\gamma}
\end{displaymath}

where we use the notation $\hat\beta(\alpha,\gamma)$ to indicate that the mle of $\beta$ could be found by finding the mles of the other two parameters and then plugging them into the formula above. It is not possible to find the remaining two parameters explicitly; numerical methods are needed. However, you can see that taking $\gamma < 1$ and letting $\alpha \to X_{(1)}$ makes the log likelihood go to $\infty$. The mle is therefore not uniquely defined, since any $\gamma < 1$ and any $\beta$ will do.
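To make the numerical step concrete, here is a rough Python sketch (simulated data and arbitrary true parameter values, not from the notes) that substitutes $\hat\beta(\alpha,\gamma)$ into $\ell$ and maximizes the resulting profile log-likelihood over $(\alpha,\gamma)$; in view of the unbounded-likelihood problem just described, the answer should be read as the local maximum discussed in the next paragraph rather than a global one.

\begin{verbatim}
# Sketch: profile out beta with beta-hat(alpha, gamma) and maximize the
# resulting profile log-likelihood numerically over (alpha, gamma).
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(2)
alpha0, beta0, gamma0, n = 1.0, 2.0, 3.0, 200    # true gamma > 1
x = alpha0 + beta0 * rng.weibull(gamma0, size=n)

def neg_profile_loglik(par):
    alpha, gamma = par
    if alpha >= x.min() or gamma <= 0:
        return np.inf                            # outside the allowed region
    beta = ((x - alpha)**gamma).mean()**(1 / gamma)   # beta-hat(alpha, gamma)
    z = (x - alpha) / beta
    ll = (n * np.log(gamma) - n * np.log(beta)
          + (gamma - 1) * np.log(z).sum() - (z**gamma).sum())
    return -ll

fit = minimize(neg_profile_loglik, x0=[x.min() - 1.0, 2.0], method="Nelder-Mead")
print(fit.x)    # a local maximizer, typically near (alpha0, gamma0)
\end{verbatim}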

If the true value of $\gamma$ is more than 1 then the probability that there is a root of the likelihood equations is high; in this case there must be two more roots: a local maximum and a saddle point! For a true value of $\gamma>0$ the theory we detail below applies to this local maximum and not to the global maximum of the likelihood.

Large Sample Theory

In the next few lectures we will be working toward explaining and ``proving'' the following theorem:

Theorem: Under ``suitable regularity conditions''

1.
The MLE is consistent.

2.
The MLE is asymptotically normal.

The meaning of the first assertion is that in a precise mathematical sense

\begin{displaymath}\hat\theta \to \theta_0
\end{displaymath}

as the sample size n goes to infinity, where $\theta_0$ denotes the true value of the parameter. In this course we will simply provide some heuristics which help indicate why this theorem ought to be true.

The second assertion means that

\begin{displaymath}n^{1/2}(\hat\theta-\theta_0) \Rightarrow N(0,\sigma^2)
\end{displaymath}

for a certain $\sigma^2$; we will later show how to compute $\sigma$, how to estimate $\sigma$ and how to use this to get confidence intervals and hypothesis tests.
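A small simulation sketch (not part of the notes; the parameter values are arbitrary) illustrating the second assertion in the Binomial example, where $\hat\theta = X/n$ and the limiting variance works out to $\sigma^2 = \theta_0(1-\theta_0)$:

\begin{verbatim}
# Sketch: simulate sqrt(n)*(theta-hat - theta0) for the Binomial MLE X/n
# and compare it with the N(0, theta0*(1-theta0)) limit.
import numpy as np

rng = np.random.default_rng(3)
theta0, n, reps = 0.3, 1000, 20000

X = rng.binomial(n, theta0, size=reps)    # one Binomial(n, theta0) count per replication
z = np.sqrt(n) * (X / n - theta0)         # centred and scaled MLEs

print(z.var(), theta0 * (1 - theta0))     # sample variance vs. limiting sigma^2
print(np.mean(np.abs(z) <= 1.96 * np.sqrt(theta0 * (1 - theta0))))   # roughly 0.95
\end{verbatim}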
Richard Lockhart
1999-10-18