Likelihood Methods of Inference

Given data $X$ with model $\{f_\theta(x); \theta \in \Theta\}$:

Definition: The likelihood function is the map $L$: domain $\Theta$, values in $[0,\infty)$, given by

$L(\theta) = f_\theta(X).$

Key Point: think about how the density depends on $\theta$, not about how it depends on $X$.

Notice: $X$, the observed value of the data, has been plugged into the formula for the density.

We use likelihood for most inference problems:

1. Point estimation: we must compute an estimate $\hat\theta = \hat\theta(X)$ which lies in $\Theta$. The maximum likelihood estimate (MLE) of $\theta$ is the value $\hat\theta$ which maximizes $L(\theta)$ over $\theta \in \Theta$, if such a $\hat\theta$ exists.

2. Point estimation of a function of $\theta$: we must compute an estimate $\hat\phi = \hat\phi(X)$ of $\phi = g(\theta)$. We use $\hat\phi = g(\hat\theta)$ where $\hat\theta$ is the MLE of $\theta$.

3. Interval (or set) estimation. We must compute a set $C = C(X)$ in $\Theta$ which we think will contain $\theta$. We will use

$\{\theta \in \Theta : L(\theta) > c\}$

for a suitable $c$.

4. Hypothesis testing: decide whether or not $\theta \in \Theta_0$ where $\Theta_0 \subset \Theta$. We base our decision on the likelihood ratio

$\frac{\sup\{L(\theta); \theta \in \Theta_0\}}{\sup\{L(\theta); \theta \in \Theta\}}.$

Maximum Likelihood Estimation

To find the MLE, maximize $L(\theta)$.

Typical function maximization problem:

Set the gradient of $L$ equal to 0.

Check that the root is a maximum, not a minimum or saddle point.

Often $L$ is a product of $n$ terms (given $n$ independent observations).

Much easier to work with the logarithm of $L$: the log of a product is a sum, and the logarithm is monotone increasing.

Definition: The Log Likelihood function is

$\ell(\theta) = \log L(\theta).$
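As a concrete illustration (a sketch, not from the notes): for an $N(\mu, 1)$ sample the log likelihood is a sum of $n$ terms, and maximizing it over a grid recovers $\bar{X}$. The data vector below is made up.

```python
import numpy as np

x = np.array([1.2, -0.4, 0.8, 2.1, 0.3])  # hypothetical data
mu_grid = np.linspace(-2, 3, 501)          # grid of candidate values of mu

def log_lik(mu):
    # ell(mu) = sum_i log f_mu(x_i) for the N(mu, 1) density
    return np.sum(-0.5 * (x[:, None] - mu) ** 2 - 0.5 * np.log(2 * np.pi),
                  axis=0)

ell = log_lik(mu_grid)
mle = mu_grid[np.argmax(ell)]   # grid maximizer of the log likelihood
print(mle, x.mean())            # the maximizer sits at (the grid point nearest) X-bar
```

The grid search is only a check; the next sections find the maximizer in closed form by calculus.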

Samples from MVN Population

Simplest problem: collect $n$ replicate measurements from a single population.

Model: $X_1, \ldots, X_n$ are iid $MVN_p(\mu, \Sigma)$.

Parameters ($p + p(p+1)/2$ in all): $\mu, \Sigma$. Parameter space: $\mu \in \mathbb{R}^p$ and $\Sigma$ is some $p \times p$ positive definite matrix.

Log likelihood is

$\ell(\mu, \Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^n (X_i - \mu)^T \Sigma^{-1} (X_i - \mu).$

Take derivatives:

$\frac{\partial \ell}{\partial \mu} = \Sigma^{-1} \sum_{i=1}^n (X_i - \mu) = n \Sigma^{-1} (\bar{X} - \mu)$

where $\bar{X} = \sum_{i=1}^n X_i / n$. Second derivative wrt $\mu$ is a $p \times p$ matrix:

$\frac{\partial^2 \ell}{\partial \mu \, \partial \mu^T} = -n \Sigma^{-1}.$

Fact: if the second derivative matrix is negative definite at a critical point then the critical point is a maximum.

Fact: if the second derivative matrix is negative definite everywhere then the function is concave; there is no more than 1 critical point.

Summary: $\ell(\mu, \Sigma)$ is maximized at

$\hat\mu = \bar{X}$

(regardless of the choice of $\Sigma$).

More difficult: differentiate $\ell$ wrt $\Sigma$.

Somewhat simpler: set

$Q = \Sigma^{-1}$

and maximize over $Q$.

First derivative wrt $Q$ is a matrix with entries $\partial \ell / \partial Q_{ij}$.

Warning: the method used here ignores the symmetry of $Q$.

Need: derivatives of two functions:

$\log \det Q$

and

$\sum_{i=1}^n (X_i - \mu)^T Q (X_i - \mu).$

Fact: the $(i,j)$th entry of $\partial \det Q / \partial Q$ is

$(-1)^{i+j} \det Q^{(ij)}$

where $Q^{(ij)}$ denotes the matrix obtained from $Q$ by removing column $j$ and row $i$.

Fact: $\det Q = \sum_j (-1)^{i+j} Q_{ij} \det Q^{(ij)}$; expansion by minors.

Conclusion

$\frac{\partial \log \det Q}{\partial Q_{ij}} = \frac{(-1)^{i+j} \det Q^{(ij)}}{\det Q} = (Q^{-1})_{ji} = \Sigma_{ij}$

and

$\frac{\partial}{\partial Q_{ij}} \sum_k (X_k - \mu)^T Q (X_k - \mu) = \sum_k (X_k - \mu)_i (X_k - \mu)_j.$

Implication

Set

$\frac{\partial \ell}{\partial Q} = \frac{n}{2} \Sigma - \frac{1}{2} \sum_i (X_i - \bar{X})(X_i - \bar{X})^T = 0$

and find the only critical point is

$\hat\Sigma = \frac{1}{n} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T.$

Usual sample covariance matrix is

$S = \frac{1}{n-1} \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T.$

Properties of MLEs:

1) $E(\hat\mu) = E(\bar{X}) = \mu$, so $\hat\mu$ is unbiased.

2) $E(\hat\Sigma) = \frac{n-1}{n}\Sigma$, so $\hat\Sigma$ is biased; $E(S) = \Sigma$.

Distribution of $\hat\mu$? Joint distribution of $\hat\mu$ and $\hat\Sigma$?
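The closed-form answers above are easy to verify numerically. A sketch with simulated data (seed and dimensions are arbitrary); since $\hat\Sigma$ divides by $n$ and $S$ by $n-1$, the identity $\hat\Sigma = \frac{n-1}{n} S$ holds exactly:

```python
import numpy as np

rng = np.random.default_rng(0)
n, p = 50, 3
X = rng.standard_normal((n, p))        # rows are the observations X_1, ..., X_n

xbar = X.mean(axis=0)                  # MLE of mu
centered = X - xbar
Sigma_hat = centered.T @ centered / n  # MLE of Sigma (divisor n)
S = centered.T @ centered / (n - 1)    # usual sample covariance (divisor n - 1)

# Sigma_hat = (n-1)/n * S exactly, matching E(Sigma_hat) = (n-1)/n * Sigma:
print(np.allclose(Sigma_hat, (n - 1) / n * S))
```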

Univariate Normal samples: Distribution Theory

Theorem: Suppose $X_1, \ldots, X_n$ are independent $N(\mu, \sigma^2)$ random variables. Then

1. $\bar{X}$ (sample mean) and $s^2$ (sample variance) are independent.

2. $n^{1/2}(\bar{X} - \mu)/\sigma \sim N(0, 1)$.

3. $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.

4. $n^{1/2}(\bar{X} - \mu)/s \sim t_{n-1}$.
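Behind parts 2 and 3 sits the exact algebraic identity $\sum_i (X_i - \mu)^2 = (n-1)s^2 + n(\bar{X} - \mu)^2$, splitting $n$ degrees of freedom as $(n-1) + 1$. A quick numerical check (the sample and $\mu$ below are made up):

```python
import numpy as np

x = np.array([3.1, 2.4, 4.0, 2.8, 3.6, 3.3])  # arbitrary illustrative sample
mu = 3.0                                       # the (assumed known) mean
n = len(x)
xbar = x.mean()
s2 = x.var(ddof=1)                             # sample variance, divisor n - 1

total = np.sum((x - mu) ** 2)                  # sum of n squared deviations from mu
split = (n - 1) * s2 + n * (xbar - mu) ** 2    # the (n-1) + 1 decomposition
print(np.isclose(total, split))
```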

Proof: Let $Z_i = (X_i - \mu)/\sigma$.

Then $Z_1, \ldots, Z_n$ are independent $N(0, 1)$.

So $Z = (Z_1, \ldots, Z_n)^T$ is multivariate standard normal.

Note that $\bar{X} = \mu + \sigma \bar{Z}$ and $X_i - \bar{X} = \sigma(Z_i - \bar{Z})$. Thus

$\frac{n^{1/2}(\bar{X} - \mu)}{\sigma} = n^{1/2} \bar{Z}$

and

$\frac{(n-1)s^2}{\sigma^2} = \sum_{i=1}^n (Z_i - \bar{Z})^2$

where $s^2 = \sum_i (X_i - \bar{X})^2/(n-1)$.

So: the problem is reduced to the case $\mu = 0$ and $\sigma = 1$.

Step 1: Define

$Y = \left(n^{1/2}\bar{Z}, Z_1 - \bar{Z}, \ldots, Z_n - \bar{Z}\right)^T.$

(So $Y$ has dimension $n+1$.) Now

$Y = \begin{pmatrix} n^{-1/2} & n^{-1/2} & \cdots & n^{-1/2} \\ 1 - 1/n & -1/n & \cdots & -1/n \\ \vdots & & & \vdots \\ -1/n & -1/n & \cdots & 1 - 1/n \end{pmatrix} Z$

or, letting $M$ denote the matrix,

$Y = MZ.$

It follows that $Y \sim MVN_{n+1}(0, MM^T)$, so we need to compute $MM^T$:

Put $H = I - n^{-1}\mathbf{1}\mathbf{1}^T$, where $\mathbf{1} = (1, \ldots, 1)^T$. Since

$MM^T = \begin{pmatrix} 1 & 0^T \\ 0 & H \end{pmatrix}$

we conclude that $n^{1/2}\bar{Z}$ and $(Z_1 - \bar{Z}, \ldots, Z_n - \bar{Z})^T$ are independent, and each is normal.

Thus $n^{1/2}\bar{Z}$ is independent of $(Z_1 - \bar{Z}, \ldots, Z_n - \bar{Z})^T$.

Since $s^2$ is a function of $(Z_1 - \bar{Z}, \ldots, Z_n - \bar{Z})^T$, we see that $\bar{Z}$ and $s^2$ are independent.

Also, we see $n^{1/2}\bar{Z} \sim N(0, 1)$.

First 2 parts done.

Consider $\sum_i (Z_i - \bar{Z})^2$. Note that

$\sum_i (Z_i - \bar{Z})^2 = Z^T H Z, \qquad H = I - n^{-1}\mathbf{1}\mathbf{1}^T.$

Suppose $Z \sim MVN_n(0, I)$ and $A$ is symmetric. Put $A = P \Lambda P^T$ for $\Lambda$ diagonal, $P$ orthogonal.

Then

$Z^T A Z = Z^T P \Lambda P^T Z = W^T \Lambda W$

where

$W = P^T Z.$

But $W = P^T Z$ is standard multivariate normal.

So: $Z^T A Z$ has the same distribution as

$\sum_{i=1}^n \lambda_i W_i^2$

where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.

Special case: if all the $\lambda_i$ are either 0 or 1 then $Z^T A Z$ has a chi-squared distribution with df = number of $\lambda_i$ equal to 1.

When are the eigenvalues all 1 or 0?

Answer: if and only if $A^2 = A$, i.e., $A$ is idempotent.

1) If $A$ is idempotent and $(\lambda, v)$ is an eigenpair then

$Av = \lambda v$

and

$\lambda v = Av = A^2 v = \lambda A v = \lambda^2 v$

so

$\lambda = \lambda^2,$

proving $\lambda$ is 0 or 1.

2) Conversely, if all eigenvalues of $A$ are 0 or 1 then $\Lambda$ has 1s and 0s on the diagonal, so

$\Lambda^2 = \Lambda$

and

$A^2 = P \Lambda P^T P \Lambda P^T = P \Lambda^2 P^T = P \Lambda P^T = A.$

Next case: $X \sim MVN_n(0, \Sigma)$. Then $X = \Sigma^{1/2} Z$ with $Z \sim MVN_n(0, I)$.

Since $X^T A X = Z^T \Sigma^{1/2} A \Sigma^{1/2} Z$, it has the law

$\sum_i \lambda_i W_i^2$

where the $\lambda_i$ are eigenvalues of $\Sigma^{1/2} A \Sigma^{1/2}$. But

$\det\left(\Sigma^{1/2} A \Sigma^{1/2} - \lambda I\right) = 0$

implies

$\det\left(A \Sigma - \lambda I\right) = 0$

(conjugate by $\Sigma^{1/2}$; the determinant is unchanged).

So the eigenvalues are those of $A\Sigma$, and $X^T A X$ is $\chi^2_\nu$ iff $A\Sigma$ is idempotent and $\nu = \operatorname{tr}(A\Sigma)$.

Our case: $\Sigma = I$ and $A = H$. Check: $H^2 = H$. How many degrees of freedom: $\operatorname{tr}(H)$.

Defn: The trace of a square matrix $A$ is

$\operatorname{tr}(A) = \sum_i A_{ii}.$

Property: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.

So:

$\operatorname{tr}(H) = \operatorname{tr}(I_n) - n^{-1}\operatorname{tr}(\mathbf{1}\mathbf{1}^T) = n - 1.$

Conclusion: df for $\sum_i (Z_i - \bar{Z})^2$ is $n - 1$.
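The centering matrix $H = I - n^{-1}\mathbf{1}\mathbf{1}^T$ is symmetric and idempotent with eigenvalues 0 and 1 and $\operatorname{tr}(H) = n - 1$; a quick numpy check (the value $n = 6$ is arbitrary):

```python
import numpy as np

n = 6
H = np.eye(n) - np.ones((n, n)) / n      # centering matrix H = I - (1/n) 1 1^T

evals = np.linalg.eigvalsh(H)            # eigenvalues, sorted ascending
print(np.allclose(H @ H, H))             # idempotent: H^2 = H
print(np.allclose(evals, [0] + [1] * (n - 1)))  # one 0 eigenvalue, n-1 ones
print(np.isclose(np.trace(H), n - 1))    # df = tr(H) = n - 1
```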

Derivation of the $\chi^2$ density:

Suppose $Z_1, \ldots, Z_n$ are independent $N(0, 1)$. Define the $\chi^2_n$ distribution to be that of $S = Z_1^2 + \cdots + Z_n^2$. Define angles $\theta_1, \ldots, \theta_{n-1}$ by

$Z_1 = S^{1/2}\cos\theta_1$
$Z_2 = S^{1/2}\sin\theta_1\cos\theta_2$
$\vdots$
$Z_{n-1} = S^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}$
$Z_n = S^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\sin\theta_{n-1}.$

(Spherical co-ordinates in $n$ dimensions. The angles run from 0 to $\pi$ except the last, which runs from 0 to $2\pi$.) Derivative formulas:

$\frac{\partial Z_k}{\partial S} = \frac{Z_k}{2S}$

and each $\partial Z_k / \partial \theta_j$ carries a factor $S^{1/2}$ times a function of the angles alone.

Fix $n = 3$ to clarify the formulas. Use the shorthand $\rho = S^{1/2}$.

Matrix of partial derivatives is

$\begin{pmatrix} \frac{\cos\theta_1}{2\rho} & -\rho\sin\theta_1 & 0 \\ \frac{\sin\theta_1\cos\theta_2}{2\rho} & \rho\cos\theta_1\cos\theta_2 & -\rho\sin\theta_1\sin\theta_2 \\ \frac{\sin\theta_1\sin\theta_2}{2\rho} & \rho\cos\theta_1\sin\theta_2 & \rho\sin\theta_1\cos\theta_2 \end{pmatrix}$

Find determinant:

$\frac{\rho\sin\theta_1}{2}$

(non-negative for all $S$ and $\theta_1$). General $n$: every term in the first column contains a factor $S^{-1/2}/2$ while every other entry has a factor $S^{1/2}$.

FACT: multiplying a column in a matrix by $c$ multiplies the determinant by $c$.

SO: Jacobian of transformation is

$\frac{S^{-1/2}}{2}\left(S^{1/2}\right)^{n-1} h(\theta_1, \ldots, \theta_{n-1}) = \frac{1}{2} S^{(n-2)/2} h(\theta_1, \ldots, \theta_{n-1})$

for some function, $h$, which depends only on the angles.

Thus the joint density of $S, \theta_1, \ldots, \theta_{n-1}$ is

$(2\pi)^{-n/2} e^{-S/2} \, \frac{1}{2} S^{(n-2)/2} h(\theta_1, \ldots, \theta_{n-1}).$

To compute the density of $S$ we must do an $n-1$ dimensional multiple integral over the angles; the result has the form

$f_S(s) = c \, s^{(n-2)/2} e^{-s/2}, \quad s > 0,$

for some $c$.

Evaluate $c$ by making

$\int_0^\infty f_S(s)\, ds = 1.$

Substitute $u = s/2$, $ds = 2\,du$, to see that

$\int_0^\infty s^{(n-2)/2} e^{-s/2}\, ds = 2^{n/2} \int_0^\infty u^{n/2 - 1} e^{-u}\, du = 2^{n/2}\,\Gamma(n/2).$

CONCLUSION: the $\chi^2_n$ density is

$f_S(s) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, s^{n/2 - 1} e^{-s/2}, \quad s > 0.$
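As a sanity check (a sketch, not part of the derivation), the normalizing constant can be verified numerically: the density should integrate to 1. Here $n = 5$ df and the midpoint rule on $[0, 60]$ are arbitrary choices; the tail beyond 60 is negligible.

```python
import math

n = 5

def chi2_density(s):
    # f_S(s) = s^(n/2 - 1) exp(-s/2) / (2^(n/2) Gamma(n/2))
    return s ** (n / 2 - 1) * math.exp(-s / 2) / (2 ** (n / 2) * math.gamma(n / 2))

h = 0.001  # midpoint-rule step
total = sum(chi2_density((k + 0.5) * h) * h for k in range(60000))
print(total)  # should be very close to 1
```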

Fourth part: a consequence of the first 3 parts and the def'n of the $t$ distribution.

Defn: $T \sim t_\nu$ if $T$ has the same distribution as

$\frac{Z}{\sqrt{U/\nu}}$

for $Z \sim N(0, 1)$, $U \sim \chi^2_\nu$, and $Z, U$ independent.

Derive the density of $T$ in this definition:

$P(T \le t) = P\!\left(Z \le t\sqrt{U/\nu}\right) = \int_0^\infty \int_{-\infty}^{t\sqrt{u/\nu}} f_Z(z)\, f_U(u)\, dz\, du.$

Differentiate wrt $t$ by differentiating the inner integral:

$\frac{\partial}{\partial t} \int_{-\infty}^{t\sqrt{u/\nu}} f_Z(z)\, dz = \sqrt{u/\nu}\, f_Z\!\left(t\sqrt{u/\nu}\right)$

by the fundamental thm of calculus. Hence

$f_T(t) = \int_0^\infty \sqrt{u/\nu}\, f_Z\!\left(t\sqrt{u/\nu}\right) f_U(u)\, du.$

Plug in

$f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2}, \qquad f_U(u) = \frac{1}{2^{\nu/2}\,\Gamma(\nu/2)}\, u^{\nu/2 - 1} e^{-u/2}$

to get

$f_T(t) = \frac{1}{\sqrt{2\pi\nu}\; 2^{\nu/2}\,\Gamma(\nu/2)} \int_0^\infty u^{(\nu+1)/2 - 1} e^{-u(1 + t^2/\nu)/2}\, du.$

Substitute $y = u(1 + t^2/\nu)/2$, $du = 2\,dy/(1 + t^2/\nu)$, to get

$f_T(t) = \frac{2^{(\nu+1)/2}\,\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{2\pi\nu}\; 2^{\nu/2}\,\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}$

or

$f_T(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\pi\nu}\,\Gamma(\nu/2)} \left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}.$
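The same numerical sanity check works for the $t_\nu$ density (a sketch; $\nu = 4$ and the integration range $[-200, 200]$ are arbitrary, and the tail beyond that range is negligible):

```python
import math

nu = 4
c = math.gamma((nu + 1) / 2) / (math.sqrt(math.pi * nu) * math.gamma(nu / 2))

def t_density(t):
    # f_T(t) = c (1 + t^2/nu)^(-(nu+1)/2)
    return c * (1 + t * t / nu) ** (-(nu + 1) / 2)

h = 0.01  # midpoint-rule step on [-200, 200]
total = sum(t_density(-200 + (k + 0.5) * h) * h for k in range(40000))
print(total)  # should be very close to 1
```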

Multivariate Normal samples: Distribution Theory

Theorem: Suppose $X_1, \ldots, X_n$ are independent $MVN_p(\mu, \Sigma)$ random variables. Then

1. $\bar{X}$ (sample mean) and $S$ (sample variance-covariance matrix) are independent.

2. $\bar{X} \sim MVN_p(\mu, \Sigma/n)$.

3. $(n-1)S \sim W_p(n-1, \Sigma)$, the Wishart distribution.

4. $T^2 = n(\bar{X} - \mu)^T S^{-1} (\bar{X} - \mu)$ is Hotelling's $T^2$. $\frac{n-p}{(n-1)p}\, T^2$ has an $F_{p, n-p}$ distribution.

Proof: Let $X_i = \mu + A Z_i$ where $A A^T = \Sigma$ and $Z_1, \ldots, Z_n$ are independent $MVN_p(0, I)$.

So $X_i \sim MVN_p(\mu, \Sigma)$.

Note that $\bar{X} = \mu + A\bar{Z}$ and $X_i - \bar{X} = A(Z_i - \bar{Z}).$ Thus

$n^{1/2}(\bar{X} - \mu) = n^{1/2} A \bar{Z}$

and

$(n-1)S = A \left[\sum_i (Z_i - \bar{Z})(Z_i - \bar{Z})^T\right] A^T$

where

$(n-1)S = \sum_i (X_i - \bar{X})(X_i - \bar{X})^T.$

Consequences. In 1, 2 and 4: can assume $\mu = 0$ and $\Sigma = I$. In 3 can take $\mu = 0$. Step 1: Do general $\Sigma$. Define

$Y = \left(n^{1/2}\bar{X}^T, (X_1 - \bar{X})^T, \ldots, (X_n - \bar{X})^T\right)^T.$

(So $Y$ has dimension $(n+1)p$.) Clearly $Y$ is $MVN$ with mean 0.

Compute the variance covariance matrix

$\operatorname{Var}(Y)$

where $\operatorname{Var}(Y)$ has a pattern. It is a patterned matrix with $(i,j)$ block entry being

$C_{ij}\,\Sigma,$

where $C = \begin{pmatrix} 1 & 0^T \\ 0 & H \end{pmatrix}$ (with $H = I - n^{-1}\mathbf{1}\mathbf{1}^T$) is the matrix $MM^T$ from the univariate case.

Kronecker Products

Defn: If $A$ is $p \times q$ and $B$ is $r \times s$ then $A \otimes B$ is the $pr \times qs$ matrix with the pattern

$A \otimes B = \begin{pmatrix} A_{11}B & \cdots & A_{1q}B \\ \vdots & & \vdots \\ A_{p1}B & \cdots & A_{pq}B \end{pmatrix}.$

So our variance covariance matrix is

$\begin{pmatrix} 1 & 0^T \\ 0 & H \end{pmatrix} \otimes \Sigma.$
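numpy provides the Kronecker product as `np.kron`, which follows exactly the block pattern in the definition; a small check with arbitrary illustrative matrices:

```python
import numpy as np

A = np.array([[1.0, 0.0],
              [0.0, 2.0]])
B = np.array([[4.0, 1.0],
              [1.0, 3.0]])

K = np.kron(A, B)                            # (pr x qs) = (4 x 4) block matrix
print(K.shape)
print(np.allclose(K[:2, :2], A[0, 0] * B))   # top-left block is A_11 B
print(np.allclose(K[:2, 2:], A[0, 1] * B))   # off-diagonal block is A_12 B = 0
print(np.allclose(K[2:, 2:], A[1, 1] * B))   # bottom-right block is A_22 B
```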

Conclusions so far:

1) $n^{1/2}(\bar{X} - \mu)$ and $(X_1 - \bar{X}, \ldots, X_n - \bar{X})$ are independent.

2) $n^{1/2}(\bar{X} - \mu) \sim MVN_p(0, \Sigma)$.

Next: Wishart law.

Defn: The $W_p(n, \Sigma)$ distribution is the distribution of

$W = \sum_{i=1}^n X_i X_i^T$

where $X_1, \ldots, X_n$ are iid $MVN_p(0, \Sigma)$.
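A single draw from $W_p(n, \Sigma)$ can be simulated directly from this definition; a sketch (the dimensions, seed, and choice of $\Sigma$ are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(1)
p, n = 3, 10
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
Sigma = A @ A.T                          # a positive definite Sigma = A A^T

Z = rng.standard_normal((n, p))
X = Z @ A.T                              # rows X_i = A Z_i are iid MVN_p(0, Sigma)
W = X.T @ X                              # W = sum_i X_i X_i^T, one Wishart draw

print(np.allclose(W, W.T))               # Wishart draws are symmetric...
print(np.all(np.linalg.eigvalsh(W) > 0)) # ...and positive definite when n >= p
```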

Properties of Wishart.

1) If $W \sim W_p(n, \Sigma)$ then $E(W) = n\Sigma$.

2) If $W_1 \sim W_p(n_1, \Sigma)$ and $W_2 \sim W_p(n_2, \Sigma)$ are independent then $W_1 + W_2 \sim W_p(n_1 + n_2, \Sigma)$.

Proof of part 3: rewrite

$(n-1)S = \sum_{i=1}^n (X_i - \bar{X})(X_i - \bar{X})^T$

in the form

$\sum_{i=1}^{n-1} Y_i Y_i^T$

for $Y_1, \ldots, Y_{n-1}$ iid $MVN_p(0, \Sigma)$. Put the $X_i$ as cols in a matrix $\mathcal{X}$ which is $p \times n$. Then check that

$(n-1)S = \mathcal{X} H \mathcal{X}^T,$

where $H = I - n^{-1}\mathbf{1}\mathbf{1}^T$ is the centering matrix. Write

$H = \sum_{i=1}^{n-1} a_i a_i^T$

for orthogonal unit vectors $a_1, \ldots, a_{n-1}$ (possible since $H$ is symmetric and idempotent with $\operatorname{tr}(H) = n-1$). Define

$Y_i = \mathcal{X} a_i$

and compute covariances to check that the $Y_i$ are iid $MVN_p(0, \Sigma)$. Then check that

$\mathcal{X} H \mathcal{X}^T = \sum_{i=1}^{n-1} Y_i Y_i^T.$

Proof of 4: it suffices to consider $\mu = 0$ and $\Sigma = I$.

Uses further props of the Wishart distribution.

3: If $W \sim W_p(n, \Sigma)$ and $A$ is a $q \times p$ matrix then

$A W A^T \sim W_q(n, A \Sigma A^T).$

4: If $W \sim W_p(n, I)$ and $Z \sim MVN_p(0, I)$ is independent of $W$ then

$\frac{Z^T Z}{Z^T W^{-1} Z} \sim \chi^2_{n - p + 1},$

independent of $Z$.

5: If $W \sim W_1(n, \sigma^2)$ (the case $p = 1$) then $W/\sigma^2 \sim \chi^2_n$.

6: If $W = \begin{pmatrix} W_{11} & W_{12} \\ W_{21} & W_{22} \end{pmatrix} \sim W_p(n, \Sigma)$ is partitioned into components then

$W_{11} \sim W_{p_1}(n, \Sigma_{11}).$
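Hotelling's $T^2$ from part 4 of the theorem is simple to compute from data; a sketch with a sample simulated under the hypothesized mean (seed, dimensions, and $\mu_0$ are arbitrary illustrations):

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 20, 2
mu0 = np.zeros(p)                        # hypothesized mean
X = rng.standard_normal((n, p))          # sample, rows X_1, ..., X_n

xbar = X.mean(axis=0)
S = np.cov(X, rowvar=False)              # sample covariance matrix (divisor n - 1)
diff = xbar - mu0
T2 = n * diff @ np.linalg.solve(S, diff) # T^2 = n (xbar - mu0)^T S^{-1} (xbar - mu0)
F = (n - p) * T2 / ((n - 1) * p)         # F-scaled version, F_{p, n-p} under the model
print(T2, F)
```

Since $S$ is positive definite (here $n > p$), $T^2 \ge 0$; under the model the scaled statistic would be compared to $F_{p,\, n-p}$ critical values.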

Richard Lockhart
2002-09-30