Given data $X$ with model $\{f_\theta(x) : \theta \in \Theta\}$:
Definition: The likelihood function is the map $L$: domain $\Theta$, values given by $L(\theta) = f_\theta(X)$.
Key Point: think about how the density depends on $\theta$, not about how it depends on $X$.
Notice: $X$, the observed value of the data, has been plugged into the formula for the density.
We use likelihood for most inference problems:
Maximum Likelihood Estimation
To find the MLE, maximize $L(\theta)$.
Typical function maximization problem:
Set the gradient of $L$ equal to 0.
Check that the root is a maximum, not a minimum or saddle point.
Often $L$ is a product of terms (given independent observations).
Much easier to work with the logarithm of $L$: the log of a product is a sum, and the logarithm is monotone increasing, so $L$ and $\log L$ are maximized at the same point.
Definition: The log likelihood function is $\ell(\theta) = \log(L(\theta))$.
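The product-to-sum relationship can be seen numerically; the following is a minimal sketch (not part of the original notes — the sample, parameter values, and function names are made up for illustration):

```python
import numpy as np

rng = np.random.default_rng(0)
x = rng.normal(loc=1.0, scale=2.0, size=50)   # hypothetical N(1, 4) sample

def norm_pdf(data, mu, sigma):
    """N(mu, sigma^2) density evaluated at each data point."""
    return np.exp(-(data - mu) ** 2 / (2 * sigma ** 2)) / np.sqrt(2 * np.pi * sigma ** 2)

# Likelihood: product of densities.  Log likelihood: sum of log densities.
L = np.prod(norm_pdf(x, 1.0, 2.0))
ell = np.sum(np.log(norm_pdf(x, 1.0, 2.0)))
assert np.isclose(np.log(L), ell)

# Monotonicity of log: the same mu maximizes both L and ell over a grid.
mus = np.linspace(0.0, 2.0, 201)
ells = np.array([np.sum(np.log(norm_pdf(x, m, 2.0))) for m in mus])
Ls = np.exp(ells)          # safe for this small example; in general stay on the log scale
assert mus[np.argmax(ells)] == mus[np.argmax(Ls)]
```

In practice one always works with $\ell$ directly, since a product of many densities underflows floating point long before the sum of logs loses accuracy.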
Simplest problem: collect replicate measurements $X_1, \ldots, X_n$ from a single population.
Model: $X_1, \ldots, X_n$ are iid $MVN_p(\mu, \Sigma)$.
Parameters ($p + p(p+1)/2$ in all): $\theta = (\mu, \Sigma)$. Parameter space: $\mu \in \mathbb{R}^p$ and $\Sigma$ is some positive definite $p \times p$ matrix.
Log likelihood is
$\ell(\mu, \Sigma) = -\frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^n (X_i - \mu)^T \Sigma^{-1}(X_i - \mu) - \frac{np}{2}\log(2\pi).$
Fact: if the second derivative matrix (Hessian) is negative definite everywhere then the function is concave; it has no more than 1 critical point, which is then the global maximum.
Summary: for each fixed $\Sigma$, $\ell(\mu, \Sigma)$ is maximized over $\mu$ at $\hat\mu = \bar X = \frac{1}{n}\sum_i X_i$.
More difficult: differentiate $\ell$ wrt $\Sigma$.
Somewhat simpler: set $Q = \Sigma^{-1}$, so that $\ell = \frac{n}{2}\log\det Q - \frac{1}{2}\sum_i (X_i-\mu)^T Q (X_i-\mu) - \frac{np}{2}\log(2\pi)$, and differentiate wrt $Q$.
First derivative wrt $Q$ is the matrix with entries $\partial\ell/\partial Q_{uv}$.
Need: derivatives of two functions: $x^T Q x$ and $\log\det Q$.
Fact: the $(u,v)$th entry of $\partial(x^T Q x)/\partial Q$ is $x_u x_v$; that is, $\partial(x^T Q x)/\partial Q = xx^T$.
Fact: $\partial\det Q/\partial Q_{uv}$ is the $(u,v)$ cofactor $C_{uv}$ of $Q$; expansion by minors gives $\det Q = \sum_v Q_{uv}C_{uv}$, so $\partial\log\det Q/\partial Q = (Q^{-1})^T = \Sigma$ for symmetric $Q$.
Conclusion:
$\frac{\partial\ell}{\partial Q} = \frac{n}{2}\Sigma - \frac{1}{2}\sum_i (X_i-\mu)(X_i-\mu)^T.$
Set this $= 0$ (with $\mu = \hat\mu = \bar X$) and find the only critical point is
$\hat\Sigma = \frac{1}{n}\sum_i (X_i - \bar X)(X_i - \bar X)^T.$
The usual sample covariance matrix is
$S = \frac{1}{n-1}\sum_i (X_i - \bar X)(X_i - \bar X)^T.$
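A quick numerical sketch of these formulas (an assumed illustration — the true parameter values below are made up): the MLE $\hat\Sigma$ and the sample covariance $S$ differ exactly by the factor $(n-1)/n$:

```python
import numpy as np

rng = np.random.default_rng(1)
n, p = 200, 3
mu_true = np.array([1.0, -1.0, 0.5])           # hypothetical parameters
A = np.array([[1.0, 0.0, 0.0],
              [0.5, 1.0, 0.0],
              [0.2, 0.3, 1.0]])
Sigma_true = A @ A.T                            # positive definite by construction
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)

# MLEs: mu_hat = X_bar, Sigma_hat = (1/n) sum (X_i - X_bar)(X_i - X_bar)^T
mu_hat = X.mean(axis=0)
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n

# np.cov divides by n-1 by default, i.e. computes the sample covariance S.
S = np.cov(X, rowvar=False)
assert np.allclose(Sigma_hat, S * (n - 1) / n)
```

The check is an algebraic identity, so it holds for any data set, not just this simulated one.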
Properties of MLEs:
1) $\hat\mu = \bar X$ is unbiased: $E(\bar X) = \mu$.
2) $E(S) = \Sigma$ but $E(\hat\Sigma) = \frac{n-1}{n}\Sigma$, so $\hat\Sigma$ is biased.
Distribution of $\hat\theta = (\bar X, \hat\Sigma)$? Joint distribution of $\bar X$ and $S$?
Theorem: Suppose $X_1, \ldots, X_n$ are independent $N(\mu, \sigma^2)$ random variables. Then
1) $\bar X \sim N(\mu, \sigma^2/n)$.
2) $\bar X$ and $s^2$ are independent.
3) $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
4) $T = \sqrt{n}(\bar X - \mu)/s \sim t_{n-1}$.
Proof: Let $Z_i = (X_i - \mu)/\sigma$.
Then $Z_1, \ldots, Z_n$ are independent $N(0,1)$.
So $Z = (Z_1, \ldots, Z_n)^T$ is multivariate standard normal: $Z \sim MVN_n(0, I)$.
Note that $\bar X = \sigma\bar Z + \mu$ and $s^2 = \sigma^2 \sum_i (Z_i - \bar Z)^2/(n-1)$. Thus
$\sqrt{n}(\bar X - \mu)/\sigma = \sqrt{n}\,\bar Z \qquad\text{and}\qquad (n-1)s^2/\sigma^2 = \sum_i (Z_i - \bar Z)^2.$
So: the problem is reduced to the case $\mu = 0$ and $\sigma = 1$.
Step 1: Define $Y_i = Z_i - \bar Z$ for $i = 1, \ldots, n$.
Put $T = \bar Z$. Since $(T, Y_1, \ldots, Y_n)$ is a linear function of the multivariate normal $Z$ it is jointly normal, and
$\mathrm{Cov}(T, Y_i) = \mathrm{Cov}(\bar Z, Z_i) - \mathrm{Var}(\bar Z) = \tfrac{1}{n} - \tfrac{1}{n} = 0.$
Thus $\bar Z$ is independent of $(Z_1 - \bar Z, \ldots, Z_n - \bar Z)$.
Since $s^2$ is a function of the $Z_i - \bar Z$ we see that $\bar Z$ and $s^2$ are independent.
Also, $\sqrt{n}\,\bar Z \sim N(0,1)$, i.e. $\bar X \sim N(\mu, \sigma^2/n)$.
First 2 parts done.
Consider $(n-1)s^2/\sigma^2 = \sum_i (Z_i - \bar Z)^2$. Note that $\sum_i (Z_i - \bar Z)^2 = Z^T A Z$ where $A = I - \mathbf{1}\mathbf{1}^T/n$ and $\mathbf{1} = (1, \ldots, 1)^T$.
Now: distribution of quadratic forms:
Suppose $Z \sim MVN_n(0, I)$ and $A$ is symmetric. Put $A = P\Lambda P^T$ for $\Lambda$ diagonal, $P$ orthogonal.
Then $Z^T A Z = Z^T P \Lambda P^T Z = W^T \Lambda W$ where $W = P^T Z \sim MVN_n(0, I)$.
So: $Z^T A Z$ has the same distribution as $\sum_i \lambda_i W_i^2$, where the $\lambda_i$ are the eigenvalues of $A$ and the $W_i$ are iid $N(0,1)$.
Special case: if all $\lambda_i$ are either 0 or 1 then $Z^T A Z$ has a chi-squared distribution with df = number of $\lambda_i$ equal to 1.
When are the eigenvalues all 1 or 0?
Answer: if and only if $A$ is idempotent: $A^2 = A$.
1) If $A$ is idempotent and $(\lambda, v)$ is an eigenpair then $\lambda v = Av = A^2 v = \lambda^2 v$, so $\lambda^2 = \lambda$, i.e. $\lambda \in \{0, 1\}$.
2) Conversely, if all eigenvalues of $A$ are 0 or 1 then $\Lambda$ has only 1s and 0s on the diagonal, so $\Lambda^2 = \Lambda$ and $A^2 = P\Lambda P^T P\Lambda P^T = P\Lambda^2 P^T = P\Lambda P^T = A$.
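A small numerical check of this equivalence (illustrative only — the projection matrix below is a made-up example, not one used in the notes): a symmetric idempotent matrix is an orthogonal projection, and its eigenvalues are exactly 0s and 1s.

```python
import numpy as np

rng = np.random.default_rng(2)
# Orthogonal projection onto a random k-dimensional subspace: A = Q Q^T
# with Q having orthonormal columns, so A is symmetric and idempotent.
n, k = 7, 3
Q, _ = np.linalg.qr(rng.standard_normal((n, k)))
A = Q @ Q.T

assert np.allclose(A @ A, A)                     # idempotent: A^2 = A
lam = np.linalg.eigvalsh(A)                      # eigenvalues of symmetric A
assert np.allclose(np.sort(lam), [0.0] * (n - k) + [1.0] * k)
assert np.isclose(np.trace(A), k)                # trace counts the 1s
```

The last assertion previews the degrees-of-freedom counting used below: for an idempotent $A$, $\mathrm{tr}(A)$ equals the number of unit eigenvalues.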
More generally, if $X \sim MVN_n(0, \Sigma)$ then $X = \Sigma^{1/2}Z$ with $Z \sim MVN_n(0, I)$; since $X^T A X = Z^T \Sigma^{1/2} A \Sigma^{1/2} Z$, it has the law of a quadratic form in standard normals with matrix $\Sigma^{1/2} A \Sigma^{1/2}$.
So the relevant eigenvalues are those of $\Sigma^{1/2} A \Sigma^{1/2}$ (the same as those of $A\Sigma$), and $X^T A X$ is $\chi^2_\nu$ iff $\Sigma^{1/2} A \Sigma^{1/2}$ is idempotent and $\nu = \mathrm{tr}(A\Sigma)$.
Our case: $A = I - \mathbf{1}\mathbf{1}^T/n$. Check: $A^2 = I - 2\,\mathbf{1}\mathbf{1}^T/n + \mathbf{1}(\mathbf{1}^T\mathbf{1})\mathbf{1}^T/n^2 = I - \mathbf{1}\mathbf{1}^T/n = A$. How many degrees of freedom? $\mathrm{tr}(A)$.
Defn: The trace of a square matrix $A$ is $\mathrm{tr}(A) = \sum_i A_{ii}$.
Property: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
So: $\mathrm{tr}(I - \mathbf{1}\mathbf{1}^T/n) = n - \mathrm{tr}(\mathbf{1}\mathbf{1}^T)/n = n - \mathbf{1}^T\mathbf{1}/n = n - 1$.
Conclusion: the df for $\sum_i (Z_i - \bar Z)^2$ is $n-1$, so $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$; third part done.
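The conclusion can be checked by simulation (a Monte Carlo sketch, not from the original notes; sample sizes are arbitrary): simulated values of $\sum_i (Z_i - \bar Z)^2$ should match the mean $n-1$ and variance $2(n-1)$ of a $\chi^2_{n-1}$ variable.

```python
import numpy as np

rng = np.random.default_rng(3)
n, reps = 10, 200_000
Z = rng.standard_normal((reps, n))
# One quadratic form sum((Z_i - Zbar)^2) per replication
q = ((Z - Z.mean(axis=1, keepdims=True)) ** 2).sum(axis=1)

# chi^2_{n-1} has mean n-1 and variance 2(n-1)
assert abs(q.mean() - (n - 1)) < 0.05
assert abs(q.var() - 2 * (n - 1)) < 0.5
```

With $n = 10$ the simulated mean is close to 9 and the variance close to 18, as the theorem predicts.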
Derivation of the $\chi^2$ density:
Suppose $Z_1, \ldots, Z_n$ are independent $N(0,1)$. Define the $\chi^2_n$ distribution to be that of $Z_1^2 + \cdots + Z_n^2$. Define angles $\theta_1, \ldots, \theta_{n-1}$ by writing $(Z_1, \ldots, Z_n)$ in spherical coordinates: $Z_1 = R\cos\theta_1$, $Z_2 = R\sin\theta_1\cos\theta_2$, and so on, where $R = (\sum_i Z_i^2)^{1/2}$.
The matrix of partial derivatives of $(Z_1, \ldots, Z_n)$ wrt $(R, \theta_1, \ldots, \theta_{n-1})$ has one column ($\partial/\partial R$) free of $R$ and $n-1$ columns each carrying a factor $R$.
FACT: multiplying a column in a matrix by $c$ multiplies the determinant by $c$.
SO: the Jacobian of the transformation is $R^{n-1} \times (\text{a function of the angles alone})$.
Thus the joint density of $(R, \theta_1, \ldots, \theta_{n-1})$ is $(2\pi)^{-n/2} e^{-r^2/2} r^{n-1} g(\theta_1, \ldots, \theta_{n-1})$ for some function $g$.
After integrating out the angles, the answer has the form $f_R(r) = c\, r^{n-1} e^{-r^2/2}$, $r > 0$.
Evaluate $c$ by making $\int_0^\infty f_R(r)\,dr = 1$; substituting $y = r^2$ then gives the $\chi^2_n$ density
$f(y) = \frac{1}{2^{n/2}\Gamma(n/2)}\, y^{n/2-1} e^{-y/2}, \quad y > 0.$
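As a numerical sanity check (illustrative; the value $c = 2^{1-n/2}/\Gamma(n/2)$ is what the normalization condition yields), both the density of $R$ and the resulting $\chi^2_n$ density integrate to 1:

```python
import numpy as np
from math import gamma

n = 5
# Density of R = sqrt(chi^2_n): f_R(r) = c r^{n-1} exp(-r^2/2)
c = 2.0 ** (1 - n / 2) / gamma(n / 2)
r = np.linspace(1e-8, 12, 200_000)
f_r = c * r ** (n - 1) * np.exp(-r ** 2 / 2)

# chi^2_n density after the substitution y = r^2
y = np.linspace(1e-8, 80, 200_000)
f_y = y ** (n / 2 - 1) * np.exp(-y / 2) / (2 ** (n / 2) * gamma(n / 2))

# Simple trapezoid rule, avoiding version-specific numpy integrators
trap = lambda f, x: float(((f[:-1] + f[1:]) / 2 * np.diff(x)).sum())
assert abs(trap(f_r, r) - 1.0) < 1e-4
assert abs(trap(f_y, y) - 1.0) < 1e-4
```

The substitution $y = r^2$, $dy = 2r\,dr$ converts $c\,r^{n-1}e^{-r^2/2}\,dr$ into $\tfrac{c}{2}\,y^{n/2-1}e^{-y/2}\,dy$, which matches the stated $\chi^2_n$ density.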
Fourth part: consequence of the first 3 parts and the definition of the $t$ distribution.
Defn: $T \sim t_\nu$ if $T$ has the same distribution as $Z/\sqrt{U/\nu}$, where $Z \sim N(0,1)$, $U \sim \chi^2_\nu$ and $Z$, $U$ are independent.
The density of $T$ in this definition can be derived by a change of variables.
Theorem: Suppose $X_1, \ldots, X_n$ are independent $MVN_p(\mu, \Sigma)$ random vectors. Then
1) $\bar X \sim MVN_p(\mu, \Sigma/n)$.
2) $\bar X$ and $S$ are independent.
3) $(n-1)S \sim \text{Wishart}_p(n-1, \Sigma)$.
4) Hotelling's $T^2 = n(\bar X - \mu)^T S^{-1}(\bar X - \mu)$ satisfies $\frac{n-p}{(n-1)p}\,T^2 \sim F_{p, n-p}$.
Proof: Let $X_i = \mu + AZ_i$ where $AA^T = \Sigma$ and $Z_1, \ldots, Z_n$ are independent $MVN_p(0, I)$.
So $\bar X = \mu + A\bar Z$.
Note that $\bar X - \mu = A\bar Z$ and
$(n-1)S = \sum_i (X_i - \bar X)(X_i - \bar X)^T = A\Big[\sum_i (Z_i - \bar Z)(Z_i - \bar Z)^T\Big]A^T.$
Consequences. In parts 1, 2 and 4 we can assume $\mu = 0$ and $\Sigma = I$; in part 3 we can take $\mu = 0$. Step 1: Do the general case. Define the $np$-vector $X$ obtained by stacking $X_1, \ldots, X_n$.
Compute its variance covariance matrix: $\mathrm{Var}(X) = I_n \otimes \Sigma$.
Defn: If $A$ is $p \times q$ and $B$ is $r \times s$ then the Kronecker product $A \otimes B$ is the $pr \times qs$ matrix with the pattern
$A \otimes B = \begin{pmatrix} A_{11}B & \cdots & A_{1q}B \\ \vdots & & \vdots \\ A_{p1}B & \cdots & A_{pq}B \end{pmatrix}.$
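For concreteness, a short sketch of the Kronecker block pattern using `np.kron` (the matrices below are arbitrary examples, not from the notes):

```python
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.array([[0, 5],
              [6, 7]])
K = np.kron(A, B)          # 4x4 matrix whose (i, j) block is A[i, j] * B

assert K.shape == (4, 4)
assert np.array_equal(K[:2, :2], 1 * B)   # top-left block:     A[0,0] * B
assert np.array_equal(K[:2, 2:], 2 * B)   # top-right block:    A[0,1] * B
assert np.array_equal(K[2:, 2:], 4 * B)   # bottom-right block: A[1,1] * B
```

With $A = I_n$ this reproduces the block-diagonal $\mathrm{Var}(X) = I_n \otimes \Sigma$ of the stacked sample: $n$ copies of $\Sigma$ down the diagonal and zeros elsewhere.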
Conclusions so far:
1) $\bar X$ and $S$ are independent.
2) $\bar X \sim MVN_p(\mu, \Sigma/n)$.
Next: Wishart law.
Defn: The $\text{Wishart}_p(n, \Sigma)$ distribution is the distribution of $W = \sum_{i=1}^n Z_i Z_i^T$ where $Z_1, \ldots, Z_n$ are iid $MVN_p(0, \Sigma)$.
Properties of Wishart.
1) If $W \sim \text{Wishart}_p(n, \Sigma)$ then $E(W) = n\Sigma$.
2) If $W_1 \sim \text{Wishart}_p(n_1, \Sigma)$ and $W_2 \sim \text{Wishart}_p(n_2, \Sigma)$ are independent then $W_1 + W_2 \sim \text{Wishart}_p(n_1 + n_2, \Sigma)$.
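Property 1 can be checked by direct simulation from the definition (a Monte Carlo sketch with a made-up $\Sigma$; the tolerance is deliberately loose):

```python
import numpy as np

rng = np.random.default_rng(7)
p, n, reps = 2, 5, 100_000
Sigma = np.array([[2.0, 0.6],
                  [0.6, 1.0]])              # hypothetical scale matrix
C = np.linalg.cholesky(Sigma)               # C C^T = Sigma

# Each replication: n iid MVN_p(0, Sigma) rows, then W = sum of outer products
Z = rng.standard_normal((reps, n, p)) @ C.T
W = np.einsum('rni,rnj->rij', Z, Z)         # W[r] = sum_k Z[r,k] Z[r,k]^T

# Property 1: E(W) = n * Sigma
assert np.allclose(W.mean(axis=0), n * Sigma, atol=0.1)
```

Property 2 follows immediately from the definition as well: concatenating the $n_1$ and $n_2$ underlying normal vectors gives a single sum of $n_1 + n_2$ outer products.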
Proof of part 3: rewrite $(n-1)S = A\Big[\sum_i (Z_i - \bar Z)(Z_i - \bar Z)^T\Big]A^T$.
Uses further properties of the Wishart distribution.
3: If $W \sim \text{Wishart}_p(n, \Sigma)$ and $M$ is $q \times p$ then $MWM^T \sim \text{Wishart}_q(n, M\Sigma M^T)$.
4: If $W \sim \text{Wishart}_p(n, \Sigma)$ and $a \ne 0$ is a fixed vector then $a^T W a / a^T \Sigma a \sim \chi^2_n$.
5: If $W \sim \text{Wishart}_p(n, \Sigma)$ then $\dfrac{a^T \Sigma^{-1} a}{a^T W^{-1} a} \sim \chi^2_{n-p+1}$.
6: If $W \sim \text{Wishart}_p(n, \Sigma)$ is partitioned into components conformably with $\Sigma$ then the diagonal blocks satisfy $W_{ii} \sim \text{Wishart}(n, \Sigma_{ii})$.