Given data $X$ with model $\{f_\theta(x);\ \theta\in\Theta\}$:
Definition: The likelihood function is the map $L$: domain $\Theta$, values given by $L(\theta) = f_\theta(X)$.
Key Point: think about how the density depends on $\theta$, not about how it depends on $X$.
Notice: $X$, the observed value of the data, has been plugged into the formula for the density.
We use likelihood for most inference problems:
Maximum Likelihood Estimation
To find the MLE, maximize $L(\theta)$.
Typical function maximization problem:
Set the gradient of $L$ equal to 0.
Check that the root is a maximum, not a minimum or saddle point.
Often $L$ is a product of $n$ terms (given $n$ independent observations).
Much easier to work with the logarithm of $L$: the log of a product is a sum, and the logarithm is monotone increasing.
Definition: The log likelihood function is $\ell(\theta) = \log L(\theta)$.
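As a concrete illustration of this recipe (not part of the original notes), here is a minimal Python sketch that maximizes a log likelihood numerically for an iid $N(\mu,\sigma^2)$ sample; the simulated data, the $\log\sigma$ parametrization, and the use of scipy.optimize are my own choices for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100)   # simulated data for the example

def neg_log_lik(theta):
    """Negative log likelihood for iid N(mu, sigma^2); theta = (mu, log sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)                   # parametrize by log sigma to keep sigma > 0
    return -np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                   - 0.5 * ((x - mu) / sigma) ** 2)

fit = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
# mu_hat is close to the sample mean; sigma_hat to the root mean squared deviation
print(mu_hat, sigma_hat)
```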
Simplest problem: collect $n$ replicate measurements from a single population.
Model: $X_1,\ldots,X_n$ are iid $MVN_p(\mu,\Sigma)$.
Parameters ($\theta$): $\mu$, $\Sigma$.
Parameter space: $\mu\in\mathbb{R}^p$ and $\Sigma$ is some positive definite $p\times p$ matrix.
Log likelihood is
$$\ell(\mu,\Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^n (X_i-\mu)^T\Sigma^{-1}(X_i-\mu).$$
For fixed $\Sigma$, the gradient with respect to $\mu$ is
$$\frac{\partial\ell}{\partial\mu} = \Sigma^{-1}\sum_{i=1}^n (X_i-\mu),$$
which is 0 at $\mu = \bar X$; the second derivative matrix with respect to $\mu$ is $-n\Sigma^{-1}$, which is negative definite.
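A quick numerical sanity check of this formula (my own sketch, not from the notes): evaluate the expression directly on simulated data and compare with the sum of log densities from scipy.stats.multivariate_normal. The parameter values and seed are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, p = 50, 3
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
X = rng.multivariate_normal(mu, Sigma, size=n)

# Log likelihood exactly as in the displayed formula.
diff = X - mu
quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
loglik = (-0.5 * n * p * np.log(2 * np.pi)
          - 0.5 * n * np.log(np.linalg.det(Sigma))
          - 0.5 * quad)

# Same quantity via scipy: sum of the n log densities.
loglik_scipy = multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()
print(np.isclose(loglik, loglik_scipy))   # True
```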
Fact: if the second derivative matrix is negative definite everywhere, then the function is concave; it has no more than 1 critical point.
Summary: for each fixed $\Sigma$, $\ell(\mu,\Sigma)$ is maximized over $\mu$ at $\hat\mu = \bar X$.
More difficult: differentiate with respect to $\Sigma$.
Somewhat simpler: set $Q = \Sigma^{-1}$ and differentiate with respect to $Q$.
The first derivative with respect to $Q$ is the matrix with entries $\partial\ell/\partial Q_{ij}$.
Need: derivatives of two functions of $Q$: the quadratic form $v^TQv$ and $\log\det Q$.
Fact: the $(i,j)$th entry of $\partial(v^TQv)/\partial Q$ is $v_iv_j$.
Fact: $\partial\log\det Q/\partial Q_{ij} = (Q^{-1})_{ji}$; this follows from expansion of the determinant by minors.
Conclusion: Set the derivative equal to 0 and find that the only critical point is
$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n (X_i-\bar X)(X_i-\bar X)^T.$$
The usual sample covariance matrix is
$$S = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)(X_i-\bar X)^T.$$
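A brief numerical check of these closed forms (my own sketch with arbitrary simulation settings): compute $\hat\mu$, $\hat\Sigma$ and $S$ from simulated data and confirm that the only difference between $\hat\Sigma$ and $S$ is the divisor.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
mu = np.zeros(p)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.0],
                  [0.2, 0.0, 0.8]])
X = rng.multivariate_normal(mu, Sigma, size=n)

mu_hat = X.mean(axis=0)                         # MLE of mu: the sample mean
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n           # MLE of Sigma (divisor n)
S = centered.T @ centered / (n - 1)             # usual sample covariance (divisor n-1)

print(np.allclose(S, np.cov(X, rowvar=False)))  # True: matches numpy's sample covariance
print(np.allclose(Sigma_hat, S * (n - 1) / n))  # True: they differ only by the divisor
```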
Properties of MLEs:
1) …
2) …
Distribution of $\bar X$? Joint distribution of $\bar X$ and $S$?
Theorem: Suppose $X_1,\ldots,X_n$ are independent $N(\mu,\sigma^2)$ random variables. Then
1) $\bar X \sim N(\mu,\sigma^2/n)$;
2) $\bar X$ and $s^2$ are independent;
3) $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$;
4) $T = \sqrt{n}(\bar X-\mu)/s \sim t_{n-1}$.
Proof: Let $Z_i = (X_i-\mu)/\sigma$.
Then $Z_1,\ldots,Z_n$ are independent $N(0,1)$.
So $Z = (Z_1,\ldots,Z_n)^T$ is multivariate standard normal.
Note that $\bar X = \mu + \sigma\bar Z$ and $s^2 = \sigma^2 s_Z^2$, where $s_Z^2 = \sum_{i=1}^n (Z_i-\bar Z)^2/(n-1)$.
Thus $T = \sqrt{n}(\bar X-\mu)/s = \sqrt{n}\,\bar Z/s_Z$.
So: reduced to $\mu = 0$ and $\sigma = 1$.
Step 1: Define
$$Y = \begin{pmatrix}\bar Z\\ Z_1-\bar Z\\ \vdots\\ Z_n-\bar Z\end{pmatrix} = MZ,\qquad M = \begin{pmatrix}\mathbf{1}^T/n\\ I - \mathbf{1}\mathbf{1}^T/n\end{pmatrix},\quad \mathbf{1} = (1,\ldots,1)^T.$$
Since $Y = MZ$ is a linear function of the multivariate standard normal $Z$, it is multivariate normal, and $\operatorname{Cov}(\bar Z, Z_i-\bar Z) = 0$ for every $i$.
Thus $\bar Z$ is independent of $(Z_1-\bar Z,\ldots,Z_n-\bar Z)$.
Since $s_Z^2$ is a function of $(Z_1-\bar Z,\ldots,Z_n-\bar Z)$, we see that $\bar Z$ and $s_Z^2$ are independent.
Also, we see $\bar Z \sim N(0,1/n)$.
First 2 parts done.
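A Monte Carlo illustration of parts 1 and 2 (my own sketch, with arbitrary simulation settings): over repeated normal samples, the sample correlation between $\bar X$ and $s^2$ is near 0, and the variance of $\bar X$ is near $\sigma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 20000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)          # sample variance with divisor n-1

print(np.corrcoef(xbar, s2)[0, 1])  # near 0: consistent with independence
print(xbar.var(), sigma**2 / n)     # both near sigma^2 / n = 0.4
```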
Consider part 3: $(n-1)s_Z^2 = \sum_{i=1}^n (Z_i-\bar Z)^2$.
Note that $\sum_{i=1}^n (Z_i-\bar Z)^2 = Z^T\bigl(I - \mathbf{1}\mathbf{1}^T/n\bigr)Z$, a quadratic form in $Z$.
Now: distribution of quadratic forms.
Suppose $Z \sim MVN(0,I)$ and $A$ is symmetric.
Put $A = P\Lambda P^T$ for $\Lambda$ diagonal, $P$ orthogonal.
Then $Z^TAZ = (P^TZ)^T\Lambda(P^TZ)$.
So: $Z^TAZ$ has the same distribution as $\sum_i \lambda_i W_i^2$, where $W_1,\ldots,W_n$ are iid $N(0,1)$ and the $\lambda_i$ are the diagonal entries of $\Lambda$.
Special case: if all the $\lambda_i$ are either 0 or 1, then $Z^TAZ$ has a chi-squared distribution with df = number of $\lambda_i$ equal to 1.
When are the eigenvalues all 1 or 0?
Answer: if and only if $A$ is idempotent ($A^2 = A$).
1) If $A$ is idempotent and $(\lambda, v)$ is an eigenpair, then $\lambda v = Av = A^2v = \lambda^2 v$, so $\lambda \in \{0,1\}$.
2) Conversely, if all eigenvalues of $A$ are 0 or 1, then $\Lambda$ has 1s and 0s on the diagonal, so $\Lambda^2 = \Lambda$ and $A^2 = P\Lambda P^TP\Lambda P^T = P\Lambda^2P^T = A$.
Since $P^TZ$ is again multivariate standard normal, it has the law of $Z$.
So the eigenvalues that matter are those of $A$, and $Z^TAZ$ is $\chi^2_\nu$ iff $A$ is idempotent and $\operatorname{tr}(A) = \nu$ (with 0–1 eigenvalues, the number of 1s equals the trace).
Our case: $A = I - \mathbf{1}\mathbf{1}^T/n$.
Check: $A^2 = I - 2\mathbf{1}\mathbf{1}^T/n + \mathbf{1}(\mathbf{1}^T\mathbf{1})\mathbf{1}^T/n^2 = I - \mathbf{1}\mathbf{1}^T/n = A$, since $\mathbf{1}^T\mathbf{1} = n$.
How many degrees of freedom: $\operatorname{tr}(A)$.
Defn: The trace of a square matrix $A$ is $\operatorname{tr}(A) = \sum_i A_{ii}$.
Property: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
So:
$$\operatorname{tr}\bigl(I - \mathbf{1}\mathbf{1}^T/n\bigr) = \operatorname{tr}(I_n) - \operatorname{tr}(\mathbf{1}\mathbf{1}^T)/n = n - \mathbf{1}^T\mathbf{1}/n = n - 1.$$
Conclusion: the df for $(n-1)s_Z^2 = Z^TAZ$ is $n-1$; that is, $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
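The following sketch (mine, not from the notes) checks these facts numerically for $A = I - \mathbf{1}\mathbf{1}^T/n$: the eigenvalues are 0s and 1s, the trace is $n-1$, and simulated values of $Z^TAZ$ match the $\chi^2_{n-1}$ distribution.

```python
import numpy as np
from scipy import stats

n = 6
A = np.eye(n) - np.ones((n, n)) / n

print(np.allclose(A @ A, A))                 # True: A is idempotent
print(np.round(np.linalg.eigvalsh(A), 10))   # eigenvalues: one 0 and (n-1) ones
print(np.trace(A))                           # n - 1 = 5

rng = np.random.default_rng(4)
Z = rng.standard_normal(size=(50000, n))
q = np.einsum('ij,jk,ik->i', Z, A, Z)        # Z^T A Z for each simulated Z
# Small KS statistic: the simulated law matches chi-squared with n-1 df.
print(stats.kstest(q, stats.chi2(df=n - 1).cdf))
```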
Derivation of the $\chi^2$ density:
Suppose $Z_1,\ldots,Z_n$ are independent $N(0,1)$. Define the $\chi^2_n$ distribution to be that of $S = Z_1^2 + \cdots + Z_n^2$.
Define angles $\theta_1,\ldots,\theta_{n-1}$ by writing $Z$ in spherical coordinates with squared radius $S$:
$$\begin{aligned}
Z_1 &= S^{1/2}\cos\theta_1\\
Z_2 &= S^{1/2}\sin\theta_1\cos\theta_2\\
&\;\;\vdots\\
Z_{n-1} &= S^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}\\
Z_n &= S^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\sin\theta_{n-1}.
\end{aligned}$$
In the matrix of partial derivatives $\partial(Z_1,\ldots,Z_n)/\partial(S,\theta_1,\ldots,\theta_{n-1})$, every entry of the first column carries a factor $S^{-1/2}$ and every entry of the other $n-1$ columns carries a factor $S^{1/2}$.
FACT: multiplying a column in a matrix by $c$ multiplies the determinant by $c$.
SO: the Jacobian of the transformation is $S^{(n-2)/2}\,h(\theta_1,\ldots,\theta_{n-1})$ for some function $h$ of the angles alone.
Thus the joint density of $(S,\theta_1,\ldots,\theta_{n-1})$ is
$$(2\pi)^{-n/2}e^{-S/2}\,S^{(n-2)/2}\,h(\theta_1,\ldots,\theta_{n-1}).$$
Integrating out the angles, the answer has the form
$$f_S(s) = c\,s^{n/2-1}e^{-s/2},\qquad s>0.$$
Evaluate $c$ by making the density integrate to 1:
$$1 = c\int_0^\infty s^{n/2-1}e^{-s/2}\,ds = c\,2^{n/2}\Gamma(n/2),$$
so
$$f_S(s) = \frac{1}{2^{n/2}\Gamma(n/2)}\,s^{n/2-1}e^{-s/2},\qquad s>0.$$
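To check the constant (my own sketch, not part of the notes), the formula can be compared with scipy's chi-squared density on a grid of points:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

n = 7
s = np.linspace(0.1, 20, 200)
derived = s**(n / 2 - 1) * np.exp(-s / 2) / (2**(n / 2) * gamma(n / 2))
print(np.allclose(derived, stats.chi2(df=n).pdf(s)))   # True
```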
Fourth part: consequence of the first 3 parts and the definition of the $t$ distribution.
Defn: $T \sim t_\nu$ if $T$ has the same distribution as $Z/\sqrt{U/\nu}$, where $Z\sim N(0,1)$, $U\sim\chi^2_\nu$, and $Z$ and $U$ are independent.
Derive the density of $T$ in this definition: conditioning on $U = u$ gives $T\mid U=u \sim N(0,\nu/u)$, so
$$f_T(t) = \int_0^\infty \sqrt{u/\nu}\,\phi\!\bigl(t\sqrt{u/\nu}\bigr)\,f_U(u)\,du = \frac{\Gamma\!\bigl(\tfrac{\nu+1}{2}\bigr)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\Bigl(1+\frac{t^2}{\nu}\Bigr)^{-(\nu+1)/2}.$$
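A short simulation (mine, with an arbitrary choice of $\nu$ and seed) of this definition: draw $Z\sim N(0,1)$ and $U\sim\chi^2_\nu$ independently, form $Z/\sqrt{U/\nu}$, and compare with scipy's $t_\nu$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
nu, reps = 4, 100000
Z = rng.standard_normal(reps)
U = rng.chisquare(df=nu, size=reps)
T = Z / np.sqrt(U / nu)

# Small KS statistic: the simulated law matches t with nu df.
print(stats.kstest(T, stats.t(df=nu).cdf))
```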
Theorem: Suppose $X_1,\ldots,X_n$ are independent $MVN_p(\mu,\Sigma)$ random variables. Then
1) $\bar X \sim MVN_p(\mu,\Sigma/n)$;
2) $\bar X$ and $S$ are independent;
3) $(n-1)S \sim \text{Wishart}_p(n-1,\Sigma)$;
4) $n(\bar X-\mu)^TS^{-1}(\bar X-\mu)$ has the distribution of Hotelling's $T^2$.
Proof: Let $X_i = \mu + \Sigma^{1/2}Z_i$, where $\Sigma^{1/2}$ is a symmetric square root of $\Sigma$ and $Z_1,\ldots,Z_n$ are independent $MVN_p(0,I)$.
So each $X_i \sim MVN_p(\mu,\Sigma)$.
Note that $\bar X = \mu + \Sigma^{1/2}\bar Z$ and
$$(n-1)S = \sum_{i=1}^n (X_i-\bar X)(X_i-\bar X)^T = \Sigma^{1/2}\Bigl[\sum_{i=1}^n (Z_i-\bar Z)(Z_i-\bar Z)^T\Bigr]\Sigma^{1/2}.$$
Consequences. In 1, 2 and 4: can assume $\mu = 0$ and $\Sigma = I$. In 3: can take $\mu = 0$.
Step 1: Do general $p$. As in the univariate case, define the stacked vector
$$Y = \begin{pmatrix}\bar Z\\ Z_1-\bar Z\\ \vdots\\ Z_n-\bar Z\end{pmatrix},$$
whose entries are now vectors in $\mathbb{R}^p$. Compute its variance covariance matrix:
$$\operatorname{Var}(Y) = \begin{pmatrix} I_p/n & 0\\ 0 & \bigl(I_n - \mathbf{1}\mathbf{1}^T/n\bigr)\otimes I_p\end{pmatrix}.$$
Defn: If $A$ is $p\times q$ and $B$ is $r\times s$, then the Kronecker product $A\otimes B$ is the $pr\times qs$ matrix with the pattern
$$A\otimes B = \begin{pmatrix} A_{11}B & \cdots & A_{1q}B\\ \vdots & & \vdots\\ A_{p1}B & \cdots & A_{pq}B\end{pmatrix}.$$
Conclusions so far:
1) $\bar Z$ and $(Z_1-\bar Z,\ldots,Z_n-\bar Z)$ are independent; hence $\bar X$ and $S$ are independent.
2) $\bar Z \sim MVN_p(0, I_p/n)$, so $\bar X \sim MVN_p(\mu,\Sigma/n)$.
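These two conclusions can be spot-checked by simulation (my own sketch with arbitrary settings): over many samples, the covariance matrix of $\bar X$ is close to $\Sigma/n$, and the correlation between entries of $\bar X$ and entries of $S$ is near 0.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, reps = 8, 2, 5000
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

xbars = np.empty((reps, p))
s11 = np.empty(reps)                        # the (1,1) entry of S for each sample
for r in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    xbars[r] = X.mean(axis=0)
    s11[r] = np.cov(X, rowvar=False)[0, 0]

print(np.cov(xbars, rowvar=False))          # close to Sigma / n
print(np.corrcoef(xbars[:, 0], s11)[0, 1])  # near 0: consistent with independence
```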
Next: the Wishart law.
Defn: The $\text{Wishart}_p(n,\Sigma)$ distribution is the distribution of
$$W = \sum_{i=1}^n Z_iZ_i^T,\qquad Z_1,\ldots,Z_n \text{ iid } MVN_p(0,\Sigma).$$
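A sketch (mine) of this definition in code: build $W = \sum_i Z_iZ_i^T$ from iid $MVN_p(0,\Sigma)$ draws and compare with scipy's Wishart generator. The comparison of sample means uses the standard fact $E(W) = n\Sigma$, which is not stated above; all settings are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, reps = 5, 3, 20000
Sigma = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.5, 0.3],
                  [0.0, 0.3, 0.7]])

# W = sum of Z_i Z_i^T with Z_i iid MVN_p(0, Sigma), one draw per replicate
Z = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
W = np.einsum('rij,rik->rjk', Z, Z)

W_scipy = stats.wishart(df=n, scale=Sigma).rvs(size=reps)

print(np.round(W.mean(axis=0), 2))        # both sample means are close to n * Sigma
print(np.round(W_scipy.mean(axis=0), 2))
```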
Properties of Wishart.
1) If … then …
2) If $W_1 \sim \text{Wishart}_p(n_1,\Sigma)$ and $W_2 \sim \text{Wishart}_p(n_2,\Sigma)$ are independent, then $W_1 + W_2 \sim \text{Wishart}_p(n_1+n_2,\Sigma)$.
Proof of part 3: rewrite $(n-1)S = \Sigma^{1/2}\bigl[\sum_{i=1}^n (Z_i-\bar Z)(Z_i-\bar Z)^T\bigr]\Sigma^{1/2}$.
This uses further properties of the Wishart distribution:
3: If … and … then …
4: If … and … then …
5: If … then …
6: If … is partitioned into components then …