Given data $X$ with model $\{f_\theta(x);\ \theta \in \Theta\}$:

Definition: The likelihood function is the map $L$: domain $\Theta$, values given by $L(\theta) = f_\theta(X)$.

Key Point: think about how the density depends on $\theta$, not about how it depends on $X$.

Notice: $X$, the observed value of the data, has been plugged into the formula for the density.
We use likelihood for most inference problems:

1. Point estimation: we compute an estimate $\hat\theta = \hat\theta(X)$ which lies in $\Theta$. The maximum likelihood estimate (MLE) of $\theta$ maximizes $L(\theta)$ over $\theta \in \Theta$, if such a maximizer exists.

2. Hypothesis testing: decide whether or not $\theta \in \Theta_0$ where $\Theta_0 \subset \Theta$. We base our decision on the likelihood ratio
$$\frac{\sup\{L(\theta);\ \theta \in \Theta_0\}}{\sup\{L(\theta);\ \theta \in \Theta \setminus \Theta_0\}}.$$
Maximum Likelihood Estimation
To find the MLE, maximize $L$.

Typical function maximization problem:

Set the gradient of $L$ equal to 0.

Check that the root is a maximum, not a minimum or saddle point.

Often $L$ is a product of $n$ terms (given $n$ independent observations).

Much easier to work with the logarithm of $L$: the log of a product is a sum, and the logarithm is monotone increasing.

Definition: The Log Likelihood function is
$$\ell(\theta) = \log(L(\theta)).$$
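Maximizing $\ell$ rather than $L$ is also how this is done numerically. A minimal sketch (my own illustration, not from the notes), assuming a univariate $N(\mu, \sigma^2)$ sample and minimizing $-\ell$ with scipy's optimizer:

```python
# Sketch: numerical MLE for iid N(mu, sigma^2) data via minimizing -log L.
import numpy as np
from scipy.optimize import minimize
from scipy.stats import norm

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=2.0, size=200)   # simulated data (assumed example)

def neg_log_likelihood(params):
    # Optimize log(sigma) so that sigma stays positive.
    mu, log_sigma = params
    return -np.sum(norm.logpdf(x, loc=mu, scale=np.exp(log_sigma)))

fit = minimize(neg_log_likelihood, x0=[0.0, 0.0])
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
print(mu_hat, sigma_hat)   # close to the closed-form MLEs x.mean(), x.std()
```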
Simplest problem: collect replicate measurements $X_1, \ldots, X_n$ from a single population.

Model: $X_1, \ldots, X_n$ are iid $MVN_p(\mu, \Sigma)$.

Parameters ($\theta$): $(\mu, \Sigma)$.

Parameter space: $\mu \in \mathbb{R}^p$ and $\Sigma$ is some positive definite $p \times p$ matrix.
Log likelihood is
$$\ell(\mu, \Sigma) = -\frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^n (X_i - \mu)^T \Sigma^{-1} (X_i - \mu) - \frac{np}{2}\log(2\pi).$$
First derivative wrt $\mu$ is $\partial\ell/\partial\mu = \Sigma^{-1}\sum_i (X_i - \mu) = n\Sigma^{-1}(\bar X - \mu)$; the only root is $\mu = \bar X$. Second derivative wrt $\mu$ is $-n\Sigma^{-1}$, which is negative definite.
Fact: if second derivative matrix is negative definite everywhere then function is concave; no more than 1 critical point.
Summary: for each fixed $\Sigma$, $\ell(\mu, \Sigma)$ is maximized at $\mu = \bar X$.
More difficult: differentiate $\ell$ wrt $\Sigma$.

Somewhat simpler: set $Q = \Sigma^{-1}$ and write
$$\ell(\mu, Q) = \frac{n}{2}\log\det Q - \frac{1}{2}\sum_{i=1}^n (X_i - \mu)^T Q (X_i - \mu) - \frac{np}{2}\log(2\pi).$$

First derivative wrt $Q$ is the matrix with entries $\partial\ell/\partial Q_{rs}$.

Need: derivatives of two functions: $v^T Q v$ and $\log\det Q$.

Fact: the $(r,s)$th entry of $\partial(v^T Q v)/\partial Q$ is $v_r v_s$.

Fact: $\partial \log\det Q/\partial Q_{rs} = (Q^{-1})_{sr}$; proved by expansion of the determinant by minors.

Conclusion:
$$\frac{\partial\ell}{\partial Q} = \frac{n}{2}\Sigma - \frac{1}{2}\sum_{i=1}^n (X_i - \mu)(X_i - \mu)^T.$$

Set = 0 (with $\mu = \hat\mu = \bar X$) and find that the only critical point is
$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^T.$$

The usual sample covariance matrix is
$$S = \frac{1}{n-1}\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^T.$$
Properties of MLEs:

1) $\hat\mu = \bar X$.

2) $\hat\Sigma = (n-1)S/n$.
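As a sanity check on these formulas, here is a short sketch (illustrative, not from the notes) computing $\hat\mu$, $\hat\Sigma$, and $S$ for simulated multivariate normal data; the true parameter values are arbitrary:

```python
# Sketch: closed-form MLEs for an iid MVN_p(mu, Sigma) sample.
import numpy as np

rng = np.random.default_rng(1)
n, p = 500, 3
mu_true = np.array([1.0, -2.0, 0.5])
Sigma_true = np.array([[2.0, 0.5, 0.0],
                       [0.5, 1.0, 0.3],
                       [0.0, 0.3, 1.5]])
X = rng.multivariate_normal(mu_true, Sigma_true, size=n)  # rows are X_1..X_n

mu_hat = X.mean(axis=0)                   # MLE of mu is the sample mean
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n     # MLE of Sigma divides by n
S = centered.T @ centered / (n - 1)       # sample covariance divides by n-1

assert np.allclose(S, np.cov(X, rowvar=False))
assert np.allclose(Sigma_hat, (n - 1) * S / n)
```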
Distribution of $\bar X$? Joint distribution of $\bar X$ and $S$? Start with the univariate case ($p = 1$):

Theorem: Suppose $X_1, \ldots, X_n$ are independent $N(\mu, \sigma^2)$ random variables. Then:

1) $\bar X \sim N(\mu, \sigma^2/n)$.

2) $\bar X$ and $s^2$ are independent.

3) $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.

4) $T = \sqrt{n}(\bar X - \mu)/s \sim t_{n-1}$.
Proof: Let $Z_i = (X_i - \mu)/\sigma$. Then $Z_1, \ldots, Z_n$ are independent $N(0,1)$. So $Z = (Z_1, \ldots, Z_n)^T$ is multivariate standard normal.

Note that $\bar X = \sigma\bar Z + \mu$ and
$$(n-1)s^2 = \sum_{i=1}^n (X_i - \bar X)^2 = \sigma^2 \sum_{i=1}^n (Z_i - \bar Z)^2.$$

Thus $(n-1)s^2/\sigma^2 = \sum_i (Z_i - \bar Z)^2$.

So: reduced to $\mu = 0$ and $\sigma = 1$.
Step 1: Define $W_i = Z_i - \bar Z$. ($\bar X$ is a function of $\bar Z$; $s^2$ is a function of $W_1, \ldots, W_n$.) Now $(\bar Z, W_1, \ldots, W_n)$ is a linear function of $Z$, so we need its covariances to compute its joint law:
$$\mathrm{Cov}(\bar Z, W_i) = \mathrm{Cov}(\bar Z, Z_i) - \mathrm{Var}(\bar Z) = \frac{1}{n} - \frac{1}{n} = 0.$$

Put $U = (\bar Z, W_1, \ldots, W_n)$. Since $U$ is a linear transformation of $Z$, it is multivariate normal: the $Z_i$ are independent and each is normal. Thus $\bar Z$ is independent of $(W_1, \ldots, W_n)$. Since $s^2$ is a function of $(W_1, \ldots, W_n)$ we see that $\bar X$ and $s^2$ are independent.

Also, $\bar Z \sim N(0, 1/n)$, so $\bar X = \sigma\bar Z + \mu \sim N(\mu, \sigma^2/n)$.
First 2 parts done.
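At this point parts 1 and 2 can also be confirmed by Monte Carlo, along with the $\chi^2$ claim of part 3 proved next. A sketch (my own illustration; the values of $\mu$, $\sigma$, and $n$ are arbitrary):

```python
# Sketch: simulate many N(mu, sigma^2) samples; check the laws of xbar and s^2.
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
mu, sigma, n, reps = 3.0, 2.0, 10, 50_000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)

print(xbar.mean(), xbar.var())       # approx mu and sigma^2 / n (part 1)
print(np.corrcoef(xbar, s2)[0, 1])   # approx 0 (independence, part 2)
# Part 3: (n-1) s^2 / sigma^2 should be chi-squared with n-1 df.
print(stats.kstest((n - 1) * s2 / sigma**2, stats.chi2(df=n - 1).cdf).pvalue)
```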
Consider $(n-1)s^2/\sigma^2 = \sum_{i=1}^n (Z_i - \bar Z)^2$.

Note that
$$\sum_{i=1}^n (Z_i - \bar Z)^2 = Z^T\left(I - \mathbf{1}\mathbf{1}^T/n\right)Z,$$
a quadratic form in $Z$, where $\mathbf{1}$ is the column vector of $n$ 1s.
Now: distribution of quadratic forms.

Suppose $Z \sim MVN(0, I)$ and $A$ is symmetric. Put $A = P\Lambda P^T$ for $\Lambda$ diagonal, $P$ orthogonal. Then $W = P^T Z$ is standard multivariate normal.

So: $Z^T A Z = W^T \Lambda W$ has the same distribution as $\sum_i \lambda_i W_i^2$, where $\lambda_1, \ldots, \lambda_n$ are the eigenvalues of $A$.

Special case: if all $\lambda_i$ are either 0 or 1 then $Z^T A Z$ has a chi-squared distribution with df = number of $\lambda_i$ equal to 1.
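This distributional equivalence is easy to check by Monte Carlo. A sketch (illustrative; the symmetric matrix $A$ is arbitrary):

```python
# Sketch: Z^T A Z has the same law as sum(lambda_i * W_i^2), lambda_i = eig(A).
import numpy as np

rng = np.random.default_rng(3)
n, reps = 5, 100_000
A = rng.normal(size=(n, n)); A = (A + A.T) / 2   # an arbitrary symmetric matrix
lam = np.linalg.eigvalsh(A)

Z = rng.normal(size=(reps, n))
quad = np.einsum('ri,ij,rj->r', Z, A, Z)         # Z^T A Z for each replicate
W = rng.normal(size=(reps, n))
weighted = (lam * W**2).sum(axis=1)              # sum of lambda_i W_i^2

# Matching moments of the two samples (same distribution):
print(quad.mean(), weighted.mean())   # both approx trace(A)
print(quad.var(), weighted.var())     # both approx 2 * sum(lambda_i^2)
```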
When are the eigenvalues all 1 or 0?

Answer: if and only if $A$ is idempotent, that is, $A^2 = A$.
1) If $A$ is idempotent and $(\lambda, v)$ is an eigenpair then $\lambda v = Av = A^2 v = \lambda^2 v$, so $\lambda^2 = \lambda$ and $\lambda \in \{0, 1\}$.

2) Conversely, if all eigenvalues of $A$ are 0 or 1 then $\Lambda$ has 1s and 0s on the diagonal, so $\Lambda^2 = \Lambda$. Then
$$A^2 = P\Lambda P^T P\Lambda P^T = P\Lambda^2 P^T = P\Lambda P^T = A.$$
Since $W = P^T Z$ has the same law as $Z$, $Z^T A Z = \sum_i \lambda_i W_i^2$ has the law of a weighted sum of independent $\chi^2_1$ variables; the weights are the eigenvalues of $A$. So $Z^T A Z$ is $\chi^2_\nu$ iff $A$ is idempotent and the number of eigenvalues equal to 1, which is then $\mathrm{tr}(A)$, equals $\nu$.
Our case: $A = I - \mathbf{1}\mathbf{1}^T/n$.

Check $A^2 = A$: since $\mathbf{1}^T\mathbf{1} = n$,
$$\left(I - \mathbf{1}\mathbf{1}^T/n\right)^2 = I - 2\,\mathbf{1}\mathbf{1}^T/n + \mathbf{1}(\mathbf{1}^T\mathbf{1})\mathbf{1}^T/n^2 = I - \mathbf{1}\mathbf{1}^T/n.$$
How many degrees of freedom: $\mathrm{tr}(A)$.
Defn: The trace of a square matrix $A$ is $\mathrm{tr}(A) = \sum_i A_{ii}$.

Property: $\mathrm{tr}(AB) = \mathrm{tr}(BA)$.
So:
$$\mathrm{tr}\left(I - \mathbf{1}\mathbf{1}^T/n\right) = \mathrm{tr}(I_n) - \mathrm{tr}(\mathbf{1}\mathbf{1}^T)/n = n - \mathrm{tr}(\mathbf{1}^T\mathbf{1})/n = n - 1.$$

Conclusion: the df for $\sum_i (Z_i - \bar Z)^2$ is $n - 1$; that is, $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
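These facts about the centering matrix $A = I - \mathbf{1}\mathbf{1}^T/n$ can be confirmed numerically (illustrative sketch; $n = 7$ is arbitrary):

```python
# Sketch: idempotence, {0,1} eigenvalues, and trace n-1 of the centering matrix.
import numpy as np

n = 7
ones = np.ones((n, 1))
A = np.eye(n) - ones @ ones.T / n          # the centering matrix

print(np.allclose(A @ A, A))               # True: A is idempotent
print(np.round(np.linalg.eigvalsh(A), 8))  # n-1 eigenvalues are 1, one is 0
print(np.trace(A))                         # n - 1 = 6.0
```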
Derivation of the $\chi^2$ density:

Suppose $Z_1, \ldots, Z_n$ are independent $N(0, 1)$. Define the $\chi^2_n$ distribution to be that of $Y = Z_1^2 + \cdots + Z_n^2$.
Define angles $\theta_1, \ldots, \theta_{n-1}$ by
$$\begin{aligned}
Z_1 &= Y^{1/2}\cos\theta_1 \\
Z_2 &= Y^{1/2}\sin\theta_1\cos\theta_2 \\
&\;\;\vdots \\
Z_{n-1} &= Y^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1} \\
Z_n &= Y^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\sin\theta_{n-1}.
\end{aligned}$$
In the matrix of partial derivatives, the column of derivatives with respect to $Y$ carries a factor $Y^{-1/2}/2$, while every other entry has a factor $Y^{1/2}$.

FACT: multiplying a column in a matrix by $c$ multiplies the determinant by $c$.

SO: the Jacobian of the transformation is
$$\frac{1}{2} Y^{-1/2}\left(Y^{1/2}\right)^{n-1} h(\theta_1, \ldots, \theta_{n-1}) = \frac{1}{2} Y^{n/2-1} h(\theta_1, \ldots, \theta_{n-1})$$
for some function $h$ of the angles alone.
Thus the joint density of $(Y, \theta_1, \ldots, \theta_{n-1})$ is
$$\frac{1}{2}(2\pi)^{-n/2} e^{-y/2}\, y^{n/2-1}\, h(\theta_1, \ldots, \theta_{n-1}).$$

To get the density of $Y$, integrate out the angles: an $(n-1)$ dimensional multiple integral.

The answer has the form
$$f(y) = c\, y^{n/2-1} e^{-y/2}, \qquad y > 0.$$
Evaluate $c$ by making the substitution $u = y/2$ in
$$1 = c\int_0^\infty y^{n/2-1} e^{-y/2}\, dy = c\, 2^{n/2}\int_0^\infty u^{n/2-1} e^{-u}\, du = c\, 2^{n/2}\,\Gamma(n/2),$$
to see that
$$f(y) = \frac{1}{2^{n/2}\,\Gamma(n/2)}\, y^{n/2-1} e^{-y/2}, \qquad y > 0.$$
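A quick numerical check of the derived density (my own sketch; $n = 4$ is arbitrary):

```python
# Sketch: histogram of Y = Z_1^2 + ... + Z_n^2 versus the derived chi^2 density.
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(4)
n, reps = 4, 200_000
Y = (rng.normal(size=(reps, n)) ** 2).sum(axis=1)   # chi-squared(n) draws

hist, edges = np.histogram(Y, bins=100, range=(0, 15), density=True)
centers = (edges[:-1] + edges[1:]) / 2
density = centers**(n/2 - 1) * np.exp(-centers/2) / (2**(n/2) * gamma(n/2))
print(np.max(np.abs(hist - density)))   # small: the histogram tracks the density
```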
Fourth part: consequence of the first 3 parts and the def'n of the $t$ distribution.

Defn: $T \sim t_\nu$ if $T$ has the same distribution as $Z/\sqrt{U/\nu}$, where $Z \sim N(0,1)$, $U \sim \chi^2_\nu$, and $Z$ and $U$ are independent.
Derive the density of $T$ in this definition: conditional on $U = u$, $T \sim N(0, \nu/u)$, so
$$f_T(t) = \int_0^\infty \sqrt{u/\nu}\;\phi\!\left(t\sqrt{u/\nu}\right) f_U(u)\, du,$$
where $\phi$ is the standard normal density and $f_U$ the $\chi^2_\nu$ density; evaluate the integral, to get
$$f_T(t) = \frac{\Gamma\!\left(\frac{\nu+1}{2}\right)}{\sqrt{\nu\pi}\,\Gamma\!\left(\frac{\nu}{2}\right)}\left(1 + \frac{t^2}{\nu}\right)^{-(\nu+1)/2}.$$
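The defining representation can be checked against scipy's $t$ distribution (illustrative sketch; $\nu = 6$ is arbitrary):

```python
# Sketch: T = Z / sqrt(U / nu) should follow the t distribution with nu df.
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
nu, reps = 6, 200_000
Z = rng.normal(size=reps)
U = rng.chisquare(df=nu, size=reps)
T = Z / np.sqrt(U / nu)

# Kolmogorov-Smirnov test of the simulated T against the t_nu cdf:
print(stats.kstest(T, stats.t(df=nu).cdf).pvalue)   # p-value not small
```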
Theorem: Suppose $X_1, \ldots, X_n$ are independent $MVN_p(\mu, \Sigma)$ random variables. Then:

1) $\bar X \sim MVN_p(\mu, \Sigma/n)$.

2) $\bar X$ and $S$ are independent.

3) $(n-1)S \sim \text{Wishart}_p(n-1, \Sigma)$.

4) $T^2 = n(\bar X - \mu)^T S^{-1}(\bar X - \mu)$ is Hotelling's $T^2$; $\dfrac{n-p}{(n-1)p}\, T^2$ has an $F_{p,\, n-p}$ distribution.
Proof: Let $X_i = \mu + \Sigma^{1/2} Z_i$ where $\Sigma^{1/2}$ is a symmetric square root of $\Sigma$ and $Z_1, \ldots, Z_n$ are independent $MVN_p(0, I)$. So $X_1, \ldots, X_n$ are independent $MVN_p(\mu, \Sigma)$.

Note that $\bar X = \mu + \Sigma^{1/2}\bar Z$ and
$$\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^T = \Sigma^{1/2}\left[\sum_{i=1}^n (Z_i - \bar Z)(Z_i - \bar Z)^T\right]\Sigma^{1/2}.$$
Consequences. In parts 1, 2 and 4: can assume $\mu = 0$ and $\Sigma = I$. In part 3 can take $\mu = 0$.
Step 1: Do general $p$. Define $W_i = Z_i - \bar Z$. (Clearly $E(\bar Z) = 0$ and $E(W_i) = 0$.) Compute the variance covariance matrix of $(\bar Z, W_1, \ldots, W_n)$; it has a pattern. It is a patterned matrix with $p \times p$ blocks
$$\mathrm{Var}(\bar Z) = \frac{1}{n} I, \qquad \mathrm{Cov}(\bar Z, W_i) = 0, \qquad \mathrm{Cov}(W_i, W_j) = \left(\delta_{ij} - \frac{1}{n}\right) I.$$
Defn: If $A$ is $p \times q$ and $B$ is $m \times n$ then the Kronecker product $A \otimes B$ is the $pm \times qn$ matrix with the pattern
$$A \otimes B = \begin{bmatrix} a_{11}B & \cdots & a_{1q}B \\ \vdots & & \vdots \\ a_{p1}B & \cdots & a_{pq}B \end{bmatrix}.$$
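In code, numpy's `kron` builds exactly this block pattern (illustrative):

```python
# Sketch: the Kronecker product as the block matrix [a_ij * B].
import numpy as np

A = np.array([[1, 2],
              [3, 4]])
B = np.eye(2)
print(np.kron(A, B))
# [[1. 0. 2. 0.]
#  [0. 1. 0. 2.]
#  [3. 0. 4. 0.]
#  [0. 3. 0. 4.]]
```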
Conclusions so far:

1) $\bar X$ and $S$ are independent.

2) $\bar X \sim MVN_p(\mu, \Sigma/n)$.
Next: Wishart law.
Defn: The $\text{Wishart}_p(\nu, \Sigma)$ distribution is the distribution of $\sum_{i=1}^\nu Z_i Z_i^T$ where $Z_1, \ldots, Z_\nu$ are iid $MVN_p(0, \Sigma)$.
Properties of the Wishart distribution:

1) If $W \sim \text{Wishart}_p(\nu, \Sigma)$ then $E(W) = \nu\Sigma$.

2) If $W_1 \sim \text{Wishart}_p(\nu_1, \Sigma)$ and $W_2 \sim \text{Wishart}_p(\nu_2, \Sigma)$ are independent then $W_1 + W_2 \sim \text{Wishart}_p(\nu_1 + \nu_2, \Sigma)$.
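The definition gives a direct way to sample a Wishart matrix, and property 1 can be checked by Monte Carlo. An illustrative sketch (parameter values arbitrary; scipy also provides the same law directly via `scipy.stats.wishart(df=nu, scale=Sigma)`):

```python
# Sketch: Wishart draws as sums of outer products; check E(W) = nu * Sigma.
import numpy as np

rng = np.random.default_rng(6)
nu, reps = 8, 20_000
Sigma = np.array([[2.0, 0.7],
                  [0.7, 1.0]])

def wishart_draw():
    Z = rng.multivariate_normal(np.zeros(2), Sigma, size=nu)  # nu iid MVN(0, Sigma)
    return Z.T @ Z                                            # sum of Z_i Z_i^T

W_mean = sum(wishart_draw() for _ in range(reps)) / reps
print(W_mean)       # approx nu * Sigma
print(nu * Sigma)
```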
Proof of part 3: rewrite $\sum_i (X_i - \bar X)(X_i - \bar X)^T$, taking $\mu = 0$.

Put the $X_i$ as the columns of a $p \times n$ matrix $\mathbf{X}$. Then
$$\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^T = \mathbf{X}\left(I - \mathbf{1}\mathbf{1}^T/n\right)\mathbf{X}^T.$$

Check that
$$I - \mathbf{1}\mathbf{1}^T/n = \sum_{i=1}^{n-1} v_i v_i^T$$
for mutually orthogonal unit vectors $v_1, \ldots, v_{n-1}$, each orthogonal to $\mathbf{1}$.

Define $U_i = \mathbf{X} v_i$.

Then check that $U_1, \ldots, U_{n-1}$ are iid $MVN_p(0, \Sigma)$, so that
$$\sum_{i=1}^n (X_i - \bar X)(X_i - \bar X)^T = \sum_{i=1}^{n-1} U_i U_i^T \sim \text{Wishart}_p(n-1, \Sigma).$$
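The orthonormal vectors $v_i$ can be exhibited explicitly; the Helmert vectors are one standard choice. A sketch checking the claimed identity (my own illustration; $n = 6$ is arbitrary):

```python
# Sketch: Helmert vectors v_1..v_{n-1}, orthonormal and orthogonal to the 1s
# vector, with sum of v_i v_i^T equal to the centering matrix I - 11^T/n.
import numpy as np

n = 6
V = np.zeros((n, n - 1))
for i in range(1, n):
    v = np.zeros(n)
    v[:i] = 1.0
    v[i] = -float(i)
    V[:, i - 1] = v / np.linalg.norm(v)    # unit vector orthogonal to 1

centering = np.eye(n) - np.ones((n, n)) / n
print(np.allclose(V @ V.T, centering))      # True: sum v_i v_i^T = I - 11^T/n
print(np.allclose(V.T @ V, np.eye(n - 1)))  # True: the v_i are orthonormal
```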
The proof of part 4 uses further properties of the Wishart distribution:

3: If $W \sim \text{Wishart}_p(\nu, \Sigma)$ and $A$ is a $q \times p$ matrix then $AWA^T \sim \text{Wishart}_q(\nu, A\Sigma A^T)$.

4: If $W \sim \text{Wishart}_p(\nu, \Sigma)$ and $a$ is a fixed vector with $a^T\Sigma a > 0$ then $a^T W a / a^T \Sigma a \sim \chi^2_\nu$.

5: If $W \sim \text{Wishart}_p(\nu, \Sigma)$ with $\nu \geq p$ then, for any fixed vector $a$ with $a^T\Sigma^{-1}a > 0$, $a^T\Sigma^{-1}a / a^T W^{-1} a \sim \chi^2_{\nu - p + 1}$.

6: If $W \sim \text{Wishart}_p(\nu, \Sigma)$ is partitioned into diagonal blocks $W_{11}$ ($q \times q$) and $W_{22}$, with $\Sigma$ partitioned to match, then $W_{11} \sim \text{Wishart}_q(\nu, \Sigma_{11})$.
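Putting the pieces together, part 4 of the theorem can be checked by simulation: Hotelling's $T^2$, rescaled, should follow the $F_{p,\, n-p}$ law. An illustrative sketch (sample sizes arbitrary; $\mu = 0$, $\Sigma = I$ as the consequences above allow):

```python
# Sketch: (n-p) T^2 / ((n-1) p) versus the F(p, n-p) distribution.
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, reps = 15, 3, 20_000
mu = np.zeros(p)
Sigma = np.eye(p)

def t2_stat():
    X = rng.multivariate_normal(mu, Sigma, size=n)
    xbar = X.mean(axis=0)
    S = np.cov(X, rowvar=False)                       # sample covariance
    return n * (xbar - mu) @ np.linalg.solve(S, xbar - mu)

F_sim = np.array([t2_stat() for _ in range(reps)]) * (n - p) / ((n - 1) * p)
print(stats.kstest(F_sim, stats.f(dfn=p, dfd=n - p).cdf).pvalue)  # not small
```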