Given data $X$ with model $\{f_\theta(x);\ \theta\in\Theta\}$:
Definition: The likelihood function is the map $L$: domain $\Theta$, values given by $L(\theta) = f_\theta(X)$.
Key Point: think about how the density depends on $\theta$, not about how it depends on $X$.
Notice: $X$, the observed value of the data, has been plugged into the formula for the density.
We use likelihood for most inference problems:
Maximum Likelihood Estimation
To find the MLE, maximize $L(\theta)$.
Typical function maximization problem:
Set the gradient of $L$ equal to 0.
Check that the root is a maximum, not a minimum or saddle point.
Often $L$ is a product of $n$ terms (given $n$ independent observations).
Much easier to work with the logarithm of $L$: the log of a product is a sum, and the logarithm is monotone increasing.
Definition: The log likelihood function is $\ell(\theta) = \log L(\theta)$.
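As a concrete illustration of this recipe (not part of the original notes), here is a minimal Python sketch that maximizes a log likelihood numerically for an iid $N(\mu,\sigma^2)$ sample; the simulated data, the $\log\sigma$ parametrization, and the use of scipy.optimize are my own choices for the example.

```python
import numpy as np
from scipy.optimize import minimize

rng = np.random.default_rng(0)
x = rng.normal(loc=2.0, scale=3.0, size=100)   # simulated data for the example

def neg_log_lik(theta):
    """Negative log likelihood for iid N(mu, sigma^2); theta = (mu, log sigma)."""
    mu, log_sigma = theta
    sigma = np.exp(log_sigma)                   # parametrize by log sigma to keep sigma > 0
    return -np.sum(-0.5 * np.log(2 * np.pi) - log_sigma
                   - 0.5 * ((x - mu) / sigma) ** 2)

fit = minimize(neg_log_lik, x0=np.array([0.0, 0.0]))
mu_hat, sigma_hat = fit.x[0], np.exp(fit.x[1])
# mu_hat is close to the sample mean; sigma_hat to the root mean squared deviation
print(mu_hat, sigma_hat)
```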
Simplest problem: collect $n$ replicate measurements from a single population.
Model: $X_1,\ldots,X_n$ are iid $MVN_p(\mu,\Sigma)$.
Parameters ($\theta$): $\mu$, $\Sigma$.
Parameter space: $\mu\in\mathbb{R}^p$ and $\Sigma$ is some positive definite $p\times p$ matrix.
Log likelihood is
$$\ell(\mu,\Sigma) = -\frac{np}{2}\log(2\pi) - \frac{n}{2}\log\det\Sigma - \frac{1}{2}\sum_{i=1}^n (X_i-\mu)^T\Sigma^{-1}(X_i-\mu).$$
For fixed $\Sigma$, the gradient with respect to $\mu$ is
$$\frac{\partial\ell}{\partial\mu} = \Sigma^{-1}\sum_{i=1}^n (X_i-\mu),$$
which is 0 at $\mu = \bar X$; the second derivative matrix with respect to $\mu$ is $-n\Sigma^{-1}$, which is negative definite.
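A quick numerical sanity check of this formula (my own sketch, not from the notes): evaluate the expression directly on simulated data and compare with the sum of log densities from scipy.stats.multivariate_normal. The parameter values and seed are arbitrary.

```python
import numpy as np
from scipy.stats import multivariate_normal

rng = np.random.default_rng(1)
n, p = 50, 3
mu = np.array([1.0, -2.0, 0.5])
Sigma = np.array([[2.0, 0.3, 0.0],
                  [0.3, 1.0, 0.2],
                  [0.0, 0.2, 0.5]])
X = rng.multivariate_normal(mu, Sigma, size=n)

# Log likelihood exactly as in the displayed formula.
diff = X - mu
quad = np.einsum('ij,jk,ik->', diff, np.linalg.inv(Sigma), diff)
loglik = (-0.5 * n * p * np.log(2 * np.pi)
          - 0.5 * n * np.log(np.linalg.det(Sigma))
          - 0.5 * quad)

# Same quantity via scipy: sum of the n log densities.
loglik_scipy = multivariate_normal(mean=mu, cov=Sigma).logpdf(X).sum()
print(np.isclose(loglik, loglik_scipy))   # True
```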
Fact: if the second derivative matrix is negative definite everywhere, then the function is concave; it has no more than 1 critical point.
Summary: for each fixed $\Sigma$, $\ell(\mu,\Sigma)$ is maximized over $\mu$ at $\hat\mu = \bar X$.
More difficult: differentiate with respect to $\Sigma$.
Somewhat simpler: set $Q = \Sigma^{-1}$ and differentiate with respect to $Q$.
The first derivative with respect to $Q$ is the matrix with entries $\partial\ell/\partial Q_{ij}$.
Need: derivatives of two functions of $Q$: the quadratic form $v^TQv$ and $\log\det Q$.
Fact: the $(i,j)$th entry of $\partial(v^TQv)/\partial Q$ is $v_iv_j$.
Fact: $\partial\log\det Q/\partial Q_{ij} = (Q^{-1})_{ji}$; this follows from expansion of the determinant by minors.
Conclusion: Set the derivative equal to 0 and find that the only critical point is
$$\hat\Sigma = \frac{1}{n}\sum_{i=1}^n (X_i-\bar X)(X_i-\bar X)^T.$$
The usual sample covariance matrix is
$$S = \frac{1}{n-1}\sum_{i=1}^n (X_i-\bar X)(X_i-\bar X)^T.$$
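A brief numerical check of these closed forms (my own sketch with arbitrary simulation settings): compute $\hat\mu$, $\hat\Sigma$ and $S$ from simulated data and confirm that the only difference between $\hat\Sigma$ and $S$ is the divisor.

```python
import numpy as np

rng = np.random.default_rng(2)
n, p = 200, 3
mu = np.zeros(p)
Sigma = np.array([[1.0, 0.5, 0.2],
                  [0.5, 2.0, 0.0],
                  [0.2, 0.0, 0.8]])
X = rng.multivariate_normal(mu, Sigma, size=n)

mu_hat = X.mean(axis=0)                         # MLE of mu: the sample mean
centered = X - mu_hat
Sigma_hat = centered.T @ centered / n           # MLE of Sigma (divisor n)
S = centered.T @ centered / (n - 1)             # usual sample covariance (divisor n-1)

print(np.allclose(S, np.cov(X, rowvar=False)))  # True: matches numpy's sample covariance
print(np.allclose(Sigma_hat, S * (n - 1) / n))  # True: they differ only by the divisor
```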
Properties of MLEs:
1) …
2) …
Distribution of $\bar X$? Joint distribution of $\bar X$ and $S$?
Theorem: Suppose $X_1,\ldots,X_n$ are independent $N(\mu,\sigma^2)$ random variables. Then
1) $\bar X \sim N(\mu,\sigma^2/n)$;
2) $\bar X$ and $s^2$ are independent;
3) $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$;
4) $T = \sqrt{n}(\bar X-\mu)/s \sim t_{n-1}$.
Proof: Let $Z_i = (X_i-\mu)/\sigma$.
Then $Z_1,\ldots,Z_n$ are independent $N(0,1)$.
So $Z = (Z_1,\ldots,Z_n)^T$ is multivariate standard normal.
Note that $\bar X = \mu + \sigma\bar Z$ and $s^2 = \sigma^2 s_Z^2$, where $s_Z^2 = \sum_{i=1}^n (Z_i-\bar Z)^2/(n-1)$.
Thus $T = \sqrt{n}(\bar X-\mu)/s = \sqrt{n}\,\bar Z/s_Z$.
So: reduced to $\mu = 0$ and $\sigma = 1$.
Step 1: Define
$$Y = \begin{pmatrix}\bar Z\\ Z_1-\bar Z\\ \vdots\\ Z_n-\bar Z\end{pmatrix} = MZ,\qquad M = \begin{pmatrix}\mathbf{1}^T/n\\ I - \mathbf{1}\mathbf{1}^T/n\end{pmatrix},\quad \mathbf{1} = (1,\ldots,1)^T.$$
Since $Y = MZ$ is a linear function of the multivariate standard normal $Z$, it is multivariate normal, and $\operatorname{Cov}(\bar Z, Z_i-\bar Z) = 0$ for every $i$.
Thus $\bar Z$ is independent of $(Z_1-\bar Z,\ldots,Z_n-\bar Z)$.
Since $s_Z^2$ is a function of $(Z_1-\bar Z,\ldots,Z_n-\bar Z)$, we see that $\bar Z$ and $s_Z^2$ are independent.
Also, we see $\bar Z \sim N(0,1/n)$.
First 2 parts done.
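A Monte Carlo illustration of parts 1 and 2 (my own sketch, with arbitrary simulation settings): over repeated normal samples, the sample correlation between $\bar X$ and $s^2$ is near 0, and the variance of $\bar X$ is near $\sigma^2/n$.

```python
import numpy as np

rng = np.random.default_rng(3)
mu, sigma, n, reps = 5.0, 2.0, 10, 20000

X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)          # sample variance with divisor n-1

print(np.corrcoef(xbar, s2)[0, 1])  # near 0: consistent with independence
print(xbar.var(), sigma**2 / n)     # both near sigma^2 / n = 0.4
```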
Consider part 3: $(n-1)s_Z^2 = \sum_{i=1}^n (Z_i-\bar Z)^2$.
Note that $\sum_{i=1}^n (Z_i-\bar Z)^2 = Z^T\bigl(I - \mathbf{1}\mathbf{1}^T/n\bigr)Z$, a quadratic form in $Z$.
Now: distribution of quadratic forms.
Suppose $Z \sim MVN(0,I)$ and $A$ is symmetric.
Put $A = P\Lambda P^T$ for $\Lambda$ diagonal, $P$ orthogonal.
Then $Z^TAZ = (P^TZ)^T\Lambda(P^TZ)$.
So: $Z^TAZ$ has the same distribution as $\sum_i \lambda_i W_i^2$, where $W_1,\ldots,W_n$ are iid $N(0,1)$ and the $\lambda_i$ are the diagonal entries of $\Lambda$.
Special case: if all the $\lambda_i$ are either 0 or 1, then $Z^TAZ$ has a chi-squared distribution with df = number of $\lambda_i$ equal to 1.
When are the eigenvalues all 1 or 0?
Answer: if and only if $A$ is idempotent ($A^2 = A$).
1) If $A$ is idempotent and $(\lambda, v)$ is an eigenpair, then $\lambda v = Av = A^2v = \lambda^2 v$, so $\lambda \in \{0,1\}$.
2) Conversely, if all eigenvalues of $A$ are 0 or 1, then $\Lambda$ has 1s and 0s on the diagonal, so $\Lambda^2 = \Lambda$ and $A^2 = P\Lambda P^TP\Lambda P^T = P\Lambda^2P^T = A$.
Since $P^TZ$ is again multivariate standard normal, it has the law of $Z$.
So the eigenvalues that matter are those of $A$, and $Z^TAZ$ is $\chi^2_\nu$ iff $A$ is idempotent and $\operatorname{tr}(A) = \nu$ (with 0–1 eigenvalues, the number of 1s equals the trace).
Our case: $A = I - \mathbf{1}\mathbf{1}^T/n$.
Check: $A^2 = I - 2\mathbf{1}\mathbf{1}^T/n + \mathbf{1}(\mathbf{1}^T\mathbf{1})\mathbf{1}^T/n^2 = I - \mathbf{1}\mathbf{1}^T/n = A$, since $\mathbf{1}^T\mathbf{1} = n$.
How many degrees of freedom: $\operatorname{tr}(A)$.
Defn: The trace of a square matrix $A$ is $\operatorname{tr}(A) = \sum_i A_{ii}$.
Property: $\operatorname{tr}(AB) = \operatorname{tr}(BA)$.
So:
$$\operatorname{tr}\bigl(I - \mathbf{1}\mathbf{1}^T/n\bigr) = \operatorname{tr}(I_n) - \operatorname{tr}(\mathbf{1}\mathbf{1}^T)/n = n - \mathbf{1}^T\mathbf{1}/n = n - 1.$$
Conclusion: the df for $(n-1)s_Z^2 = Z^TAZ$ is $n-1$; that is, $(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$.
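The following sketch (mine, not from the notes) checks these facts numerically for $A = I - \mathbf{1}\mathbf{1}^T/n$: the eigenvalues are 0s and 1s, the trace is $n-1$, and simulated values of $Z^TAZ$ match the $\chi^2_{n-1}$ distribution.

```python
import numpy as np
from scipy import stats

n = 6
A = np.eye(n) - np.ones((n, n)) / n

print(np.allclose(A @ A, A))                 # True: A is idempotent
print(np.round(np.linalg.eigvalsh(A), 10))   # eigenvalues: one 0 and (n-1) ones
print(np.trace(A))                           # n - 1 = 5

rng = np.random.default_rng(4)
Z = rng.standard_normal(size=(50000, n))
q = np.einsum('ij,jk,ik->i', Z, A, Z)        # Z^T A Z for each simulated Z
# Small KS statistic: the simulated law matches chi-squared with n-1 df.
print(stats.kstest(q, stats.chi2(df=n - 1).cdf))
```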
Derivation of the $\chi^2$ density:
Suppose $Z_1,\ldots,Z_n$ are independent $N(0,1)$. Define the $\chi^2_n$ distribution to be that of $S = Z_1^2 + \cdots + Z_n^2$.
Define angles $\theta_1,\ldots,\theta_{n-1}$ by writing $Z$ in spherical coordinates with squared radius $S$:
$$\begin{aligned}
Z_1 &= S^{1/2}\cos\theta_1\\
Z_2 &= S^{1/2}\sin\theta_1\cos\theta_2\\
&\;\;\vdots\\
Z_{n-1} &= S^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}\\
Z_n &= S^{1/2}\sin\theta_1\cdots\sin\theta_{n-2}\sin\theta_{n-1}.
\end{aligned}$$
In the matrix of partial derivatives $\partial(Z_1,\ldots,Z_n)/\partial(S,\theta_1,\ldots,\theta_{n-1})$, every entry of the first column carries a factor $S^{-1/2}$ and every entry of the other $n-1$ columns carries a factor $S^{1/2}$.
FACT: multiplying a column in a matrix by $c$ multiplies the determinant by $c$.
SO: the Jacobian of the transformation is $S^{(n-2)/2}\,h(\theta_1,\ldots,\theta_{n-1})$ for some function $h$ of the angles alone.
Thus the joint density of $(S,\theta_1,\ldots,\theta_{n-1})$ is
$$(2\pi)^{-n/2}e^{-S/2}\,S^{(n-2)/2}\,h(\theta_1,\ldots,\theta_{n-1}).$$
Integrating out the angles, the answer has the form
$$f_S(s) = c\,s^{n/2-1}e^{-s/2},\qquad s>0.$$
Evaluate $c$ by making the density integrate to 1:
$$1 = c\int_0^\infty s^{n/2-1}e^{-s/2}\,ds = c\,2^{n/2}\Gamma(n/2),$$
so
$$f_S(s) = \frac{1}{2^{n/2}\Gamma(n/2)}\,s^{n/2-1}e^{-s/2},\qquad s>0.$$
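To check the constant (my own sketch, not part of the notes), the formula can be compared with scipy's chi-squared density on a grid of points:

```python
import numpy as np
from scipy import stats
from scipy.special import gamma

n = 7
s = np.linspace(0.1, 20, 200)
derived = s**(n / 2 - 1) * np.exp(-s / 2) / (2**(n / 2) * gamma(n / 2))
print(np.allclose(derived, stats.chi2(df=n).pdf(s)))   # True
```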
Fourth part: consequence of the first 3 parts and the definition of the $t$ distribution.
Defn: $T \sim t_\nu$ if $T$ has the same distribution as $Z/\sqrt{U/\nu}$, where $Z\sim N(0,1)$, $U\sim\chi^2_\nu$, and $Z$ and $U$ are independent.
Derive the density of $T$ in this definition: conditioning on $U = u$ gives $T\mid U=u \sim N(0,\nu/u)$, so
$$f_T(t) = \int_0^\infty \sqrt{u/\nu}\,\phi\!\bigl(t\sqrt{u/\nu}\bigr)\,f_U(u)\,du = \frac{\Gamma\!\bigl(\tfrac{\nu+1}{2}\bigr)}{\sqrt{\nu\pi}\,\Gamma(\nu/2)}\Bigl(1+\frac{t^2}{\nu}\Bigr)^{-(\nu+1)/2}.$$
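A short simulation (mine, with an arbitrary choice of $\nu$ and seed) of this definition: draw $Z\sim N(0,1)$ and $U\sim\chi^2_\nu$ independently, form $Z/\sqrt{U/\nu}$, and compare with scipy's $t_\nu$ distribution.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(5)
nu, reps = 4, 100000
Z = rng.standard_normal(reps)
U = rng.chisquare(df=nu, size=reps)
T = Z / np.sqrt(U / nu)

# Small KS statistic: the simulated law matches t with nu df.
print(stats.kstest(T, stats.t(df=nu).cdf))
```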
Theorem: Suppose $X_1,\ldots,X_n$ are independent $MVN_p(\mu,\Sigma)$ random variables. Then
1) $\bar X \sim MVN_p(\mu,\Sigma/n)$;
2) $\bar X$ and $S$ are independent;
3) $(n-1)S \sim \text{Wishart}_p(n-1,\Sigma)$;
4) $n(\bar X-\mu)^TS^{-1}(\bar X-\mu)$ has the distribution of Hotelling's $T^2$.
Proof: Let $X_i = \mu + \Sigma^{1/2}Z_i$, where $\Sigma^{1/2}$ is a symmetric square root of $\Sigma$ and $Z_1,\ldots,Z_n$ are independent $MVN_p(0,I)$.
So each $X_i \sim MVN_p(\mu,\Sigma)$.
Note that $\bar X = \mu + \Sigma^{1/2}\bar Z$ and
$$(n-1)S = \sum_{i=1}^n (X_i-\bar X)(X_i-\bar X)^T = \Sigma^{1/2}\Bigl[\sum_{i=1}^n (Z_i-\bar Z)(Z_i-\bar Z)^T\Bigr]\Sigma^{1/2}.$$
Consequences. In 1, 2 and 4: can assume $\mu = 0$ and $\Sigma = I$. In 3: can take $\mu = 0$.
Step 1: Do general $p$. As in the univariate case, define the stacked vector
$$Y = \begin{pmatrix}\bar Z\\ Z_1-\bar Z\\ \vdots\\ Z_n-\bar Z\end{pmatrix},$$
whose entries are now vectors in $\mathbb{R}^p$. Compute its variance covariance matrix:
$$\operatorname{Var}(Y) = \begin{pmatrix} I_p/n & 0\\ 0 & \bigl(I_n - \mathbf{1}\mathbf{1}^T/n\bigr)\otimes I_p\end{pmatrix}.$$
Defn: If $A$ is $p\times q$ and $B$ is $r\times s$, then the Kronecker product $A\otimes B$ is the $pr\times qs$ matrix with the pattern
$$A\otimes B = \begin{pmatrix} A_{11}B & \cdots & A_{1q}B\\ \vdots & & \vdots\\ A_{p1}B & \cdots & A_{pq}B\end{pmatrix}.$$
Conclusions so far:
1) $\bar Z$ and $(Z_1-\bar Z,\ldots,Z_n-\bar Z)$ are independent; hence $\bar X$ and $S$ are independent.
2) $\bar Z \sim MVN_p(0, I_p/n)$, so $\bar X \sim MVN_p(\mu,\Sigma/n)$.
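These two conclusions can be spot-checked by simulation (my own sketch with arbitrary settings): over many samples, the covariance matrix of $\bar X$ is close to $\Sigma/n$, and the correlation between entries of $\bar X$ and entries of $S$ is near 0.

```python
import numpy as np

rng = np.random.default_rng(6)
n, p, reps = 8, 2, 5000
mu = np.array([1.0, -1.0])
Sigma = np.array([[1.0, 0.6],
                  [0.6, 2.0]])

xbars = np.empty((reps, p))
s11 = np.empty(reps)                        # the (1,1) entry of S for each sample
for r in range(reps):
    X = rng.multivariate_normal(mu, Sigma, size=n)
    xbars[r] = X.mean(axis=0)
    s11[r] = np.cov(X, rowvar=False)[0, 0]

print(np.cov(xbars, rowvar=False))          # close to Sigma / n
print(np.corrcoef(xbars[:, 0], s11)[0, 1])  # near 0: consistent with independence
```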
Next: the Wishart law.
Defn: The $\text{Wishart}_p(n,\Sigma)$ distribution is the distribution of
$$W = \sum_{i=1}^n Z_iZ_i^T,\qquad Z_1,\ldots,Z_n \text{ iid } MVN_p(0,\Sigma).$$
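A sketch (mine) of this definition in code: build $W = \sum_i Z_iZ_i^T$ from iid $MVN_p(0,\Sigma)$ draws and compare with scipy's Wishart generator. The comparison of sample means uses the standard fact $E(W) = n\Sigma$, which is not stated above; all settings are arbitrary.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(7)
n, p, reps = 5, 3, 20000
Sigma = np.array([[1.0, 0.4, 0.0],
                  [0.4, 1.5, 0.3],
                  [0.0, 0.3, 0.7]])

# W = sum of Z_i Z_i^T with Z_i iid MVN_p(0, Sigma), one draw per replicate
Z = rng.multivariate_normal(np.zeros(p), Sigma, size=(reps, n))
W = np.einsum('rij,rik->rjk', Z, Z)

W_scipy = stats.wishart(df=n, scale=Sigma).rvs(size=reps)

print(np.round(W.mean(axis=0), 2))        # both sample means are close to n * Sigma
print(np.round(W_scipy.mean(axis=0), 2))
```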
Properties of Wishart.
1) If … then …
2) If $W_1 \sim \text{Wishart}_p(n_1,\Sigma)$ and $W_2 \sim \text{Wishart}_p(n_2,\Sigma)$ are independent, then $W_1 + W_2 \sim \text{Wishart}_p(n_1+n_2,\Sigma)$.
Proof of part 3: rewrite $(n-1)S = \Sigma^{1/2}\bigl[\sum_{i=1}^n (Z_i-\bar Z)(Z_i-\bar Z)^T\bigr]\Sigma^{1/2}$.
This uses further properties of the Wishart distribution:
3: If … and … then …
4: If … and … then …
5: If … then …
6: If … is partitioned into components then …