The Multivariate Normal Distribution

Defn: $ Z \in \mathbb {R}^1 \sim N(0,1)$ iff

$\displaystyle f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \,.
$

Defn: $ {\bf Z}\in \mathbb {R}^p \sim MVN_p(0,I)$ if and only if $ {\bf Z}=(Z_1,\ldots,Z_p)^T$ with the $ Z_i$ independent and each $ Z_i\sim N(0,1)$.

In this case, according to our theorem,

$\displaystyle f_{\bf Z}(z_1,\ldots,z_p)$ $\displaystyle = \prod \frac{1}{\sqrt{2\pi}} e^{-z_i^2/2}$    
  $\displaystyle = (2\pi)^{-p/2} \exp\{ -z^T z/2\} \, ;$    

superscript $ T$ denotes matrix transpose.

Defn: $ {\bf X}\in \mathbb {R}^p$ has a multivariate normal distribution if it has the same distribution as $ {\bf A}{\bf Z}+\boldsymbol{\mu}$ for some $ \boldsymbol{\mu}\in \mathbb {R}^p$, some $ p\times q$ matrix of constants $ {\bf A}$, and $ {\bf Z}\sim MVN_q(0,I)$.

$ p=q$, $ {\bf A}$ singular: $ {\bf X}$ does not have a density.

$ {\bf A}$ invertible: derive multivariate normal density by change of variables:

$\displaystyle {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu} \Leftrightarrow {\bf Z}={\bf A}^{-1}({\bf X}-\boldsymbol{\mu})
$

$\displaystyle \frac{\partial {\bf X}}{\partial {\bf Z}} = {\bf A}\qquad \frac{\partial {\bf Z}}{\partial {\bf X}} =
{\bf A}^{-1} \,.
$

So

$\displaystyle f_{\bf X}(x)$ $\displaystyle = f_{\bf Z}({\bf A}^{-1}(x-\boldsymbol{\mu})) \vert \det({\bf A}^{-1})\vert$    
  $\displaystyle = \frac{ \exp\{-(x-\boldsymbol{\mu})^T ({\bf A}^{-1})^T {\bf A}^{-1} (x-\boldsymbol{\mu})/2\} }{(2\pi)^{p/2} \vert\det{{\bf A}}\vert } \,.$    

Now define $ \boldsymbol{\Sigma}={\bf A}{\bf A}^T$ and notice that

$\displaystyle \boldsymbol{\Sigma}^{-1} = ({\bf A}^T)^{-1} {\bf A}^{-1} = ({\bf A}^{-1})^T {\bf A}^{-1}
$

and

$\displaystyle \det \boldsymbol{\Sigma} = \det {\bf A}\det {\bf A}^T = (\det {\bf A})^2 \,.
$

Thus $ f_{\bf X}$ is

$\displaystyle \frac{
\exp\{ -(x-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}
(x-\boldsymbol{\mu}) /2 \}}{
(2\pi)^{p/2} (\det\boldsymbol{\Sigma})^{1/2} }
\, ;
$

the $ MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$ density. Note density is the same for all $ {\bf A}$ such that $ {\bf A}{\bf A}^T=\boldsymbol{\Sigma}$. This justifies the notation $ MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$.
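A minimal numerical sketch of this density formula, assuming numpy and scipy are available; the matrix $ {\bf A}$, the mean $ \boldsymbol{\mu}$ and the evaluation point $ x$ are arbitrary illustrative choices, and the result is compared against scipy's built-in multivariate normal density.

# Sketch: the density exp{-(x-mu)^T Sigma^{-1} (x-mu)/2} / ((2 pi)^{p/2} det(Sigma)^{1/2})
# should agree with scipy's multivariate_normal.pdf when Sigma = A A^T.
import numpy as np
from scipy.stats import multivariate_normal

A = np.array([[2.0, 0.0], [1.0, 1.5]])   # any invertible A (illustrative choice)
mu = np.array([1.0, -1.0])
Sigma = A @ A.T

x = np.array([0.5, 0.3])
diff = x - mu
quad = diff @ np.linalg.solve(Sigma, diff)
p = len(mu)
f_formula = np.exp(-quad / 2) / ((2 * np.pi) ** (p / 2) * np.sqrt(np.linalg.det(Sigma)))

print(f_formula)
print(multivariate_normal(mean=mu, cov=Sigma).pdf(x))  # should match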

For which $ \boldsymbol{\mu}$, $ \boldsymbol{\Sigma}$ is this a density?

Any $ \boldsymbol{\mu}$ will do, but if $ x \in \mathbb {R}^p$ then

$\displaystyle x^T \boldsymbol{\Sigma} x$ $\displaystyle = x^T {\bf A}{\bf A}^T x$    
  $\displaystyle = ({\bf A}^T x)^T ({\bf A}^T x)$    
  $\displaystyle = \sum_1^p y_i^2 \ge 0$    

where $ y={\bf A}^T x$. Inequality strict except for $ y=0$ which is equivalent to $ x=0$. Thus $ \boldsymbol{\Sigma}$ is a positive definite symmetric matrix.

Conversely, if $ \boldsymbol{\Sigma}$ is a positive definite symmetric matrix then there is a square invertible matrix $ {\bf A}$ such that $ {\bf A}{\bf A}^T=\boldsymbol{\Sigma}$ so that there is a $ MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$ distribution. ($ {\bf A}$ can be found via the Cholesky decomposition, e.g.)
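A short sketch of this construction using numpy's Cholesky routine; the positive definite $ \boldsymbol{\Sigma}$ below is an arbitrary example.

# Sketch: for positive definite Sigma, the Cholesky factor L is a lower-triangular
# invertible matrix with L @ L.T == Sigma, so L can play the role of A in X = A Z + mu.
import numpy as np

Sigma = np.array([[4.0, 1.0, 0.5],
                  [1.0, 3.0, 1.2],
                  [0.5, 1.2, 2.0]])
L = np.linalg.cholesky(Sigma)          # lower triangular square root
print(np.allclose(L @ L.T, Sigma))     # True

rng = np.random.default_rng(0)
mu = np.array([1.0, 2.0, 3.0])
Z = rng.standard_normal(3)
X = L @ Z + mu                         # one draw from MVN(mu, Sigma)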

When $ {\bf A}$ is singular $ {\bf X}$ will not have a density: $ \exists a$ such that $ P(a^T {\bf X}= a^T
\boldsymbol{\mu}) =1$; $ {\bf X}$ is confined to a hyperplane.

Still true: distribution of $ {\bf X}$ depends only on $ \boldsymbol{\Sigma}={\bf A}{\bf A}^T$: if $ {\bf A}{\bf A}^T = BB^T$ then $ {\bf A}{\bf Z}+\boldsymbol{\mu}$ and $ B{\bf Z}+\boldsymbol{\mu}$ have the same distribution.

Expectation, moments

Defn: If $ {\bf X}\in \mathbb {R}^p$ has density $ f$ then

$\displaystyle {\rm E}(g({\bf X})) = \int g(x)f(x)\, dx \,.
$

any $ g$ from $ \mathbb {R}^p$ to $ \mathbb {R}$.

FACT: if $ Y=g(X)$ for a smooth $ g$ (mapping $ \mathbb {R}\to \mathbb {R}$)

$\displaystyle {\rm E}(Y)$ $\displaystyle = \int y f_Y(y) \, dy$    
  $\displaystyle = \int g(x) f_Y(g(x)) g^\prime(x) \, dx$    
  $\displaystyle = {\rm E}(g(X))$    

by change of variables formula for integration. This is good because otherwise we might have two different values for $ {\rm E}(e^X)$.

Linearity: $ {\rm E}(aX+bY) = a{\rm E}(X)+b{\rm E}(Y)$ for real random variables $ X$ and $ Y$ and constants $ a$, $ b$.

Defn: The $ r^{\rm th}$ moment (about the origin) of a real rv $ X$ is $ \mu_r^\prime={\rm E}(X^r)$ (provided it exists). We generally use $ \mu$ for $ {\rm E}(X)$.

Defn: The $ r^{\rm th}$ central moment is

$\displaystyle \mu_r = {\rm E}[(X-\mu)^r]
$

We call $ \sigma^2 = \mu_2$ the variance.

Defn: For an $ \mathbb {R}^p$ valued random vector $ {\bf X}$

$\displaystyle \boldsymbol{\mu}_{\bf X}= {\rm E}({\bf X}) $

is the vector whose $ i^{\rm th}$ entry is $ {\rm E}(X_i)$ (provided all entries exist).

Fact: same idea used for random matrices.

Defn: The ( $ p \times p$) variance covariance matrix of $ {\bf X}$ is

$\displaystyle {\rm Var}({\bf X}) = {\rm E}\left[ ({\bf X}-\boldsymbol{\mu})({\bf X}-\boldsymbol{\mu})^T \right]
$

which exists provided each component $ X_i$ has a finite second moment.

Example moments: If $ Z\sim N(0,1)$ then

$\displaystyle {\rm E}(Z)$ $\displaystyle = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}$    
  $\displaystyle = \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty$    
  $\displaystyle = 0$    

and (integrating by parts)

$\displaystyle {\rm E}(Z^r) =$ $\displaystyle \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}$    
$\displaystyle =$ $\displaystyle \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty$    
  $\displaystyle + (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}$    

so that

$\displaystyle \mu_r = (r-1)\mu_{r-2}
$

for $ r \ge 2$. Remembering that $ \mu_1=0$ and

$\displaystyle \mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
$

we find that

$\displaystyle \mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even} \,.
\end{array}\right.
$
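A small sketch checking this recursion numerically, assuming scipy is available; moments computed from the recursion $ \mu_r=(r-1)\mu_{r-2}$ are compared with direct numerical integration of $ z^r e^{-z^2/2}/\sqrt{2\pi}$.

# Sketch: odd moments of N(0,1) vanish and even moments are (r-1)(r-3)...1.
import numpy as np
from scipy.integrate import quad

def moment_numeric(r):
    f = lambda z: z**r * np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
    val, _ = quad(f, -np.inf, np.inf)
    return val

def moment_recursion(r):
    if r == 0:
        return 1.0
    if r == 1:
        return 0.0
    return (r - 1) * moment_recursion(r - 2)

for r in range(7):
    print(r, moment_recursion(r), round(moment_numeric(r), 6))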

If now $ X\sim N(\mu,\sigma^2)$, that is, $ X\sim \sigma Z + \mu$, then $ {\rm E}(X) = \sigma {\rm E}(Z) + \mu = \mu$ and

$\displaystyle \mu_r(X) = {\rm E}[(X-\mu)^r] = \sigma^r {\rm E}(Z^r)
$

In particular, we see that our choice of notation $ N(\mu,\sigma^2)$ for the distribution of $ \sigma Z + \mu$ is justified; taking $ r=2$ gives $ \mu_2(X)=\sigma^2{\rm E}(Z^2)=\sigma^2$, so $ \sigma^2$ is indeed the variance.

Similarly for $ {\bf X}\sim MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$ we have $ {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu}$ with $ {\bf Z}\sim MVN(0,I)$ and

$\displaystyle {\rm E}({\bf X}) = \boldsymbol{\mu}
$

and

$\displaystyle {\rm Var}({\bf X})$ $\displaystyle = {\rm E}\left\{({\bf X}-\boldsymbol{\mu})({\bf X}-\boldsymbol{\mu})^T\right\}$    
  $\displaystyle = {\rm E}\left\{ {\bf A}{\bf Z}({\bf A}{\bf Z})^T\right\}$    
  $\displaystyle = {\bf A}{\rm E}({\bf Z}{\bf Z}^T) {\bf A}^T$    
  $\displaystyle = {\bf A}I{\bf A}^T = \boldsymbol{\Sigma} \,.$    

Note use of easy calculation: $ {\rm E}({\bf Z})=0$ and

$\displaystyle {\rm Var}({\bf Z}) = {\rm E}({\bf Z}{\bf Z}^T) =I \,.
$
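A simulation sketch of these moment calculations, with an arbitrary choice of $ {\bf A}$ and $ \boldsymbol{\mu}$; the sample mean and sample covariance of many draws of $ {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu}$ should be close to $ \boldsymbol{\mu}$ and $ \boldsymbol{\Sigma}={\bf A}{\bf A}^T$.

# Sketch: empirical mean and covariance of X = A Z + mu approximate mu and A A^T.
import numpy as np

rng = np.random.default_rng(1)
A = np.array([[1.0, 0.0], [0.8, 0.6]])
mu = np.array([2.0, -1.0])
Sigma = A @ A.T

Z = rng.standard_normal((100_000, 2))
X = Z @ A.T + mu                       # each row is one draw A Z + mu

print(X.mean(axis=0))                  # approx mu
print(np.cov(X, rowvar=False))         # approx Sigma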

Moments and independence

Theorem: If $ X_1,\ldots,X_p$ are independent and each $ X_i$ is integrable then $ X=X_1\cdots X_p$ is integrable and

$\displaystyle {\rm E}(X_1\cdots X_p) = {\rm E}(X_1) \cdots {\rm E}(X_p) \,.
$

Moment Generating Functions

Defn: The moment generating function of a real valued $ X$ is

$\displaystyle M_X(t) = {\rm E}(e^{tX})
$

defined for those real $ t$ for which the expected value is finite.

Defn: The moment generating function of $ {\bf X}\in \mathbb {R}^p$ is

$\displaystyle M_{\bf X}(u) = {\rm E}[e^{u^T{\bf X}}]
$

defined for those vectors $ u$ for which the expected value is finite.

Example: If $ Z\sim N(0,1)$ then

$\displaystyle M_Z(t)$ $\displaystyle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{tz-z^2/2} dz$    
  $\displaystyle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-(z-t)^2/2+t^2/2} dz$    
  $\displaystyle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-u^2/2+t^2/2} du$    
  $\displaystyle = e^{t^2/2}$    

Theorem: ($ p=1$) If $ M$ is finite for all $ t$ in a neighbourhood of $ 0$ then

  1. Every moment of $ X$ is finite.


  2. $ M$ is $ C^\infty$ (in fact $ M$ is analytic).


  3. $ \mu_k^\prime = \frac{d^k}{dt^k} M_X(0)$.


Note: $ C^\infty$ means has continuous derivatives of all orders. Analytic means has convergent power series expansion in neighbourhood of each $ t\in(-\epsilon,\epsilon)$.

The proof, and many other facts about mgfs, rely on techniques of complex variables.

Characterization & MGFs

Theorem: Suppose $ {\bf X}$ and $ {\bf Y}$ are $ \mathbb {R}^p$ valued random vectors such that

$\displaystyle M_{\bf X}({\bf u}) = M_{\bf Y}({\bf u})
$

for $ {\bf u}$ in some open neighbourhood of $ {\bf0}$ in $ \mathbb {R}^p$. Then $ {\bf X}$ and $ {\bf Y}$ have the same distribution.

The proof relies on techniques of complex variables.

MGFs and Sums

If $ X_1,\ldots,X_p$ are independent and $ Y=\sum X_i$ then mgf of $ Y$ is product mgfs of individual $ X_i$:

$\displaystyle {\rm E}(e^{tY}) = \prod_i {\rm E}(e^{tX_i})
$

or $ M_Y = \prod M_{X_i}$. (Also for multivariate $ X_i$.)

Example: If $ Z_1,\ldots,Z_p$ are independent $ N(0,1)$ then

$\displaystyle {\rm E}(e^{\sum a_i Z_i})$ $\displaystyle = \prod_i {\rm E}(e^{a_i Z_i})$    
  $\displaystyle = \prod_i e^{a_i^2/2}$    
  $\displaystyle = \exp(\sum a_i^2/2)$    

Conclusion: If $ {\bf Z}\sim MVN_p(0,I)$ then

$\displaystyle M_{\bf Z}({\bf u}) = \exp(\sum u_i^2/2) = \exp({\bf u}^T {\bf u}/2).
$

Example: If $ X\sim N(\mu,\sigma^2)$ then $ X=\sigma Z + \mu$ and

$\displaystyle M_X(t) = {\rm E}(e^{t(\sigma Z+\mu)}) = e^{t\mu} e^{\sigma^2t^2/2}.
$
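A quick numerical sketch of this formula, with arbitrary values of $ \mu$, $ \sigma$ and $ t$; the closed form is compared with $ {\rm E}(e^{tX})$ computed by numerical integration against the $ N(\mu,\sigma^2)$ density.

# Sketch: compare M_X(t) = exp(t mu + sigma^2 t^2 / 2) with a direct integral.
import numpy as np
from scipy.integrate import quad

mu, sigma, t = 0.7, 1.3, 0.5
density = lambda x: np.exp(-(x - mu) ** 2 / (2 * sigma ** 2)) / (sigma * np.sqrt(2 * np.pi))
mgf_numeric, _ = quad(lambda x: np.exp(t * x) * density(x), -np.inf, np.inf)
mgf_formula = np.exp(t * mu + sigma ** 2 * t ** 2 / 2)
print(mgf_numeric, mgf_formula)        # should agree closely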

Theorem: Suppose $ {\bf X}= {\bf A}{\bf Z}+{\boldsymbol{\mu}}$ and $ {\bf Y}= {\bf A}^* {\bf Z}^* + {\boldsymbol{\mu^*}}$ where $ {\bf Z}\sim MVN_p(0,I)$ and $ {\bf Z}^* \sim MVN_q(0,I)$. Then $ {\bf X}$ and $ {\bf Y}$ have the same distribution if and only if the following two conditions hold:

  1. $ \boldsymbol{\mu} = \boldsymbol{\mu}^*$.


  2. $ {\bf A}{\bf A}^T = {\bf A}^*({\bf A}^*)^T$.


Alternatively: if $ {\bf X}$, $ {\bf Y}$ each MVN then $ {\rm E}({\bf X})={\rm E}({\bf Y})$ and $ {\rm Var}({\bf X}) = {\rm Var}({\bf Y})$ imply that $ {\bf X}$ and $ {\bf Y}$ have the same distribution.

Proof: If 1 and 2 hold the mgf of $ {\bf X}$ is

$\displaystyle {\rm E}\left(e^{t^T{\bf X}}\right)$ $\displaystyle = {\rm E}\left(e^{t^T({\bf A}{\bf Z}+\boldsymbol\mu)}\right)$    
  $\displaystyle = e^{t^T\boldsymbol\mu}{\rm E}\left(e^{({\bf A}^Tt)^T {\bf Z}}\right)$    
  $\displaystyle = e^{t^T\boldsymbol\mu+({\bf A}^Tt)^T({\bf A}^Tt)/2}$    
  $\displaystyle = e^{t^T\boldsymbol\mu+t^T\boldsymbol\Sigma t/2}$    

Thus $ M_{\bf X}=M_{\bf Y}$. Conversely if $ {\bf X}$ and $ {\bf Y}$ have the same distribution then they have the same mean and variance.

Thus mgf is determined by $ \boldsymbol\mu$ and $ \boldsymbol\Sigma$.

Theorem: If $ {\bf X}\sim MVN_p(\mu,\boldsymbol{\Sigma})$ then there is $ {\bf A}$ a $ p \times p$ matrix such that $ {\bf X}$ has same distribution as $ {\bf A}{\bf Z}+\boldsymbol{\mu}$ for $ {\bf Z}\sim MVN_p(0,I)$.

We may assume that $ {\bf A}$ is symmetric and non-negative definite, or that $ {\bf A}$ is upper triangular, or that $ {\bf A}$ is lower triangular.

Proof: Pick any $ {\bf A}$ such that $ {\bf A}{\bf A}^T=\boldsymbol{\Sigma}$ such as $ {\bf P}{\bf D}^{1/2} {\bf P}^T$ from the spectral decomposition. Then $ {\bf A}{\bf Z}+\boldsymbol{\mu}\sim MVN_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$.

From the symmetric square root we can produce an upper triangular square root by the Gram-Schmidt process: if $ {\bf A}$ has rows $ a_1^T,\ldots,a_p^T$ then let $ v_p$ be $ a_p/\sqrt{a_p^T a_p}$. Choose $ v_{p-1}$ proportional to $ a_{p-1} -bv_p$ where $ b = a_{p-1}^T v_p$ so that $ v_{p-1}$ has unit length. Continue in this way; you automatically get $ a_j^T v_k = 0$ if $ j > k$. If $ {\bf P}$ has columns $ v_1,\ldots,v_p$ then $ {\bf P}$ is orthogonal and $ {\bf A}{\bf P}$ is an upper triangular square root of $ \boldsymbol\Sigma$.
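A sketch of two such square roots computed numerically: the symmetric root $ {\bf P}{\bf D}^{1/2}{\bf P}^T$ from the spectral decomposition, and a triangular root obtained here from numpy's Cholesky routine (used in place of the Gram-Schmidt step above); both reproduce the same $ \boldsymbol{\Sigma}$, which is an arbitrary example.

# Sketch: two different square roots B of the same Sigma, both with B @ B.T == Sigma.
import numpy as np

Sigma = np.array([[4.0, 1.0], [1.0, 3.0]])

evals, P = np.linalg.eigh(Sigma)
A_sym = P @ np.diag(np.sqrt(evals)) @ P.T      # symmetric, non-negative definite root
A_tri = np.linalg.cholesky(Sigma)              # lower-triangular root

print(np.allclose(A_sym @ A_sym.T, Sigma))     # True
print(np.allclose(A_tri @ A_tri.T, Sigma))     # True
print(np.allclose(A_sym, A_tri))               # False: different roots, same Sigma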

Variances, Covariances, Correlations

Defn: The covariance between $ {\bf X}$ and $ {\bf Y}$ is

$\displaystyle {\rm Cov}({\bf X},{\bf Y}) = {\rm E}\left\{
({\bf X}-\boldsymbol\mu_{\bf X})({\bf Y}-\boldsymbol\mu_{\bf Y})^T\right\}
$

This is a matrix.

Properties: $ {\rm Cov}({\bf X},{\bf X})={\rm Var}({\bf X})$, $ {\rm Cov}({\bf Y},{\bf X})={\rm Cov}({\bf X},{\bf Y})^T$, and $ {\rm Cov}({\bf A}{\bf X}+{\bf a},{\bf B}{\bf Y}+{\bf b})={\bf A}\,{\rm Cov}({\bf X},{\bf Y}){\bf B}^T$ for constant matrices $ {\bf A},{\bf B}$ and vectors $ {\bf a},{\bf b}$.

Properties of the $ MVN$ distribution

1: All margins are multivariate normal: if

$\displaystyle {\bf X}= \left[\begin{array}{c} {\bf X}_1\\  {\bf X}_2\end{array} \right]
$

$\displaystyle \mu = \left[\begin{array}{c} \mu_1\\  \mu_2\end{array} \right]
$

and

$\displaystyle \boldsymbol{\Sigma} = \left[\begin{array}{cc} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12}
\\
\boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{array} \right]
$

then $ {\bf X}\sim MVN(\boldsymbol{\mu},\boldsymbol{\Sigma}) \Rightarrow {\bf X}_1\sim MVN(\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_{11})$.

2: $ {\bf M}{\bf X}+\boldsymbol{\nu} \sim MVN({\bf M}\boldsymbol{\mu}+\boldsymbol{\nu}, {\bf M} \boldsymbol{\Sigma} {\bf M}^T)$: affine transformation of MVN is normal.

3: If

$\displaystyle \boldsymbol{\Sigma}_{12} = {\rm Cov}({\bf X}_1,{\bf X}_2) = {\bf0}
$

then $ {\bf X}_1$ and $ {\bf X}_2$ are independent.

4: All conditionals are normal: the conditional distribution of $ {\bf X}_1$ given $ {\bf X}_2=x_2$ is $ MVN(\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(x_2-\boldsymbol{\mu}_2),\;\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})$.

Proof of (1): If $ {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu}$ then

$\displaystyle {\bf X}_1 = \left[ I \vert {\bf0}\right] {\bf X}
$

for $ I$ the identity matrix of correct dimension.

So

$\displaystyle {\bf X}_1 = \left( \left[ I \vert {\bf0}\right]{\bf A}\right) {\bf Z}+ \left[ I \vert {\bf0}\right]
\boldsymbol{\mu}
$

Compute mean and variance to check rest.

Proof of ( 2): If $ {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu}$ then

$\displaystyle {\bf M}{\bf X}+\boldsymbol{\nu}= {\bf M}{\bf A}{\bf Z}
+ \boldsymbol{\nu} +{\bf M}\boldsymbol{\mu}$

Proof of ( 3): If

$\displaystyle {\bf u} = \left[\begin{array}{c}{\bf u}_1 \\  {\bf u}_2\end{array}\right]
$

then $ {\bf u}^T\boldsymbol{\Sigma}{\bf u}={\bf u}_1^T\boldsymbol{\Sigma}_{11}{\bf u}_1+{\bf u}_2^T\boldsymbol{\Sigma}_{22}{\bf u}_2$ because $ \boldsymbol{\Sigma}_{12}={\bf0}$, so the mgf factors:

$\displaystyle M_{\bf X}({\bf u}) = M_{{\bf X}_1}({\bf u}_1)M_{{\bf X}_2}({\bf u}_2) \, ;
$

independence follows from the uniqueness theorem for mgfs.

Proof of ( 4): first case: assume $ \boldsymbol{\Sigma}_{22}$ has an inverse.

Define

$\displaystyle {\bf W}= {\bf X}_1 - \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}{\bf X}_2
$

Then

$\displaystyle \left[\begin{array}{c}
{\bf W}\\  {\bf X}_2\end{array}\right]
=
\left[\begin{array}{cc}
I & -\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1} \\
{\bf0} & I \end{array}\right]
\left[\begin{array}{c}
{\bf X}_1 \\  {\bf X}_2\end{array}\right]
$

Thus $ ({\bf W},{\bf X}_2)^T$ is $ MVN$ with mean vector $ (\boldsymbol\mu_1-\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\boldsymbol\mu_2,\ \boldsymbol\mu_2)^T$ and variance $ \boldsymbol\Sigma^*$ where

$\displaystyle \boldsymbol\Sigma^* = \left[\begin{array}{cc}
\boldsymbol\Sigma_{11}-\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21} & {\bf0} \\
{\bf0}& \boldsymbol\Sigma_{22}\end{array}\right]
$

Now the joint density of $ {\bf W}$ and $ {\bf X}_2$ factors

$\displaystyle f_{{\bf W},{\bf X}_2}(w,x_2) = f_{{\bf W}}(w)f_{{\bf X}_2}(x_2)
$

By change of variables the joint density of $ ({\bf X}_1,{\bf X}_2)$ is

$\displaystyle f_{{\bf X}_1,{\bf X}_2}(x_1,x_2) = cf_{{\bf W}}(x_1-{\bf M}x_2)f_{{\bf X}_2}(x_2)
$

where $ c=1$ is the constant Jacobian of the linear transformation from $ ({\bf W},{\bf X}_2)$ to $ ({\bf X}_1,{\bf X}_2)$ and

$\displaystyle {\bf M}= \boldsymbol\Sigma_{12}
\boldsymbol\Sigma_{22}^{-1}
$

Thus conditional density of $ {\bf X}_1$ given $ {\bf X}_2=x_2$ is

$\displaystyle \frac{f_{{\bf W}}(x_1-{\bf M}x_2)f_{{\bf X}_2}(x_2)}{f_{{\bf X}_2}(x_2)}
= f_{{\bf W}}(x_1-{\bf M}x_2)
$

As a function of $ x_1$ this density has the form of the advertised multivariate normal density.
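A matrix-level sketch of this proof, for an arbitrary positive definite partitioned $ \boldsymbol{\Sigma}$: with $ {\bf M}=\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}$, the covariance of $ {\bf W}$ with $ {\bf X}_2$ is zero and $ {\rm Var}({\bf W})$ is the advertised conditional variance.

# Sketch: Cov(W, X2) = Sigma12 - M Sigma22 = 0 and Var(W) = Sigma11 - M Sigma21.
import numpy as np

Sigma = np.array([[4.0, 1.0, 1.2],
                  [1.0, 3.0, 0.5],
                  [1.2, 0.5, 2.0]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

M = S12 @ np.linalg.inv(S22)           # Sigma12 Sigma22^{-1}
cov_W_X2 = S12 - M @ S22               # should be the zero matrix
cond_var = S11 - M @ S21               # Sigma_{11.2}, the conditional variance

print(cov_W_X2)
print(cond_var)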

Specialization to bivariate case:

Write

$\displaystyle \boldsymbol\Sigma = \left[\begin{array}{cc} \sigma_1^2 & \rho\sigma_1\sigma_2
\\
\rho\sigma_1\sigma_2
& \sigma_2^2\end{array}\right]
$

where we define

$\displaystyle \rho = \frac{{\rm Cov}(X_1,X_2)}{\sqrt{{\rm Var}(X_1){\rm Var}(X_2)}}
$

Note that

$\displaystyle \sigma_i^2 = {\rm Var}(X_i)
$

Then

$\displaystyle W= X_1 - \rho\frac{\sigma_1}{\sigma_2} X_2
$

is independent of $ X_2$. The marginal distribution of $ W$ is $ N(\mu_1-\rho\sigma_1 \mu_2/\sigma_2,\tau^2)$ where

$\displaystyle \tau^2 =$ $\displaystyle {\rm Var}(X_1) - 2\rho\frac{\sigma_1}{\sigma_2}{\rm Cov}(X_1,X_2)$    
  $\displaystyle + \left(\rho\frac{\sigma_1}{\sigma_2}\right)^2 {\rm Var}(X_2)$    

This simplifies to

$\displaystyle \sigma_1^2(1-\rho^2)
$

Notice that it follows that

$\displaystyle -1 \le \rho \le 1
$
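A simulation sketch of the bivariate case, with arbitrary values of $ \sigma_1$, $ \sigma_2$ and $ \rho$: the sample correlation of $ W$ with $ X_2$ should be near $ 0$ and the sample variance of $ W$ near $ \sigma_1^2(1-\rho^2)$.

# Sketch: W = X1 - rho*(sigma1/sigma2)*X2 is uncorrelated with X2, variance sigma1^2(1-rho^2).
import numpy as np

rng = np.random.default_rng(2)
sigma1, sigma2, rho = 2.0, 1.5, 0.6
Sigma = np.array([[sigma1**2, rho * sigma1 * sigma2],
                  [rho * sigma1 * sigma2, sigma2**2]])
X = rng.multivariate_normal(mean=[0.0, 0.0], cov=Sigma, size=200_000)

W = X[:, 0] - rho * (sigma1 / sigma2) * X[:, 1]
print(np.corrcoef(W, X[:, 1])[0, 1])          # approx 0
print(W.var(), sigma1**2 * (1 - rho**2))      # approx equal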

More generally, for any random variables $ X$ and $ Y$:

$\displaystyle 0 \le {\rm Var}(X-\lambda Y)$    
  $\displaystyle = {\rm Var}(X) - 2 \lambda{\rm Cov}(X,Y) + \lambda^2 {\rm Var}(Y)$    

RHS is minimized at

$\displaystyle \lambda = \frac{{\rm Cov}(X,Y)}{{\rm Var}(Y)}
$

The minimum value is

$\displaystyle {\rm Var}(X) (1 - \rho_{XY}^2) \ge 0
$

where

$\displaystyle \rho_{XY} =\frac{{\rm Cov}(X,Y)}{\sqrt{{\rm Var}(X){\rm Var}(Y)}}
$

defines the correlation between $ X$ and $ Y$.

Multiple Correlation
Now suppose $ {\bf X}_2$ is scalar but $ {\bf X}_1$ is vector.

Defn: Multiple correlation between $ {\bf X}_1$ and $ X_2$

$\displaystyle R^2({\bf X}_1,X_2) = \max\vert\rho_{{\bf a}^T{\bf X}_1,X_2}\vert^2
$

over all $ {\bf a}\neq 0$.

Thus: maximize

$\displaystyle \frac{{\rm Cov}^2({\bf a}^T {\bf X}_1,X_2)}{{\rm Var}
({\bf a}^T{\bf X}_1){\rm Var}(X_2)}
= \frac{\left({\bf a}^T \boldsymbol\Sigma_{12}\right)^2}{\left({\bf a}^T \boldsymbol\Sigma_{11} {\bf a}\right)
\boldsymbol\Sigma_{22}}
$

Put $ b=\boldsymbol\Sigma_{11}^{1/2}{\bf a}$. For $ \boldsymbol\Sigma_{11}$ invertible problem is equivalent to maximizing

$\displaystyle \frac{{\bf b}^T {\bf Q}{\bf b}}{{\bf b}^T {\bf b}}
$

where

$\displaystyle {\bf Q}= \boldsymbol\Sigma_{11}^{-1/2}\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{21}\boldsymbol\Sigma_{11}^{-1/2}
$

Solution: find largest eigenvalue of $ {\bf Q}$.

Note

$\displaystyle {\bf Q}= {\bf v}{\bf v}^T
$

where

$\displaystyle {\bf v}=
\boldsymbol\Sigma_{11}^{-1/2}
\boldsymbol\Sigma_{12}
$

is a vector. Set

$\displaystyle {\bf v}{\bf v}^T {\bf x} = \lambda {\bf x}
$

and multiply by $ {\bf v}^T$ to get

$\displaystyle {\bf v}^T{\bf x} = 0$    or $\displaystyle \lambda = {\bf v}^T {\bf v}
$

If $ {\bf v}^T{\bf x}=0$ then we see $ \lambda=0$ so largest eigenvalue is $ {\bf v}^T{\bf v}$.

Summary: maximum squared correlation is

$\displaystyle R^2({\bf X}_1,X_2) = \frac{ {\bf v}^T {\bf v}}{\boldsymbol\Sigma_{22}}
= \frac{\boldsymbol\Sigma_{21}\boldsymbol\Sigma_{11}^{-1}
\boldsymbol\Sigma_{12}}{\boldsymbol\Sigma_{22}}
$

Achieved when eigenvector is $ {\bf x}={\bf v}={\bf b}$ so

$\displaystyle {\bf a}=\boldsymbol\Sigma_{11}^{-1/2}\boldsymbol\Sigma_{11}^{-1/2}
\boldsymbol\Sigma_{12}=\boldsymbol\Sigma_{11}^{-1}\boldsymbol\Sigma_{12}
$

Notice: since $ R^2$ is the squared correlation between two scalars ( $ {\bf a}^T {\bf X}_1$ and $ X_2$) we have

$\displaystyle 0 \le R^2 \le 1
$

Equals 1 if and only if $ X_2$ is a linear combination of the components of $ {\bf X}_1$ (plus a constant).
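A sketch of the multiple correlation formula for an arbitrary partitioned $ \boldsymbol{\Sigma}$; $ R^2$ computed from the formula is compared with the squared correlation of $ {\bf a}^T{\bf X}_1$ and $ X_2$ at the maximizing $ {\bf a}=\boldsymbol\Sigma_{11}^{-1}\boldsymbol\Sigma_{12}$, evaluated directly from the same covariance matrix.

# Sketch: R^2 = Sigma21 Sigma11^{-1} Sigma12 / Sigma22, achieved at a = Sigma11^{-1} Sigma12.
import numpy as np

Sigma = np.array([[4.0, 1.0, 1.2],
                  [1.0, 3.0, 0.5],
                  [1.2, 0.5, 2.0]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2]
S21, S22 = Sigma[2, :2], Sigma[2, 2]

R2 = S21 @ np.linalg.solve(S11, S12) / S22

a = np.linalg.solve(S11, S12)                  # optimal coefficient vector
cov_aX1_X2 = a @ S12                           # Cov(a^T X1, X2)
var_aX1 = a @ S11 @ a                          # Var(a^T X1)
rho2 = cov_aX1_X2**2 / (var_aX1 * S22)

print(R2, rho2)                                # equal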

Correlation matrices, partial correlations:

Correlation between two scalars $ X$ and $ Y$ is

$\displaystyle \rho_{XY} =\frac{{\rm Cov}(X,Y)}{\sqrt{{\rm Var}(X){\rm Var}(Y)}}
$

If $ {\bf X}$ has variance $ \boldsymbol\Sigma$ then the correlation matrix of $ {\bf X}$ is $ {\bf R}_{\bf X}$ with entries

$\displaystyle R_{ij} = \frac{{\rm Cov}(X_i,X_j)}{\sqrt{{\rm Var}(X_i){\rm Var}(X_j)}}
= \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}}
$

If $ {\bf X}_1,{\bf X}_2$ are MVN with the usual partitioned variance covariance matrix then the conditional variance of $ {\bf X}_1$ given $ {\bf X}_2$ is

$\displaystyle \boldsymbol\Sigma_{11\cdot 2} = \boldsymbol\Sigma_{11} -
\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21}
$

From this define the partial correlation matrix $ {\bf R}_{11\cdot 2}$ with entries

$\displaystyle ({\bf R}_{11\cdot 2})_{ij} = \frac{(\boldsymbol\Sigma_{11\cdot 2})_{ij}
}{\sqrt{(\boldsymbol\Sigma_{11\cdot 2})_{ii}(\boldsymbol\Sigma_{11\cdot 2})_{jj}}}
$

Note: these are used even when $ {\bf X}_1,{\bf X}_2$ are NOT MVN.
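A short sketch computing $ \boldsymbol\Sigma_{11\cdot 2}$ and the partial correlation matrix for an arbitrary partitioned covariance matrix.

# Sketch: Sigma_{11.2} = Sigma11 - Sigma12 Sigma22^{-1} Sigma21, then standardize its entries.
import numpy as np

Sigma = np.array([[4.0, 1.0, 1.2],
                  [1.0, 3.0, 0.5],
                  [1.2, 0.5, 2.0]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]

S11_2 = S11 - S12 @ np.linalg.solve(S22, S21)        # Sigma_{11.2}
d = np.sqrt(np.diag(S11_2))
R11_2 = S11_2 / np.outer(d, d)                       # partial correlation matrix

print(S11_2)
print(R11_2)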



Richard Lockhart
2002-09-24