The Multivariate Normal Distribution

Defn: $ Z \in \mathbb {R}^1 \sim N(0,1)$ iff

$\displaystyle f_Z(z) = \frac{1}{\sqrt{2\pi}} e^{-z^2/2} \,.

Defn: $ {\bf Z}\in \mathbb {R}^p \sim MVN_p(0,I)$ if and only if $ {\bf Z}=(Z_1,\ldots,Z_p)^T$ with the $ Z_i$ independent and each $ Z_i\sim N(0,1)$.

In this case, according to our theorem,

$\displaystyle f_{\bf Z}(z_1,\ldots,z_p)$ $\displaystyle = \prod \frac{1}{\sqrt{2\pi}} e^{-z_i^2/2}$    
  $\displaystyle = (2\pi)^{-p/2} \exp\{ -z^T z/2\} \, ;$    

superscript $ T$ denotes matrix transpose.

Defn: $ {\bf X}\in \mathbb {R}^p$ has a multivariate normal distribution if it has the same distribution as $ {\bf A}{\bf Z}+\boldsymbol{\mu}$ for some $ \boldsymbol{\mu}\in \mathbb {R}^p$, some $ p\times q$ matrix of constants $ {\bf A}$ and $ Z\sim MVN_q(0,I)$.

$ p=q$, $ {\bf A}$ singular: $ {\bf X}$ does not have a density.

$ {\bf A}$ invertible: derive multivariate normal density by change of variables:

$\displaystyle {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu} \Leftrightarrow {\bf Z}={\bf A}^{-1}({\bf X}-\boldsymbol{\mu})

$\displaystyle \frac{\partial {\bf X}}{\partial {\bf Z}} = {\bf A}\qquad \frac{\partial {\bf Z}}{\partial {\bf X}} =
{\bf A}^{-1} \,.


$\displaystyle f_{\bf X}(x)$ $\displaystyle = f_{\bf Z}({\bf A}^{-1}(x-\boldsymbol{\mu})) \vert \det({\bf A}^{-1})\vert$    
  $\displaystyle = \frac{ \exp\{-(x-\boldsymbol{\mu})^T ({\bf A}^{-1})^T {\bf A}^{-1} (x-\boldsymbol{\mu})/2\} }{(2\pi)^{p/2} \vert\det{{\bf A}}\vert } \,.$    

Now define $ \boldsymbol{\Sigma}={\bf A}{\bf A}^T$ and notice that

$\displaystyle \boldsymbol{\Sigma}^{-1} = ({\bf A}^T)^{-1} {\bf A}^{-1} = ({\bf A}^{-1})^T {\bf A}^{-1}


$\displaystyle \det \boldsymbol{\Sigma} = \det {\bf A}\det {\bf A}^T = (\det {\bf A})^2 \,.

Thus $ f_{\bf X}$ is

$\displaystyle \frac{
\exp\{ -(x-\boldsymbol{\mu})^T \boldsymbol{\Sigma}^{-1}
(x-\boldsymbol{\mu}) /2 \}}{
(2\pi)^{p/2} (\det\boldsymbol{\Sigma})^{1/2} }
\, ;

the $ MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$ density. Note that the density is the same for all $ {\bf A}$ such that $ {\bf A}{\bf A}^T=\boldsymbol{\Sigma}$; this justifies the notation $ MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$.

For which $ \boldsymbol{\mu}$, $ \boldsymbol{\Sigma}$ is this a density?

Any $ \boldsymbol{\mu}$ will do, but if $ x \in \mathbb {R}^p$ then

$\displaystyle x^T \boldsymbol{\Sigma} x$ $\displaystyle = x^T {\bf A}{\bf A}^T x$    
  $\displaystyle = ({\bf A}^T x)^T ({\bf A}^T x)$    
  $\displaystyle = \sum_1^p y_i^2 \ge 0$    

where $ y={\bf A}^T x$. The inequality is strict unless $ y=0$, which (since $ {\bf A}$ is invertible) is equivalent to $ x=0$. Thus $ \boldsymbol{\Sigma}$ is a positive definite symmetric matrix.

Conversely, if $ \boldsymbol{\Sigma}$ is a positive definite symmetric matrix then there is a square invertible matrix $ {\bf A}$ such that $ {\bf A}{\bf A}^T=\boldsymbol{\Sigma}$ so that there is a $ MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$ distribution. ($ {\bf A}$ can be found via the Cholesky decomposition, e.g.)
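The Cholesky construction can be illustrated numerically. A sketch assuming numpy is available; `Sigma` and `mu` below are arbitrary illustrative values, not from the text:

```python
import numpy as np

rng = np.random.default_rng(0)

# An illustrative positive definite Sigma and mean vector.
Sigma = np.array([[2.0, 0.8], [0.8, 1.0]])
mu = np.array([1.0, -1.0])

# Cholesky factor: A A^T = Sigma with A lower triangular.
A = np.linalg.cholesky(Sigma)
assert np.allclose(A @ A.T, Sigma)

# X = A Z + mu with Z ~ MVN(0, I) gives X ~ MVN(mu, Sigma).
Z = rng.standard_normal((2, 100_000))
X = A @ Z + mu[:, None]

# Empirical mean and covariance should be close to mu and Sigma.
print(X.mean(axis=1))   # approximately [1, -1]
print(np.cov(X))        # approximately Sigma
```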

When $ {\bf A}$ is singular $ {\bf X}$ will not have a density: $ \exists a$ such that $ P(a^T {\bf X}= a^T
\boldsymbol{\mu}) =1$; $ {\bf X}$ is confined to a hyperplane.

Still true: distribution of $ {\bf X}$ depends only on $ \boldsymbol{\Sigma}={\bf A}{\bf A}^T$: if $ {\bf A}{\bf A}^T = BB^T$ then $ {\bf A}{\bf Z}+\boldsymbol{\mu}$ and $ B{\bf Z}+\boldsymbol{\mu}$ have the same distribution.
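The hyperplane phenomenon for singular $ {\bf A}$ can be seen directly. A sketch assuming numpy; the rank-1 matrix `A` below is an arbitrary example:

```python
import numpy as np

rng = np.random.default_rng(1)

# A singular 2x2 A: both coordinates of X are the same N(0,1), shifted by mu.
A = np.array([[1.0, 0.0], [1.0, 0.0]])   # rank 1
mu = np.array([0.5, 2.0])

Z = rng.standard_normal((2, 1000))
X = A @ Z + mu[:, None]

# a = (1, -1) is in the null space of A^T, so a^T X = a^T mu with probability 1.
a = np.array([1.0, -1.0])
assert np.allclose(A.T @ a, 0.0)
print(a @ X)   # every entry equals a^T mu = 0.5 - 2.0 = -1.5
```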

Expectation, moments

Defn: If $ {\bf X}\in \mathbb {R}^p$ has density $ f$ then

$\displaystyle {\rm E}(g({\bf X})) = \int g(x)f(x)\, dx \,.

any $ g$ from $ \mathbb {R}^p$ to $ \mathbb {R}$.

FACT: if $ Y=g(X)$ for a smooth $ g$ (mapping $ \mathbb {R}\to \mathbb {R}$)

$\displaystyle {\rm E}(Y)$ $\displaystyle = \int y f_Y(y) \, dy$    
  $\displaystyle = \int g(x) f_Y(g(x)) g^\prime(x) \, dx$    
  $\displaystyle = {\rm E}(g(X))$    

by the change of variables formula for integration, since $ f_Y(g(x))g^\prime(x) = f_X(x)$ for increasing $ g$. This is good because otherwise we might have two different values for $ {\rm E}(e^X)$.
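The consistency of the two computations of $ {\rm E}(e^Z)$ for $ Z\sim N(0,1)$ can be checked by quadrature. A sketch assuming numpy; the grids and truncation points are arbitrary choices:

```python
import numpy as np

# Two routes to E(e^Z), Z ~ N(0,1); both should give e^{1/2}.
phi = lambda z: np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)
dz = 1e-4

# Route 1: E(g(Z)) = integral of g(z) f_Z(z) dz with g(z) = e^z.
z = np.arange(-10, 10, dz) + dz / 2          # midpoint grid
route1 = np.sum(np.exp(z) * phi(z)) * dz

# Route 2: Y = e^Z is lognormal with density f_Y(y) = phi(log y)/y, so
# E(Y) = integral of y f_Y(y) dy = integral of phi(log y) dy on (0, inf).
y = np.arange(0, 200, dz) + dz / 2
route2 = np.sum(phi(np.log(y))) * dz

print(route1, route2, np.exp(0.5))   # all approximately 1.6487
```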

Linearity: $ {\rm E}(aX+bY) = a{\rm E}(X)+b{\rm E}(Y)$ for real random variables $ X$ and $ Y$ and constants $ a$ and $ b$.

Defn: The $ r^{\rm th}$ moment (about the origin) of a real rv $ X$ is $ \mu_r^\prime={\rm E}(X^r)$ (provided it exists). We generally use $ \mu$ for $ {\rm E}(X)$.

Defn: The $ r^{\rm th}$ central moment is

$\displaystyle \mu_r = {\rm E}[(X-\mu)^r]

We call $ \sigma^2 = \mu_2$ the variance.

Defn: For an $ \mathbb {R}^p$ valued random vector $ {\bf X}$

$\displaystyle \boldsymbol{\mu}_{\bf X}= {\rm E}({\bf X}) $

is the vector whose $ i^{\rm th}$ entry is $ {\rm E}(X_i)$ (provided all entries exist).

Fact: same idea used for random matrices.

Defn: The ( $ p \times p$) variance covariance matrix of $ {\bf X}$ is

$\displaystyle {\rm Var}({\bf X}) = {\rm E}\left[ ({\bf X}-\boldsymbol{\mu})({\bf X}-\boldsymbol{\mu})^T \right]

which exists provided each component $ X_i$ has a finite second moment.
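The outer-product definition of the variance covariance matrix agrees with the usual sample covariance. A sketch assuming numpy; the three component scales below are arbitrary:

```python
import numpy as np

rng = np.random.default_rng(2)

# Simulated data: independent components with sd 1, 2 and 0.5.
X = rng.standard_normal((3, 50_000)) * np.array([[1.0], [2.0], [0.5]])
mu_hat = X.mean(axis=1, keepdims=True)

# Var(X) = E[(X - mu)(X - mu)^T], estimated by the empirical average.
V = (X - mu_hat) @ (X - mu_hat).T / X.shape[1]
print(V)   # approximately diag(1, 4, 0.25)
```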

Example moments: If $ Z\sim N(0,1)$ then

$\displaystyle {\rm E}(Z)$ $\displaystyle = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}$    
  $\displaystyle = \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty$    
  $\displaystyle = 0$    

and (integrating by parts)

$\displaystyle {\rm E}(Z^r) =$ $\displaystyle \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}$    
$\displaystyle =$ $\displaystyle \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty$    
  $\displaystyle + (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}$    

so that

$\displaystyle \mu_r = (r-1)\mu_{r-2}

for $ r \ge 2$. Remembering that $ \mu_1=0$ and

$\displaystyle \mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1

we find that

$\displaystyle \mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd} \\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even.}
\end{array} \right.
If now $ X\sim N(\mu,\sigma^2)$, that is, $ X\sim \sigma Z + \mu$, then $ {\rm E}(X) = \sigma {\rm E}(Z) + \mu = \mu$ and

$\displaystyle \mu_r(X) = {\rm E}[(X-\mu)^r] = \sigma^r {\rm E}(Z^r)

In particular, we see that our choice of notation $ N(\mu,\sigma^2)$ for the distribution of $ \sigma Z + \mu$ is justified; $ \sigma^2$ is indeed the variance.
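The recursion $ \mu_r = (r-1)\mu_{r-2}$ can be checked against direct numerical integration. A sketch assuming numpy; the quadrature grid is an arbitrary choice:

```python
import numpy as np

# Moments of N(0,1) by midpoint quadrature on a fine grid.
dz = 1e-4
z = np.arange(-12, 12, dz) + dz / 2
phi = np.exp(-z**2 / 2) / np.sqrt(2 * np.pi)

def moment(r):
    return np.sum(z**r * phi) * dz

# The recursion from integration by parts, seeded with mu_0 = 1, mu_1 = 0.
mu = {0: 1.0, 1: 0.0}
for r in range(2, 9):
    mu[r] = (r - 1) * mu[r - 2]

print([mu[r] for r in range(9)])   # [1, 0, 1, 0, 3, 0, 15, 0, 105]
for r in range(9):
    assert abs(moment(r) - mu[r]) < 1e-4
```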

Similarly for $ {\bf X}\sim MVN(\boldsymbol{\mu},\boldsymbol{\Sigma})$ we have $ {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu}$ with $ {\bf Z}\sim MVN(0,I)$ and

$\displaystyle {\rm E}({\bf X}) = \boldsymbol{\mu}


$\displaystyle {\rm Var}({\bf X})$ $\displaystyle = {\rm E}\left\{({\bf X}-\boldsymbol{\mu})({\bf X}-\boldsymbol{\mu})^T\right\}$    
  $\displaystyle = {\rm E}\left\{ {\bf A}{\bf Z}({\bf A}{\bf Z})^T\right\}$    
  $\displaystyle = {\bf A}{\rm E}({\bf Z}{\bf Z}^T) {\bf A}^T$    
  $\displaystyle = {\bf A}I{\bf A}^T = \boldsymbol{\Sigma} \,.$    

Note use of easy calculation: $ {\rm E}({\bf Z})=0$ and

$\displaystyle {\rm Var}({\bf Z}) = {\rm E}({\bf Z}{\bf Z}^T) =I \,.

Moments and independence

Theorem: If $ X_1,\ldots,X_p$ are independent and each $ X_i$ is integrable then $ X=X_1\cdots X_p$ is integrable and

$\displaystyle {\rm E}(X_1\cdots X_p) = {\rm E}(X_1) \cdots {\rm E}(X_p) \,.

Moment Generating Functions

Defn: The moment generating function of a real valued $ X$ is

$\displaystyle M_X(t) = {\rm E}(e^{tX})

defined for those real $ t$ for which the expected value is finite.

Defn: The moment generating function of $ {\bf X}\in \mathbb {R}^p$ is

$\displaystyle M_{\bf X}(u) = {\rm E}[e^{u^T{\bf X}}]

defined for those vectors $ u$ for which the expected value is finite.

Example: If $ Z\sim N(0,1)$ then

$\displaystyle M_Z(t)$ $\displaystyle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{tz-z^2/2} dz$    
  $\displaystyle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-(z-t)^2/2+t^2/2} dz$    
  $\displaystyle = \frac{1}{\sqrt{2\pi}} \int_{-\infty}^\infty e^{-u^2/2+t^2/2} du$    
  $\displaystyle = e^{t^2/2}$    

Theorem: ($ p=1$) If $ M$ is finite for all $ t$ in a neighbourhood of $ 0$ then

  1. Every moment of $ X$ is finite.

  2. $ M$ is $ C^\infty$ (in fact $ M$ is analytic).

  3. $ \mu_k^\prime = \left.\frac{d^k}{dt^k} M_X(t)\right\vert_{t=0}$.

Note: $ C^\infty$ means has continuous derivatives of all orders. Analytic means has a convergent power series expansion in a neighbourhood of each $ t$ in an interval $ (-\epsilon,\epsilon)$ on which $ M$ is finite.

The proof, and many other facts about mgfs, rely on techniques of complex variables.
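The moment-extraction property in part 3 can be illustrated numerically: differentiate $ M_Z(t)=e^{t^2/2}$ at $ t=0$ by central differences and recover $ {\rm E}(Z)=0$ and $ {\rm E}(Z^2)=1$. A sketch assuming numpy; the step size `h` is an arbitrary choice:

```python
import numpy as np

# MGF of Z ~ N(0,1).
M = lambda t: np.exp(t**2 / 2)
h = 1e-3

# Central difference approximations to M'(0) and M''(0).
m1 = (M(h) - M(-h)) / (2 * h)             # E(Z)   = 0
m2 = (M(h) - 2 * M(0) + M(-h)) / h**2     # E(Z^2) = 1
print(m1, m2)
```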

Characterization & MGFs

Theorem: Suppose $ {\bf X}$ and $ {\bf Y}$ are $ \mathbb {R}^p$ valued random vectors such that

$\displaystyle M_{\bf X}({\bf u}) = M_{\bf Y}({\bf u})

for $ {\bf u}$ in some open neighbourhood of $ {\bf0}$ in $ \mathbb {R}^p$. Then $ {\bf X}$ and $ {\bf Y}$ have the same distribution.

The proof relies on techniques of complex variables.

MGFs and Sums

If $ X_1,\ldots,X_p$ are independent and $ Y=\sum X_i$ then the mgf of $ Y$ is the product of the mgfs of the individual $ X_i$:

$\displaystyle {\rm E}(e^{tY}) = \prod_i {\rm E}(e^{tX_i})

or $ M_Y = \prod M_{X_i}$. (Also for multivariate $ X_i$.)

Example: If $ Z_1,\ldots,Z_p$ are independent $ N(0,1)$ then

$\displaystyle {\rm E}(e^{\sum a_i Z_i})$ $\displaystyle = \prod_i {\rm E}(e^{a_i Z_i})$    
  $\displaystyle = \prod_i e^{a_i^2/2}$    
  $\displaystyle = \exp(\sum a_i^2/2)$    

Conclusion: If $ {\bf Z}\sim MVN_p(0,I)$ then

$\displaystyle M_Z(u) = \exp(\sum u_i^2/2) = \exp({\bf u}^T {\bf u}/2).

Example: If $ X\sim N(\mu,\sigma^2)$ then $ X=\sigma Z + \mu$ and

$\displaystyle M_X(t) = {\rm E}(e^{t(\sigma Z+\mu)}) = e^{t\mu} e^{\sigma^2t^2/2}.

Theorem: Suppose $ {\bf X}= {\bf A}{\bf Z}+{\boldsymbol{\mu}}$ and $ {\bf Y}= {\bf A}^* {\bf Z}^* + {\boldsymbol{\mu}}^*$ where $ {\bf Z}\sim MVN_p(0,I)$ and $ {\bf Z}^* \sim MVN_q(0,I)$. Then $ {\bf X}$ and $ {\bf Y}$ have the same distribution if and only if the following two conditions hold:

  1. $ \boldsymbol{\mu} = \boldsymbol{\mu}^*$.

  2. $ {\bf A}{\bf A}^T = {\bf A}^*({\bf A}^*)^T$.

Alternatively: if $ {\bf X}$, $ {\bf Y}$ each MVN then $ {\rm E}({\bf X})={\rm E}({\bf Y})$ and $ {\rm Var}({\bf X}) = {\rm Var}({\bf Y})$ imply that $ {\bf X}$ and $ {\bf Y}$ have the same distribution.

Proof: If 1 and 2 hold the mgf of $ {\bf X}$ is

$\displaystyle {\rm E}\left(e^{t^T{\bf X}}\right)$ $\displaystyle = {\rm E}\left(e^{t^T({\bf A}{\bf Z}+\boldsymbol\mu)}\right)$    
  $\displaystyle = e^{t^T\boldsymbol\mu}{\rm E}\left(e^{({\bf A}^Tt)^T {\bf Z}}\right)$    
  $\displaystyle = e^{t^T\boldsymbol\mu+({\bf A}^Tt)^T({\bf A}^Tt)/2}$    
  $\displaystyle = e^{t^T\boldsymbol\mu+t^T\boldsymbol\Sigma t/2}$    

Thus $ M_{\bf X}=M_{\bf Y}$. Conversely, if $ {\bf X}$ and $ {\bf Y}$ have the same distribution then they have the same mean and variance, so conditions 1 and 2 hold.

Thus the mgf, and hence the distribution, is determined by $ \boldsymbol\mu$ and $ \boldsymbol\Sigma$.

Theorem: If $ {\bf X}\sim MVN_p(\mu,\boldsymbol{\Sigma})$ then there is $ {\bf A}$ a $ p \times p$ matrix such that $ {\bf X}$ has same distribution as $ {\bf A}{\bf Z}+\boldsymbol{\mu}$ for $ {\bf Z}\sim MVN_p(0,I)$.

We may assume that $ {\bf A}$ is symmetric and non-negative definite, or that $ {\bf A}$ is upper triangular, or that $ {\bf A}$ is lower triangular.

Proof: Pick any $ {\bf A}$ such that $ {\bf A}{\bf A}^T=\boldsymbol{\Sigma}$ such as $ {\bf P}{\bf D}^{1/2} {\bf P}^T$ from the spectral decomposition. Then $ {\bf A}{\bf Z}+\boldsymbol{\mu}\sim MVN_p(\boldsymbol{\mu},\boldsymbol{\Sigma})$.

From the symmetric square root we can produce an upper triangular square root by the Gram-Schmidt process: if $ {\bf A}$ has rows $ a_1^T,\ldots,a_p^T$ then let $ v_p$ be $ a_p/\sqrt{a_p^T a_p}$. Choose $ v_{p-1}$ proportional to $ a_{p-1} -bv_p$ where $ b = a_{p-1}^T v_p$ so that $ v_{p-1}$ has unit length. Continue in this way; you automatically get $ a_j^T v_k = 0$ if $ j < k$. If $ {\bf P}$ has columns $ v_1,\ldots,v_p$ then $ {\bf P}$ is orthogonal and $ {\bf A}{\bf P}$ is an upper triangular square root of $ \boldsymbol\Sigma$.
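The non-uniqueness of square roots can be seen numerically: the symmetric root from the spectral decomposition, the lower triangular Cholesky factor, and an upper triangular factor all reproduce the same $\boldsymbol\Sigma$. A sketch assuming numpy; `Sigma` is an arbitrary positive definite example, and the upper triangular root is built by a reversal-permutation trick rather than the Gram-Schmidt route in the text:

```python
import numpy as np

Sigma = np.array([[4.0, 2.0, 1.0],
                  [2.0, 3.0, 0.5],
                  [1.0, 0.5, 2.0]])

# Symmetric square root P D^{1/2} P^T from the spectral decomposition.
vals, P = np.linalg.eigh(Sigma)
A_sym = P @ np.diag(np.sqrt(vals)) @ P.T

# Lower triangular square root (Cholesky).
A_low = np.linalg.cholesky(Sigma)

# Upper triangular square root: reverse rows/columns, take Cholesky, reverse back.
J = np.eye(3)[::-1]                       # reversal permutation, J @ J = I
A_up = J @ np.linalg.cholesky(J @ Sigma @ J) @ J

for A in (A_sym, A_low, A_up):
    assert np.allclose(A @ A.T, Sigma)    # all are square roots of Sigma
print(A_up)                               # upper triangular
```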

Variances, Covariances, Correlations

Defn: The covariance between $ {\bf X}$ and $ {\bf Y}$ is

$\displaystyle {\rm Cov}({\bf X},{\bf Y}) = {\rm E}\left\{
({\bf X}-\boldsymbol\mu_{\bf X})({\bf Y}-\boldsymbol\mu_{\bf Y})^T\right\}

This is a matrix.


Properties of the $ MVN$ distribution

1: All margins are multivariate normal: if

$\displaystyle {\bf X}= \left[\begin{array}{c} {\bf X}_1\\  {\bf X}_2\end{array} \right]

$\displaystyle \mu = \left[\begin{array}{c} \mu_1\\  \mu_2\end{array} \right]


$\displaystyle \boldsymbol{\Sigma} = \left[\begin{array}{cc} \boldsymbol{\Sigma}_{11} & \boldsymbol{\Sigma}_{12} \\
\boldsymbol{\Sigma}_{21} & \boldsymbol{\Sigma}_{22} \end{array} \right]

then $ {\bf X}\sim MVN(\boldsymbol{\mu},\boldsymbol{\Sigma}) \Rightarrow {\bf X}_1\sim MVN(\boldsymbol{\mu}_1,\boldsymbol{\Sigma}_{11})$.

2: $ {\bf M}{\bf X}+\boldsymbol{\nu} \sim MVN({\bf M}\boldsymbol{\mu}+\boldsymbol{\nu}, {\bf M} \boldsymbol{\Sigma} {\bf M}^T)$: affine transformation of MVN is normal.

3: If

$\displaystyle \boldsymbol{\Sigma}_{12} = {\rm Cov}({\bf X}_1,{\bf X}_2) = {\bf0}

then $ {\bf X}_1$ and $ {\bf X}_2$ are independent.

4: All conditionals are normal: the conditional distribution of $ {\bf X}_1$ given $ {\bf X}_2=x_2$ is $ MVN(\boldsymbol{\mu}_1+\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}(x_2-\boldsymbol{\mu}_2),
\boldsymbol{\Sigma}_{11}-\boldsymbol{\Sigma}_{12}\boldsymbol{\Sigma}_{22}^{-1}\boldsymbol{\Sigma}_{21})$.

Proof of (1): If $ {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu}$ then

$\displaystyle {\bf X}_1 = \left[ I \vert {\bf0}\right] {\bf X}

for $ I$ the identity matrix of correct dimension.


$\displaystyle {\bf X}_1 = \left( \left[ I \vert {\bf0}\right]{\bf A}\right) {\bf Z}+ \left[ I \vert {\bf0}\right] \boldsymbol{\mu}

Compute mean and variance to check rest.

Proof of (2): If $ {\bf X}={\bf A}{\bf Z}+\boldsymbol{\mu}$ then

$\displaystyle {\bf M}{\bf X}+\boldsymbol{\nu}= {\bf M}{\bf A}{\bf Z}
+ \boldsymbol{\nu} +{\bf M}\boldsymbol{\mu}$

Proof of (3): If

$\displaystyle {\bf u} = \left[\begin{array}{c}{\bf u}_1 \\  {\bf u}_2\end{array}\right]
$

then $ \boldsymbol{\Sigma}_{12}={\bf0}$ gives $ {\bf u}^T\boldsymbol\Sigma{\bf u} = {\bf u}_1^T\boldsymbol\Sigma_{11}{\bf u}_1+{\bf u}_2^T\boldsymbol\Sigma_{22}{\bf u}_2$, so the mgf factors:

$\displaystyle M_{\bf X}({\bf u}) = M_{{\bf X}_1}({\bf u}_1)M_{{\bf X}_2}({\bf u}_2)

Proof of (4): first case: assume $ \boldsymbol{\Sigma}_{22}$ has an inverse. Define


$\displaystyle {\bf W}= {\bf X}_1 - \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}{\bf X}_2


$\displaystyle \left[\begin{array}{c}
{\bf W}\\  {\bf X}_2\end{array}\right]
= \left[\begin{array}{cc} I & -\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1} \\
{\bf0} & I \end{array}\right]
\left[\begin{array}{c}
{\bf X}_1 \\  {\bf X}_2\end{array}\right]

Thus $ ({\bf W},{\bf X}_2)^T$ is $ MVN(\boldsymbol\mu_1-\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}
\boldsymbol\mu_2,\boldsymbol\Sigma^*)$ where

$\displaystyle \boldsymbol\Sigma^* = \left[\begin{array}{cc}
\boldsymbol\Sigma_{11} - \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21} & {\bf0} \\
{\bf0}& \boldsymbol\Sigma_{22}\end{array}\right]
Now the joint density of $ {\bf W}$ and $ {\bf X}_2$ factors:

$\displaystyle f_{{\bf W},{\bf X}_2}(w,x_2) = f_{{\bf W}}(w)f_{{\bf X}_2}(x_2)

By change of variables the joint density of $ ({\bf X}_1,{\bf X}_2)$ is

$\displaystyle f_{{\bf X}_1,{\bf X}_2}(x_1,x_2) = cf_{{\bf W}}(x_1-{\bf M}x_2)f_{{\bf X}_2}(x_2)

where $ c=1$ is the constant Jacobian of the linear transformation from $ ({\bf W},{\bf X}_2)$ to $ ({\bf X}_1,{\bf X}_2)$ and

$\displaystyle {\bf M}= \boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1} \,.

Thus conditional density of $ {\bf X}_1$ given $ {\bf X}_2=x_2$ is

$\displaystyle \frac{f_{{\bf W}}(x_1-{\bf M}x_2)f_{{\bf X}_2}(x_2)}{f_{{\bf X}_2}(x_2)}
= f_{{\bf W}}(x_1-{\bf M}x_2)

As a function of $ x_1$ this density has the form of the advertised multivariate normal density.
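The conditional mean and variance formulas, and the key fact that $ {\bf W}$ is uncorrelated with $ {\bf X}_2$, can be computed directly. A sketch assuming numpy; `Sigma`, `mu` and the conditioning value `x2` are arbitrary illustrative choices:

```python
import numpy as np

# Illustrative positive definite Sigma, partitioned with X_1 = first two
# coordinates and X_2 = the last coordinate.
Sigma = np.array([[3.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])
mu = np.array([0.0, 1.0, -1.0])

p1 = 2
S11, S12 = Sigma[:p1, :p1], Sigma[:p1, p1:]
S21, S22 = Sigma[p1:, :p1], Sigma[p1:, p1:]
M = S12 @ np.linalg.inv(S22)              # the matrix M of the text

# Conditional parameters of X_1 given X_2 = x2.
x2 = np.array([0.5])
cond_mean = mu[:p1] + M @ (x2 - mu[p1:])
cond_var = S11 - M @ S21                  # Sigma_{11.2}

# Cov(W, X_2) = Sigma_12 - M Sigma_22 = 0: the block-diagonal claim.
assert np.allclose(S12 - M @ S22, 0.0)
print(cond_mean)
print(cond_var)
```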

Specialization to bivariate case:


$\displaystyle \boldsymbol\Sigma = \left[\begin{array}{cc} \sigma_1^2 & \rho\sigma_1\sigma_2 \\
\rho\sigma_1\sigma_2 & \sigma_2^2\end{array}\right]

where we define

$\displaystyle \rho = \frac{{\rm Cov}(X_1,X_2)}{\sqrt{{\rm Var}(X_1){\rm Var}(X_2)}}

Note that

$\displaystyle \sigma_i^2 = {\rm Var}(X_i)


$\displaystyle W= X_1 - \rho\frac{\sigma_1}{\sigma_2} X_2

is independent of $ X_2$. The marginal distribution of $ W$ is $ N(\mu_1-\rho\sigma_1 \mu_2/\sigma_2,\tau^2)$ where

$\displaystyle \tau^2 =$ $\displaystyle {\rm Var}(X_1) - 2\rho\frac{\sigma_1}{\sigma_2}{\rm Cov}(X_1,X_2)$    
  $\displaystyle + \left(\rho\frac{\sigma_1}{\sigma_2}\right)^2 {\rm Var}(X_2)$    

This simplifies to

$\displaystyle \sigma_1^2(1-\rho^2)

Notice that it follows that

$\displaystyle -1 \le \rho \le 1

More generally, for any $ X$ and $ Y$ and any real $ \lambda$:

0 $\displaystyle \le {\rm Var}(X-\lambda Y)$    
  $\displaystyle = {\rm Var}(X) - 2 \lambda{\rm Cov}(X,Y) + \lambda^2 {\rm Var}(Y)$    

RHS is minimized at

$\displaystyle \lambda = \frac{{\rm Cov}(X,Y)}{{\rm Var}(Y)}

Minimum value is

$\displaystyle 0 \le {\rm Var}(X) (1 - \rho_{XY}^2)


$\displaystyle \rho_{XY} =\frac{{\rm Cov}(X,Y)}{\sqrt{{\rm Var}(X){\rm Var}(Y)}}

defines the correlation between $ X$ and $ Y$.

Multiple Correlation
Now suppose $ X_2$ is scalar but $ {\bf X}_1$ is a vector.

Defn: The multiple correlation between $ {\bf X}_1$ and $ X_2$ is

$\displaystyle R^2({\bf X}_1,X_2) = \max\vert\rho_{{\bf a}^T{\bf X}_1,X_2}\vert^2

over all $ {\bf a}\neq 0$.

Thus: maximize

$\displaystyle \frac{{\rm Cov}^2({\bf a}^T {\bf X}_1,X_2)}{{\rm Var}({\bf a}^T{\bf X}_1){\rm Var}(X_2)}
= \frac{\left({\bf a}^T \boldsymbol\Sigma_{12}\right)^2}{\boldsymbol\Sigma_{22}\left({\bf a}^T \boldsymbol\Sigma_{11} {\bf a}\right)}

Put $ {\bf b}=\boldsymbol\Sigma_{11}^{1/2}{\bf a}$. For $ \boldsymbol\Sigma_{11}$ invertible the problem is equivalent (up to the constant factor $ 1/\boldsymbol\Sigma_{22}$) to maximizing

$\displaystyle \frac{{\bf b}^T {\bf Q}{\bf b}}{{\bf b}^T {\bf b}}
$

where

$\displaystyle {\bf Q}= \boldsymbol\Sigma_{11}^{-1/2}\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{21}\boldsymbol\Sigma_{11}^{-1/2}

Solution: find largest eigenvalue of $ {\bf Q}$. Notice that

$\displaystyle {\bf Q}= {\bf v}{\bf v}^T
$

where

$\displaystyle {\bf v}= \boldsymbol\Sigma_{11}^{-1/2}\boldsymbol\Sigma_{12}
$

is a vector (since $ X_2$ is scalar, $ \boldsymbol\Sigma_{12}$ is a column vector). Set

$\displaystyle {\bf v}{\bf v}^T {\bf x} = \lambda {\bf x}

and multiply by $ {\bf v}^T$ to get

$\displaystyle {\bf v}^T{\bf x} = 0$    or $\displaystyle \lambda = {\bf v}^T {\bf v}

If $ {\bf v}^T{\bf x}=0$ then we see $ \lambda=0$ so largest eigenvalue is $ {\bf v}^T{\bf v}$.

Summary: maximum squared correlation is

$\displaystyle R^2({\bf X}_1,X_2) = \frac{ {\bf v}^T {\bf v}}{\boldsymbol\Sigma_{22}}
= \frac{\boldsymbol\Sigma_{21}\boldsymbol\Sigma_{11}^{-1}\boldsymbol\Sigma_{12}}{\boldsymbol\Sigma_{22}}

Achieved when eigenvector is $ {\bf x}={\bf v}={\bf b}$ so

$\displaystyle {\bf a}=\boldsymbol\Sigma_{11}^{-1/2}{\bf v}
= \boldsymbol\Sigma_{11}^{-1/2}\boldsymbol\Sigma_{11}^{-1/2}\boldsymbol\Sigma_{12}
= \boldsymbol\Sigma_{11}^{-1}\boldsymbol\Sigma_{12} \,.

Notice: since $ R^2$ is squared correlation between two scalars ( $ {\bf a}^T {\bf X}_1$ and $ X_2$) we have

$\displaystyle 0 \le R^2 \le 1

Equals 1 iff $ X_2$ is a linear combination of the components of $ {\bf X}_1$.
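The optimality of $ {\bf a}=\boldsymbol\Sigma_{11}^{-1}\boldsymbol\Sigma_{12}$ can be probed numerically: no other coefficient vector gives a larger squared correlation. A sketch assuming numpy; `Sigma` is an arbitrary positive definite example:

```python
import numpy as np

rng = np.random.default_rng(3)

# Illustrative Sigma with X_1 = first two coordinates, X_2 = last (scalar).
Sigma = np.array([[2.0, 0.5, 1.0],
                  [0.5, 1.0, 0.4],
                  [1.0, 0.4, 1.5]])
S11, S12 = Sigma[:2, :2], Sigma[:2, 2]
S22 = Sigma[2, 2]

# R^2 = Sigma_21 Sigma_11^{-1} Sigma_12 / Sigma_22, at a = Sigma_11^{-1} Sigma_12.
a = np.linalg.solve(S11, S12)
R2 = S12 @ a / S22
print(R2)

# Random coefficient vectors never beat the optimum (Cauchy-Schwarz).
for _ in range(1000):
    b = rng.standard_normal(2)
    rho2 = (b @ S12) ** 2 / ((b @ S11 @ b) * S22)
    assert rho2 <= R2 + 1e-12
```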

Correlation matrices, partial correlations:

Correlation between two scalars $ X$ and $ Y$ is

$\displaystyle \rho_{XY} =\frac{{\rm Cov}(X,Y)}{\sqrt{{\rm Var}(X){\rm Var}(Y)}}

If $ {\bf X}$ has variance $ \boldsymbol\Sigma$ then the correlation matrix of $ {\bf X}$ is $ {\bf R}_{\bf X}$ with entries

$\displaystyle R_{ij} = \frac{{\rm Cov}(X_i,X_j)}{\sqrt{{\rm Var}(X_i){\rm Var}(X_j)}}
= \frac{\Sigma_{ij}}{\sqrt{\Sigma_{ii}\Sigma_{jj}}}

If $ {\bf X}_1,{\bf X}_2$ are MVN with the usual partitioned variance covariance matrix then the conditional variance of $ {\bf X}_1$ given $ {\bf X}_2$ is

$\displaystyle \boldsymbol\Sigma_{11\cdot 2} = \boldsymbol\Sigma_{11} -
\boldsymbol\Sigma_{12}\boldsymbol\Sigma_{22}^{-1}\boldsymbol\Sigma_{21} \,.

From this define the partial correlation matrix $ {\bf R}_{11\cdot 2}$ with entries

$\displaystyle ({\bf R}_{11\cdot 2})_{ij} = \frac{(\boldsymbol\Sigma_{11\cdot 2})_{ij}
}{\sqrt{(\boldsymbol\Sigma_{11\cdot 2})_{ii}(\boldsymbol\Sigma_{11\cdot 2})_{jj}}}

Note: these are used even when $ {\bf X}_1,{\bf X}_2$ are NOT MVN.
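Both matrices are easy to compute from $\boldsymbol\Sigma$. A sketch assuming numpy; `Sigma` is an arbitrary positive definite example, and the formulas themselves do not require the MVN assumption:

```python
import numpy as np

Sigma = np.array([[3.0, 1.0, 0.5],
                  [1.0, 2.0, 0.3],
                  [0.5, 0.3, 1.0]])

def corr_from_cov(S):
    # R_ij = S_ij / sqrt(S_ii S_jj)
    d = np.sqrt(np.diag(S))
    return S / np.outer(d, d)

# Correlation matrix of the full vector.
R = corr_from_cov(Sigma)

# Partition: X_1 = first two coordinates, X_2 = last one.
S11, S12 = Sigma[:2, :2], Sigma[:2, 2:]
S21, S22 = Sigma[2:, :2], Sigma[2:, 2:]
S11_2 = S11 - S12 @ np.linalg.inv(S22) @ S21   # Sigma_{11.2}
R11_2 = corr_from_cov(S11_2)                   # partial correlation matrix

print(R)
print(R11_2)
```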

Richard Lockhart