
STAT 801 Lecture 9

Reading for Today's Lecture: ?

Goals of Today's Lecture:

Last time: We used the change of variables formula to compute the density of

\begin{displaymath}Y=(\sqrt{n}\bar{Z}, Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z})^t \, .
\end{displaymath}

It factored into a piece involving $Y_1=\sqrt{n}\bar{Z}$ only and another piece involving $(Y_2,\ldots,Y_n) =
(Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z})$ only. Hence $\sqrt{n}\bar{Z}$ is independent of $(Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z})$. Since
\begin{align*}(n-1)s^2 = & (Z_1-\bar{Z})^2 + \cdots + (Z_{n-1}-\bar{Z})^2
\\
& + \left\{ (Z_1-\bar{Z})+ \cdots + (Z_{n-1}-\bar{Z}) \right\}^2
\end{align*}
we find that $\bar{Z}$ and $s$ are independent. The factor involving $Y_1$ is a standard normal density, proving that $\sqrt{n}\bar{Z} \sim N(0,1)$. It remains to prove the last two parts of the theorem.
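
Before doing so, here is a quick numerical illustration of the first two parts (a sketch of my own, not part of the proof; it assumes numpy is available, and the sample size n=5 and the number of replications are arbitrary choices):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000                  # arbitrary illustrative choices
Z = rng.standard_normal((reps, n))

zbar = Z.mean(axis=1)                 # sample means
s2 = Z.var(axis=1, ddof=1)            # sample variances (divisor n-1)
root_n_zbar = np.sqrt(n) * zbar       # should behave like N(0,1)

print(root_n_zbar.mean(), root_n_zbar.var())   # near 0 and 1
print(np.corrcoef(zbar, s2)[0, 1])             # near 0
print(((n - 1) * s2).mean())                   # near n-1
\end{verbatim}

Near-zero correlation only illustrates, and of course does not prove, the independence established above.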

Suppose that $Z_1,\ldots,Z_n$ are independent N(0,1). We define the $\chi^2_n$ distribution to be that of $U=Z_1^2 + \cdots + Z_n^2$. Thus our third assertion is that $(n-1)s^2$ can be rewritten as

\begin{displaymath}(n-1)s^2 = W_1^2 + \cdots +W_{n-1}^2
\end{displaymath}

where $W_1,\ldots,W_{n-1}$ are iid N(0,1). In your homework I try to get you to do this for n=3. Here I merely derive the density of $\chi_n^2$. The result for n=1 is in Lecture 3; in class I will do the case n=2. Here is the general case: Define angles $\theta_1,\ldots,\theta_{n-1}$ by
\begin{align*}Z_1 &= U^{1/2} \cos\theta_1
\\
Z_2 &= U^{1/2} \sin\theta_1\cos\theta_2
\\
&\;\;\vdots
\\
Z_{n-1} &= U^{1/2} \sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}
\\
Z_n &= U^{1/2} \sin\theta_1\cdots \sin\theta_{n-1}
\end{align*}
(These are spherical co-ordinates in n dimensions. The $\theta$ values run from 0 to $\pi$ except for the last $\theta$ whose values run from 0 to $2\pi$.) We will use the change of variables formula to get the joint density of $(U,\theta_1,\ldots,\theta_{n-1})$. Note the following derivative formulas

\begin{displaymath}\frac{\partial Z_i}{\partial U} = \frac{1}{2U} Z_i
\end{displaymath}

and

\begin{displaymath}\frac{\partial Z_i}{\partial\theta_j} =
\left\{ \begin{array}{ll}
0 & j > i
\\
-Z_i\tan\theta_i & j=i
\\
Z_i\cot\theta_j & j < i
\end{array}\right.
\end{displaymath}

To clarify the formulas I now work out the case n=3. The matrix of partial derivatives (rows indexed by $Z_1,Z_2,Z_3$, columns by $U,\theta_1,\theta_2$) is

\begin{displaymath}\left[\begin{array}{ccc}
U^{-1/2} \cos\theta_1 /2
&
-U^{1/2} \sin\theta_1
&
0
\\
U^{-1/2} \sin\theta_1\cos\theta_2 /2
&
U^{1/2} \cos\theta_1\cos\theta_2
&
-U^{1/2} \sin\theta_1\sin\theta_2
\\
U^{-1/2} \sin\theta_1\sin\theta_2 /2
&
U^{1/2} \cos\theta_1\sin\theta_2
&
U^{1/2} \sin\theta_1\cos\theta_2
\end{array}\right]
\end{displaymath}

The determinant of this matrix may be found by adding $2U\sin\theta_j/\cos\theta_j$ times column 1 to column j+1 (which doesn't change the determinant). This clears the entries in positions (1,2) and (2,3); expanding along the first row (the cofactor of the remaining (1,3) entry vanishes), the determinant is the product of the diagonal entries, which after a small amount of algebra are $U^{-1/2} \cos\theta_1 /2$, $U^{1/2}\cos\theta_2/ \cos\theta_1$ and $U^{1/2} \sin\theta_1/\cos\theta_2$. We multiply these together to get

\begin{displaymath}U^{1/2}\sin(\theta_1)/2
\end{displaymath}

which is non-negative for all U and $\theta_1$. For general n we see that every term in the first column contains a factor $U^{-1/2}/2$ while every other entry has a factor $U^{1/2}$. Multiplying a column in a matrix by c multiplies the determinant by c, so the Jacobian of the transformation is $u^{(n-1)/2} u^{-1/2}/2 = u^{(n-2)/2}/2$ times some function, say h, which depends only on the angles. Thus the joint density of $U,\theta_1,\ldots, \theta_{n-1}$ is

\begin{displaymath}(2\pi)^{-n/2} \exp(-u/2) u^{(n-2)/2}h(\theta_1, \cdots, \theta_{n-1}) / 2
\end{displaymath}
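
As an aside (not in the original notes), the n=3 Jacobian computed above can be checked symbolically; here is a minimal sketch, assuming sympy is available:

\begin{verbatim}
import sympy as sp

U, t1, t2 = sp.symbols('U theta1 theta2', positive=True)
Z = sp.Matrix([
    sp.sqrt(U) * sp.cos(t1),
    sp.sqrt(U) * sp.sin(t1) * sp.cos(t2),
    sp.sqrt(U) * sp.sin(t1) * sp.sin(t2),
])
J = Z.jacobian([U, t1, t2])      # 3x3 matrix of partial derivatives
print(sp.simplify(J.det()))      # sqrt(U)*sin(theta1)/2
\end{verbatim}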

To compute the density of U we must do an n-1 dimensional multiple integral $d\theta_{n-1}\cdots d\theta_1$. We see that the answer has the form

\begin{displaymath}cu^{(n-2)/2} \exp(-u/2)
\end{displaymath}

for some constant c, which we can evaluate by requiring that

\begin{displaymath}\int f_U(u) du = c \int u^{(n-2)/2} \exp(-u/2) du =1
\end{displaymath}

Substitute y=u/2, du=2dy to see that

\begin{displaymath}c 2^{(n-2)/2}\cdot 2 \int_0^\infty y^{(n-2)/2}e^{-y} dy = c\, 2^{n/2} \Gamma(n/2) = 1
\end{displaymath}

so that the $\chi^2_n$ density is

\begin{displaymath}\frac{1}{2\Gamma(n/2)} \left(\frac{u}{2}\right)^{(n-2)/2} e^{-u/2}
\end{displaymath}
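
As a check, the n=2 case (the one to be done in class) can be carried out directly with polar co-ordinates $Z_1=U^{1/2}\cos\theta_1$, $Z_2=U^{1/2}\sin\theta_1$. The Jacobian is

\begin{displaymath}\det\left[\begin{array}{cc}
U^{-1/2}\cos\theta_1/2 & -U^{1/2}\sin\theta_1
\\
U^{-1/2}\sin\theta_1/2 & U^{1/2}\cos\theta_1
\end{array}\right] = \frac{1}{2}
\end{displaymath}

so that

\begin{displaymath}f_U(u) = \int_0^{2\pi} (2\pi)^{-1} e^{-u/2} \, \frac{1}{2} \, d\theta_1 = \frac{1}{2} e^{-u/2}
\end{displaymath}

which agrees with the general formula at n=2.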

Finally the fourth part of the theorem is a consequence of the first 3 parts of the theorem and the definition of the $t_\nu$ distribution, namely, that $T\sim t_\nu$ if it has the same distribution as

\begin{displaymath}Z/\sqrt{U/\nu}
\end{displaymath}

where $Z\sim N(0,1)$, $U\sim\chi^2_\nu$ and Z and U are independent.

I now derive the density of T directly from this definition:
\begin{align*}P(T \le t) &= P( Z \le t\sqrt{U/\nu})
\\
& =
\int_0^\infty \int_{-\infty}^{t\sqrt{u/\nu}} f_Z(z)f_U(u) dz du
\end{align*}
I can differentiate this with respect to t by simply differentiating the inner integral:

\begin{displaymath}\frac{\partial}{\partial t}\int_{at}^{bt} f(x)dx
=
bf(bt)-af(at)
\end{displaymath}

by the fundamental theorem of calculus. Hence

\begin{displaymath}\frac{d}{dt} P(T \le t) =
\int_0^\infty f_U(u) \sqrt{u/\nu}\frac{\exp[-t^2u/(2\nu)]}{\sqrt{2\pi}} du
\, .
\end{displaymath}

Now I plug in

\begin{displaymath}f_U(u)= \frac{1}{2\Gamma(\nu/2)}(u/2)^{(\nu-2)/2} e^{-u/2}
\end{displaymath}

to get

\begin{displaymath}f_T(t) = \int_0^\infty \frac{1}{2\sqrt{\pi\nu}\Gamma(\nu/2)}
(u/2)^{(\nu-1)/2} \exp[-u(1+t^2/\nu)/2] \, du \, .
\end{displaymath}

Make the substitution $y=u(1+t^2/\nu)/2$, so that $dy=(1+t^2/\nu)\,du/2$ and $(u/2)^{(\nu-1)/2}= [y/(1+t^2/\nu)]^{(\nu-1)/2}$, to get

\begin{displaymath}f_T(t) = \frac{1}{\sqrt{\pi\nu}\Gamma(\nu/2)}(1+t^2/\nu)^{-(\nu+1)/2}
\int_0^\infty y^{(\nu-1)/2} e^{-y} dy
\end{displaymath}

or

\begin{displaymath}f_T(t)= \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\Gamma(\nu/2)}\frac{1}{(1+t^2/\nu)^{(\nu+1)/2}}
\end{displaymath}
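
The formula can be sanity-checked numerically. The following sketch (assuming scipy is installed; $\nu=5$ is an arbitrary illustrative choice) compares it with scipy.stats.t.pdf:

\begin{verbatim}
import numpy as np
from scipy.stats import t as t_dist
from scipy.special import gammaln

def f_T(x, nu):
    # Gamma((nu+1)/2) / (sqrt(pi*nu) Gamma(nu/2)) * (1 + x^2/nu)^(-(nu+1)/2)
    logc = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(np.pi * nu)
    return np.exp(logc - (nu + 1) / 2 * np.log1p(x ** 2 / nu))

nu = 5                              # arbitrary degrees of freedom
x = np.linspace(-4, 4, 9)
print(np.max(np.abs(f_T(x, nu) - t_dist.pdf(x, df=nu))))   # essentially zero
\end{verbatim}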

Expectation, moments

We give two definitions of expected values:

Def'n: If X has density f then

\begin{displaymath}E(g(X)) = \int g(x)f(x)\, dx \,.
\end{displaymath}

Def'n: If X has discrete density f then

\begin{displaymath}E(g(X)) = \sum_x g(x)f(x) \,.
\end{displaymath}

Now if Y=g(X) for a smooth, strictly increasing g then

\begin{displaymath}E(Y) = \int y f_Y(y) \, dy = \int g(x) f_Y(g(x)) g^\prime(x) \, dx
= \int g(x) f_X(x) \, dx = E(g(X))
\end{displaymath}

by the change of variables formula for integration, since $f_X(x) = f_Y(g(x)) g^\prime(x)$ in this case. This is good because otherwise we might have two different values for $E(e^X)$.
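
To make the point concrete, here is a numerical sketch (taking $X\sim N(0,1)$ purely for illustration; the notes do not specify a distribution) that computes $E(e^X)$ both ways, once from the density of X and once from the density of $Y=e^X$; both should give $e^{1/2}\approx 1.6487$:

\begin{verbatim}
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density

# Way 1: E(g(X)) = integral of exp(x) phi(x) dx
way1, _ = quad(lambda x: np.exp(x) * phi(x), -np.inf, np.inf)

# Way 2: E(Y) = integral of y f_Y(y) dy; since f_Y(y) = phi(log y)/y,
# the integrand y f_Y(y) simplifies to phi(log y)
way2, _ = quad(lambda y: phi(np.log(y)), 0, np.inf)

print(way1, way2, np.exp(0.5))
\end{verbatim}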

In general, there are random variables which are neither absolutely continuous nor discrete. Look at my STAT 801 web pages to see how $\text{E}$ is defined in general.

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X \ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $X= \lim X_n$ (which has to exist) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ and there is a random variable X such that $X_n \to X$ (technical details of this convergence later in the course) and a random variable Y such that $Y_n \to Y$ with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

This is often used with all $Y_n$ the same random variable Y.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\liminf X_n) \le \liminf E(X_n)
\end{displaymath}
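
A standard example (my illustration, not from the notes) shows why the hypotheses matter: let U be Uniform(0,1) and $X_n = n 1(U \le 1/n)$. Then $X_n \to 0$ but

\begin{displaymath}E(X_n) = n P(U \le 1/n) = 1 \quad \mbox{for all $n$,}
\end{displaymath}

so $E(X_n) \not\to 0$: there is no integrable dominating variable, and Fatou's inequality is strict here, $0 = E(\liminf X_n) \le \liminf E(X_n) = 1$.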

Theorem: With this definition of E, if X has density f(x) (even in $R^p$, say) and Y=g(X), then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(This could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

This works for instance even if X has a density but Y doesn't.
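
For example (my illustration): if X is Uniform(0,1) and $Y = 1(X \le 1/2)$ then Y is discrete and has no density, yet the theorem gives

\begin{displaymath}E(Y) = \int_0^1 1(x \le 1/2) \, dx = \frac{1}{2} \, .
\end{displaymath}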

Def'n: The $r^{\rm th}$ moment (about the origin) of a real random variable X is $\mu_r^\prime=E(X^r)$ (provided it exists). We generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

We call $\sigma^2 = \mu_2$ the variance.

Def'n: For an $R^p$ valued random vector X we define $\mu_X = E(X) $ to be the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ( $p \times p$) variance covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
\end{displaymath}

which exists provided each component $X_i$ has a finite second moment.

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
&\le E\left[\frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
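
In particular, taking r=2 gives Chebyshev's inequality:

\begin{displaymath}P(\vert X-\mu\vert \ge t) \le \frac{E[(X-\mu)^2]}{t^2} = \frac{\sigma^2}{t^2} \, .
\end{displaymath}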

Example moments: If Z is standard normal then
\begin{align*}E(Z) & = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}
\\
&= \left. \frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
\\
& = 0
\end{align*}
and (integrating by parts)
\begin{align*}E(Z^r) &= \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}
\\
&= \left. \frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
+ (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}
\end{align*}
so that

\begin{displaymath}\mu_r = (r-1)\mu_{r-2}
\end{displaymath}

for $r \ge 2$. Remembering that $\mu_1=0$ and

\begin{displaymath}\mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
\end{displaymath}

we find that

\begin{displaymath}\mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even}
\end{array}\right.
\end{displaymath}
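
For example, the first few even moments are

\begin{displaymath}\mu_2 = 1, \quad \mu_4 = 3\cdot 1 = 3, \quad \mu_6 = 5 \cdot 3\cdot 1 = 15 \, .
\end{displaymath}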

If now $X\sim N(\mu,\sigma^2)$, that is, $X\sim \sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and

\begin{displaymath}\mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r)
\end{displaymath}

In particular, we see that our choice of notation $N(\mu,\sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; since $\mu_2(X) = \sigma^2 E(Z^2) = \sigma^2$, the parameter $\sigma^2$ is indeed the variance.



Richard Lockhart
1999-09-26