
STAT 801 Lecture 9

Reading for Today's Lecture: ?

Goals of Today's Lecture:

Last time: We used the change of variables formula to compute the density of

\begin{displaymath}Y=(\sqrt{n}\bar{Z}, Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z})^t \, .
\end{displaymath}

It factored into a piece involving $Y_1=\sqrt{n}\bar{Z}$ only and another piece involving $(Y_2,\ldots,Y_n) =
(Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z})$ only. Hence $\sqrt{n}\bar{Z}$ is independent of $(Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z})$. Since
\begin{align*}(n-1)s^2 = & (Z_1-\bar{Z})^2 + \cdots + (Z_{n-1}-\bar{Z})^2
\\
& + \left\{ (Z_1-\bar{Z})+ \cdots + (Z_{n-1}-\bar{Z}) \right\}^2
\end{align*}
we find that $\bar{Z}$ and $s$ are independent. The factor involving $Y_1$ is a standard normal density, proving that $\sqrt{n}\bar{Z} \sim N(0,1)$. It remains to prove the last two parts of the theorem.
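
Before doing so, here is a quick numerical illustration of the first two parts (a sketch of my own, not part of the proof; it assumes numpy is available, and the sample size n=5 and the number of replications are arbitrary choices):

\begin{verbatim}
import numpy as np

rng = np.random.default_rng(0)
n, reps = 5, 100_000                  # arbitrary illustrative choices
Z = rng.standard_normal((reps, n))

zbar = Z.mean(axis=1)                 # sample means
s2 = Z.var(axis=1, ddof=1)            # sample variances (divisor n-1)
root_n_zbar = np.sqrt(n) * zbar       # should behave like N(0,1)

print(root_n_zbar.mean(), root_n_zbar.var())   # near 0 and 1
print(np.corrcoef(zbar, s2)[0, 1])             # near 0
print(((n - 1) * s2).mean())                   # near n-1
\end{verbatim}

Near-zero correlation only illustrates, and of course does not prove, the independence established above.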

Suppose that $Z_1,\ldots,Z_n$ are independent N(0,1). We define the $\chi^2_n$ distribution to be that of $U=Z_1^2 + \cdots + Z_n^2$. Thus our third assertion is that $(n-1)s^2$ can be rewritten as

\begin{displaymath}(n-1)s^2 = W_1^2 + \cdots +W_{n-1}^2
\end{displaymath}

where $W_1,\ldots,W_{n-1}$ are iid N(0,1). In your homework I try to get you to do this for n=3. Here I merely derive the density of $\chi_n^2$. The result for n=1 is in Lecture 3; in class I will do the case n=2. Here is the general case: Define angles $\theta_1,\ldots,\theta_{n-1}$ by
\begin{align*}Z_1 &= U^{1/2} \cos\theta_1
\\
Z_2 &= U^{1/2} \sin\theta_1\cos\theta_2
\\
&\;\;\vdots
\\
Z_{n-1} &= U^{1/2} \sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}
\\
Z_n &= U^{1/2} \sin\theta_1\cdots \sin\theta_{n-1}
\end{align*}
(These are spherical co-ordinates in n dimensions. The $\theta$ values run from 0 to $\pi$ except for the last $\theta$ whose values run from 0 to $2\pi$.) We will use the change of variables formula to get the joint density of $(U,\theta_1,\ldots,\theta_{n-1})$. Note the following derivative formulas

\begin{displaymath}\frac{\partial Z_i}{\partial U} = \frac{1}{2U} Z_i
\end{displaymath}

and

\begin{displaymath}\frac{\partial Z_i}{\partial\theta_j} =
\left\{ \begin{array}{ll}
0 & j > i
\\
-Z_i\tan\theta_i & j=i
\\
Z_i\cot\theta_j & j < i
\end{array}\right.
\end{displaymath}

To clarify the formulas I now work out the case n=3. The matrix of partial derivatives (rows indexed by $Z_1,Z_2,Z_3$, columns by $U,\theta_1,\theta_2$) is

\begin{displaymath}\left[\begin{array}{ccc}
U^{-1/2} \cos\theta_1 /2
&
-U^{1/2} \sin\theta_1
&
0
\\
U^{-1/2} \sin\theta_1\cos\theta_2 /2
&
U^{1/2} \cos\theta_1\cos\theta_2
&
-U^{1/2} \sin\theta_1\sin\theta_2
\\
U^{-1/2} \sin\theta_1\sin\theta_2 /2
&
U^{1/2} \cos\theta_1\sin\theta_2
&
U^{1/2} \sin\theta_1\cos\theta_2
\end{array}\right]
\end{displaymath}

The determinant of this matrix may be found by adding $2U\sin\theta_j/\cos\theta_j$ times column 1 to column j+1 (which doesn't change the determinant). This clears the entries in positions (1,2) and (2,3); expanding along the first row (the cofactor of the remaining (1,3) entry vanishes), the determinant is the product of the diagonal entries, which after a small amount of algebra are $U^{-1/2} \cos\theta_1 /2$, $U^{1/2}\cos\theta_2/ \cos\theta_1$ and $U^{1/2} \sin\theta_1/\cos\theta_2$. We multiply these together to get

\begin{displaymath}U^{1/2}\sin(\theta_1)/2
\end{displaymath}

which is non-negative for all U and $\theta_1$. For general n we see that every term in the first column contains a factor $U^{-1/2}/2$ while every other entry has a factor $U^{1/2}$. Multiplying a column in a matrix by c multiplies the determinant by c, so the Jacobian of the transformation is $u^{(n-1)/2} u^{-1/2}/2 = u^{(n-2)/2}/2$ times some function, say h, which depends only on the angles. Thus the joint density of $U,\theta_1,\ldots, \theta_{n-1}$ is

\begin{displaymath}(2\pi)^{-n/2} \exp(-u/2) u^{(n-2)/2}h(\theta_1, \cdots, \theta_{n-1}) / 2
\end{displaymath}
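
As an aside (not in the original notes), the n=3 Jacobian computed above can be checked symbolically; here is a minimal sketch, assuming sympy is available:

\begin{verbatim}
import sympy as sp

U, t1, t2 = sp.symbols('U theta1 theta2', positive=True)
Z = sp.Matrix([
    sp.sqrt(U) * sp.cos(t1),
    sp.sqrt(U) * sp.sin(t1) * sp.cos(t2),
    sp.sqrt(U) * sp.sin(t1) * sp.sin(t2),
])
J = Z.jacobian([U, t1, t2])      # 3x3 matrix of partial derivatives
print(sp.simplify(J.det()))      # sqrt(U)*sin(theta1)/2
\end{verbatim}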

To compute the density of U we must do an n-1 dimensional multiple integral $d\theta_{n-1}\cdots d\theta_1$. We see that the answer has the form

\begin{displaymath}cu^{(n-2)/2} \exp(-u/2)
\end{displaymath}

for some constant c, which we can evaluate by requiring that

\begin{displaymath}\int f_U(u) du = c \int u^{(n-2)/2} \exp(-u/2) du =1
\end{displaymath}

Substitute y=u/2, du=2dy to see that

\begin{displaymath}c 2^{(n-2)/2}\cdot 2 \int_0^\infty y^{(n-2)/2}e^{-y} dy = c\, 2^{n/2} \Gamma(n/2) = 1
\end{displaymath}

so that the $\chi^2_n$ density is

\begin{displaymath}\frac{1}{2\Gamma(n/2)} \left(\frac{u}{2}\right)^{(n-2)/2} e^{-u/2}
\end{displaymath}
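
As a check, the n=2 case (the one to be done in class) can be carried out directly with polar co-ordinates $Z_1=U^{1/2}\cos\theta_1$, $Z_2=U^{1/2}\sin\theta_1$. The Jacobian is

\begin{displaymath}\det\left[\begin{array}{cc}
U^{-1/2}\cos\theta_1/2 & -U^{1/2}\sin\theta_1
\\
U^{-1/2}\sin\theta_1/2 & U^{1/2}\cos\theta_1
\end{array}\right] = \frac{1}{2}
\end{displaymath}

so that

\begin{displaymath}f_U(u) = \int_0^{2\pi} (2\pi)^{-1} e^{-u/2} \, \frac{1}{2} \, d\theta_1 = \frac{1}{2} e^{-u/2}
\end{displaymath}

which agrees with the general formula at n=2.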

Finally the fourth part of the theorem is a consequence of the first 3 parts of the theorem and the definition of the $t_\nu$ distribution, namely, that $T\sim t_\nu$ if it has the same distribution as

\begin{displaymath}Z/\sqrt{U/\nu}
\end{displaymath}

where $Z\sim N(0,1)$, $U\sim\chi^2_\nu$ and Z and U are independent.

I now derive the density of T directly from this definition:
\begin{align*}P(T \le t) &= P( Z \le t\sqrt{U/\nu})
\\
& =
\int_0^\infty \int_{-\infty}^{t\sqrt{u/\nu}} f_Z(z)f_U(u) dz du
\end{align*}
I can differentiate this with respect to t by simply differentiating the inner integral:

\begin{displaymath}\frac{\partial}{\partial t}\int_{at}^{bt} f(x)dx
=
bf(bt)-af(at)
\end{displaymath}

by the fundamental theorem of calculus. Hence

\begin{displaymath}\frac{d}{dt} P(T \le t) =
\int_0^\infty f_U(u) \sqrt{u/\nu}\frac{\exp[-t^2u/(2\nu)]}{\sqrt{2\pi}} du
\, .
\end{displaymath}

Now I plug in

\begin{displaymath}f_U(u)= \frac{1}{2\Gamma(\nu/2)}(u/2)^{(\nu-2)/2} e^{-u/2}
\end{displaymath}

to get

\begin{displaymath}f_T(t) = \int_0^\infty \frac{1}{2\sqrt{\pi\nu}\Gamma(\nu/2)}
(u/2)^{(\nu-1)/2} \exp[-u(1+t^2/\nu)/2] \, du \, .
\end{displaymath}

Make the substitution $y=u(1+t^2/\nu)/2$, so that $dy=(1+t^2/\nu)\,du/2$ and $(u/2)^{(\nu-1)/2}= [y/(1+t^2/\nu)]^{(\nu-1)/2}$, to get

\begin{displaymath}f_T(t) = \frac{1}{\sqrt{\pi\nu}\Gamma(\nu/2)}(1+t^2/\nu)^{-(\nu+1)/2}
\int_0^\infty y^{(\nu-1)/2} e^{-y} dy
\end{displaymath}

or

\begin{displaymath}f_T(t)= \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\Gamma(\nu/2)}\frac{1}{(1+t^2/\nu)^{(\nu+1)/2}}
\end{displaymath}
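
The formula can be sanity-checked numerically. The following sketch (assuming scipy is installed; $\nu=5$ is an arbitrary illustrative choice) compares it with scipy.stats.t.pdf:

\begin{verbatim}
import numpy as np
from scipy.stats import t as t_dist
from scipy.special import gammaln

def f_T(x, nu):
    # Gamma((nu+1)/2) / (sqrt(pi*nu) Gamma(nu/2)) * (1 + x^2/nu)^(-(nu+1)/2)
    logc = gammaln((nu + 1) / 2) - gammaln(nu / 2) - 0.5 * np.log(np.pi * nu)
    return np.exp(logc - (nu + 1) / 2 * np.log1p(x ** 2 / nu))

nu = 5                              # arbitrary degrees of freedom
x = np.linspace(-4, 4, 9)
print(np.max(np.abs(f_T(x, nu) - t_dist.pdf(x, df=nu))))   # essentially zero
\end{verbatim}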

Expectation, moments

We give two definitions of expected values:

Def'n: If X has density f then

\begin{displaymath}E(g(X)) = \int g(x)f(x)\, dx \,.
\end{displaymath}

Def'n: If X has discrete density f then

\begin{displaymath}E(g(X)) = \sum_x g(x)f(x) \,.
\end{displaymath}

Now if Y=g(X) for a smooth, strictly increasing g then

\begin{displaymath}E(Y) = \int y f_Y(y) \, dy = \int g(x) f_Y(g(x)) g^\prime(x) \, dx
= \int g(x) f_X(x) \, dx = E(g(X))
\end{displaymath}

by the change of variables formula for integration, since $f_X(x) = f_Y(g(x)) g^\prime(x)$ in this case. This is good because otherwise we might have two different values for $E(e^X)$.
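
To make the point concrete, here is a numerical sketch (taking $X\sim N(0,1)$ purely for illustration; the notes do not specify a distribution) that computes $E(e^X)$ both ways, once from the density of X and once from the density of $Y=e^X$; both should give $e^{1/2}\approx 1.6487$:

\begin{verbatim}
import numpy as np
from scipy.integrate import quad

phi = lambda x: np.exp(-x ** 2 / 2) / np.sqrt(2 * np.pi)   # N(0,1) density

# Way 1: E(g(X)) = integral of exp(x) phi(x) dx
way1, _ = quad(lambda x: np.exp(x) * phi(x), -np.inf, np.inf)

# Way 2: E(Y) = integral of y f_Y(y) dy; since f_Y(y) = phi(log y)/y,
# the integrand y f_Y(y) simplifies to phi(log y)
way2, _ = quad(lambda y: phi(np.log(y)), 0, np.inf)

print(way1, way2, np.exp(0.5))
\end{verbatim}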

In general, there are random variables which are neither absolutely continuous nor discrete. Look at my STAT 801 web pages to see how $\text{E}$ is defined in general.

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X \ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $X= \lim X_n$ (which has to exist) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ and there is a random variable X such that $X_n \to X$ (technical details of this convergence later in the course) and a random variable Y such that $Y_n \to Y$ with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

This is often used with all $Y_n$ the same random variable Y.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\liminf X_n) \le \liminf E(X_n)
\end{displaymath}
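
A standard example (my illustration, not from the notes) shows why the hypotheses matter: let U be Uniform(0,1) and $X_n = n 1(U \le 1/n)$. Then $X_n \to 0$ but

\begin{displaymath}E(X_n) = n P(U \le 1/n) = 1 \quad \mbox{for all $n$,}
\end{displaymath}

so $E(X_n) \not\to 0$: there is no integrable dominating variable, and Fatou's inequality is strict here, $0 = E(\liminf X_n) \le \liminf E(X_n) = 1$.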

Theorem: With this definition of E, if X has density f(x) (even in $R^p$, say) and Y=g(X), then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(This could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

This works for instance even if X has a density but Y doesn't.
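
For example (my illustration): if X is Uniform(0,1) and $Y = 1(X \le 1/2)$ then Y is discrete and has no density, yet the theorem gives

\begin{displaymath}E(Y) = \int_0^1 1(x \le 1/2) \, dx = \frac{1}{2} \, .
\end{displaymath}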

Def'n: The $r^{\rm th}$ moment (about the origin) of a real random variable X is $\mu_r^\prime=E(X^r)$ (provided it exists). We generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

We call $\sigma^2 = \mu_2$ the variance.

Def'n: For an $R^p$ valued random vector X we define $\mu_X = E(X) $ to be the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ( $p \times p$) variance covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
\end{displaymath}

which exists provided each component $X_i$ has a finite second moment.

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
&\le E\left[\frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
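
In particular, taking r=2 gives Chebyshev's inequality:

\begin{displaymath}P(\vert X-\mu\vert \ge t) \le \frac{E[(X-\mu)^2]}{t^2} = \frac{\sigma^2}{t^2} \, .
\end{displaymath}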

Example moments: If Z is standard normal then
\begin{align*}E(Z) & = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}
\\
&= \left. \frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
\\
& = 0
\end{align*}
and (integrating by parts)
\begin{align*}E(Z^r) &= \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}
\\
&= \left. \frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
+ (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}
\end{align*}
so that

\begin{displaymath}\mu_r = (r-1)\mu_{r-2}
\end{displaymath}

for $r \ge 2$. Remembering that $\mu_1=0$ and

\begin{displaymath}\mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
\end{displaymath}

we find that

\begin{displaymath}\mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even}
\end{array}\right.
\end{displaymath}
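
For example, the first few even moments are

\begin{displaymath}\mu_2 = 1, \quad \mu_4 = 3\cdot 1 = 3, \quad \mu_6 = 5 \cdot 3\cdot 1 = 15 \, .
\end{displaymath}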

If now $X\sim N(\mu,\sigma^2)$, that is, $X\sim \sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and

\begin{displaymath}\mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r)
\end{displaymath}

In particular, we see that our choice of notation $N(\mu,\sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; since $\mu_2(X) = \sigma^2 E(Z^2) = \sigma^2$, the parameter $\sigma^2$ is indeed the variance.



Richard Lockhart
1999-09-26