STAT 801 Lecture 4

Reading for Today's Lecture:

Goals of Today's Lecture:

Review of end of last time

Theorem 1   Suppose $X_1,\ldots,X_n$ are independent $N(\mu,\sigma^2)$ random variables. (That is, each satisfies my definition above in one dimension.) Then

1.
The sample mean $\bar X$ and the sample variance $s^2$ are independent.

2.
$n^{1/2}(\bar{X} - \mu)/\sigma \sim N(0,1)$

3.
$(n-1)s^2/\sigma^2 \sim \chi^2_{n-1}$

4.
$n^{1/2}(\bar{X} - \mu)/s \sim t_{n-1}$
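
Before the proof, the four statements can be checked quickly by simulation. Here is a minimal Python sketch (assuming numpy and scipy are available; the sample size, seed and parameter values are arbitrary) comparing the simulated quantities with the claimed distributions.

\begin{verbatim}
# Monte Carlo sanity check of Theorem 1 (illustrative only).
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
mu, sigma, n, reps = 3.0, 2.0, 10, 20000
X = rng.normal(mu, sigma, size=(reps, n))
xbar = X.mean(axis=1)
s2 = X.var(axis=1, ddof=1)          # sample variance with divisor n-1

# Part 1: independence implies the sample correlation of xbar and s^2
# should be near 0 (a necessary check only, not a proof).
print("corr(xbar, s^2):", np.corrcoef(xbar, s2)[0, 1])

# Part 2: sqrt(n)(xbar - mu)/sigma should look N(0,1).
z = np.sqrt(n) * (xbar - mu) / sigma
print("KS p-value vs N(0,1):", stats.kstest(z, "norm").pvalue)

# Part 3: (n-1) s^2 / sigma^2 should look chi-squared with n-1 df.
u = (n - 1) * s2 / sigma**2
print("KS p-value vs chi2(n-1):", stats.kstest(u, "chi2", args=(n - 1,)).pvalue)

# Part 4: sqrt(n)(xbar - mu)/s should look t with n-1 df.
t = np.sqrt(n) * (xbar - mu) / np.sqrt(s2)
print("KS p-value vs t(n-1):", stats.kstest(t, "t", args=(n - 1,)).pvalue)
\end{verbatim}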

Proof: The problem reduces to the case $\mu=0$ and $\sigma=1$ by working with $Z_i=(X_i-\mu)/\sigma$.

Step 1: Define

\begin{displaymath}Y=(\sqrt{n}\bar{Z}, Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z}) =MZ
\end{displaymath}

for suitable matrix M. Then
\begin{align*}f_Y(y) &= \frac{(2\pi)^{-n/2} \exp[-y^t\Sigma^{-1}y/2]}{\vert\det M\vert}
\\
&= \frac{(2\pi)^{-1/2}\exp(-y_1^2/2)\,(2\pi)^{-(n-1)/2}\exp[-{\bf y}_2^t Q^{-1} {\bf y}_2/2]}{\vert\det M\vert}
\end{align*}
where $\Sigma = MM^t$ is block diagonal with a 1 in the top left corner and an $(n-1)\times(n-1)$ block $Q$ (because $\sqrt{n}\bar{Z}$ is uncorrelated with each $Z_i-\bar{Z}$), and ${\bf y}_2 = (y_2,\ldots,y_n)^t$.

Notice that this is a factorization into a function of $y_1$ times a function of $y_2, \ldots,y_n$. Thus $\sqrt{n}\bar{Z}$ is independent of $Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z}$. Since $s_Z^2$ is a function of $Z_1-\bar{Z},\ldots,Z_{n-1}-\bar{Z}$ we see that $\sqrt{n}\bar{Z}$ and $s_Z^2$ are independent (remember that $Z_n-\bar{Z} = -\sum_1^{n-1} (Z_i-\bar{Z})$).

Furthermore the density of $Y_1$ is a multiple of the function of $y_1$ in the factorization above. But the factor in question is the standard normal density, so $\sqrt{n}\bar{Z}\sim N(0,1)$.

We have now proved the first two parts of the theorem. The third part is a homework exercise, but I will outline the derivation of the $\chi^2$ density.

Suppose that $Z_1,\ldots,Z_n$ are independent N(0,1). We define the $\chi^2_n$ distribution to be that of $U=Z_1^2 + \cdots + Z_n^2$. Define angles $\theta_1,\ldots,\theta_{n-1}$ by
\begin{align*}Z_1 &= U^{1/2} \cos\theta_1
\\
Z_2 &= U^{1/2} \sin\theta_1\cos\theta_2
\\
&\;\;\vdots
\\
Z_{n-1} &= U^{1/2} \sin\theta_1\cdots\sin\theta_{n-2}\cos\theta_{n-1}
\\
Z_n &= U^{1/2} \sin\theta_1\cdots \sin\theta_{n-1}
\end{align*}
(These are spherical co-ordinates in n dimensions. The $\theta$ values run from 0 to $\pi$ except for the last $\theta$ whose values run from 0 to $2\pi$.) Then note the following derivative formulas

\begin{displaymath}\frac{\partial Z_i}{\partial U} = \frac{1}{2U} Z_i
\end{displaymath}

and

\begin{displaymath}\frac{\partial Z_i}{\partial\theta_j} =
\left\{ \begin{array}{ll}
0 & j > i
\\
-Z_i\tan\theta_i & j=i
\\
Z_i\cot\theta_j & j < i
\end{array}\right.
\end{displaymath}

I now fix the case n=3 to clarify the formulas. The matrix of partial derivatives is

\begin{displaymath}\left[\begin{array}{ccc}
U^{-1/2} \cos\theta_1 /2
&
-U^{1/2} \sin\theta_1
&
0
\\
U^{-1/2} \sin\theta_1\cos\theta_2 /2
&
U^{1/2} \cos\theta_1\cos\theta_2
&
-U^{1/2} \sin\theta_1\sin\theta_2
\\
U^{-1/2} \sin\theta_1\sin\theta_2 /2
&
U^{1/2} \cos\theta_1\sin\theta_2
&
U^{1/2} \sin\theta_1\cos\theta_2
\end{array}\right]
\end{displaymath}

The determinant of this matrix may be found by adding $2U\sin\theta_j/\cos\theta_j$ times column 1 to column $j+1$ for $j=1,2$ (which doesn't change the determinant); this clears the entry $-Z_j\tan\theta_j$ in row $j$ of column $j+1$. The determinant is then the product of the diagonal entries of the resulting matrix (the single entry left above the diagonal has a vanishing cofactor), and after a small amount of algebra these diagonal entries are $U^{-1/2} \cos\theta_1 /2$, $U^{1/2}\cos\theta_2/ \cos\theta_1$ and $U^{1/2} \sin\theta_1/\cos\theta_2$. We multiply these together to get

\begin{displaymath}U^{1/2}\sin(\theta_1)/2
\end{displaymath}

which is non-negative for all $U$ and $\theta_1$. For general $n$ we see that every entry in the first column contains a factor $U^{-1/2}/2$ while every other entry has a factor $U^{1/2}$. Multiplying a column in a matrix by $c$ multiplies the determinant by $c$, so the Jacobian of the transformation is $u^{(n-2)/2}/2$ times some function, say $h$, which depends only on the angles. Thus the joint density of $U,\theta_1,\ldots ,\theta_{n-1}$ is

\begin{displaymath}(2\pi)^{-n/2} \exp(-u/2) u^{(n-2)/2}h(\theta_1, \cdots, \theta_{n-1}) / 2
\end{displaymath}
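
Before integrating out the angles, the $n=3$ computation above is easy to verify numerically. The following small Python sketch (numpy assumed; the test point is arbitrary) evaluates the matrix of partial derivatives at one point and compares its determinant with $U^{1/2}\sin\theta_1/2$.

\begin{verbatim}
# Numerical check of the n = 3 Jacobian determinant (illustrative only).
import numpy as np

u, th1, th2 = 2.7, 0.8, 2.1          # arbitrary point, theta_1 in (0, pi)
r = np.sqrt(u)
J = np.array([
    [np.cos(th1) / (2 * r), -r * np.sin(th1), 0.0],
    [np.sin(th1) * np.cos(th2) / (2 * r),
     r * np.cos(th1) * np.cos(th2), -r * np.sin(th1) * np.sin(th2)],
    [np.sin(th1) * np.sin(th2) / (2 * r),
     r * np.cos(th1) * np.sin(th2), r * np.sin(th1) * np.cos(th2)],
])
print(np.linalg.det(J), r * np.sin(th1) / 2)   # the two values agree

# The coordinates do satisfy z_1^2 + z_2^2 + z_3^2 = u.
z = np.array([r * np.cos(th1),
              r * np.sin(th1) * np.cos(th2),
              r * np.sin(th1) * np.sin(th2)])
print(z @ z, u)
\end{verbatim}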

To compute the density of U we must do an n-1 dimensional multiple integral $d\theta_{n-1}\cdots d\theta_1$. We see that the answer has the form

\begin{displaymath}cu^{(n-2)/2} \exp(-u/2)
\end{displaymath}

for some $c$ which we can evaluate by requiring that

\begin{displaymath}\int f_U(u) du = c \int u^{(n-2)/2} \exp(-u/2) du =1
\end{displaymath}

Substitute y=u/2, du=2dy to see that

\begin{displaymath}c\, 2^{(n-2)/2}\cdot 2 \int y^{(n-2)/2}e^{-y} dy = c\, 2^{n/2} \Gamma(n/2) = 1
\end{displaymath}

so that the $\chi^2$ density is

\begin{displaymath}\frac{1}{2\Gamma(n/2)} \left(\frac{u}{2}\right)^{(n-2)/2} e^{-u/2}
\end{displaymath}
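
As a quick check of this formula, a Monte Carlo sample of $U=Z_1^2+\cdots+Z_n^2$ can be compared with the density just derived; a minimal Python sketch (numpy and scipy assumed, bin choices arbitrary):

\begin{verbatim}
# Compare a histogram of U = Z_1^2 + ... + Z_n^2 with the chi^2_n density.
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(1)
n, reps = 5, 200000
U = (rng.standard_normal((reps, n)) ** 2).sum(axis=1)

def chi2_density(u, n):
    # the density derived above: (u/2)^{(n-2)/2} e^{-u/2} / (2 Gamma(n/2))
    return (u / 2) ** ((n - 2) / 2) * np.exp(-u / 2) / (2 * gamma(n / 2))

hist, edges = np.histogram(U, bins=60, range=(0, 20), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
print("max abs difference:", np.abs(hist - chi2_density(mids, n)).max())
\end{verbatim}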

Finally the fourth part of the theorem is a consequence of the first 3 parts of the theorem and the definition of the $t_\nu$ distribution, namely, that $T\sim t_\nu$ if it has the same distribution as

\begin{displaymath}Z/\sqrt{U/\nu}
\end{displaymath}

where $Z\sim N(0,1)$, $U\sim\chi^2_\nu$ and Z and U are independent.

However, I now derive the density of $T$ as given by this definition:
\begin{align*}P(T \le t) &= P( Z \le t\sqrt{U/\nu})
\\
& =
\int_0^\infty \int_{-\infty}^{t\sqrt{u/\nu}} f_Z(z)f_U(u) dz du
\end{align*}
I can differentiate this with respect to t by simply differentiating the inner integral:

\begin{displaymath}\frac{\partial}{\partial t}\int_{at}^{bt} f(x)dx
=
bf(bt)-af(at)
\end{displaymath}

by the fundamental theorem of calculus. Hence

\begin{displaymath}\frac{d}{dt} P(T \le t) =
\int_0^\infty f_U(u) \sqrt{u/\nu}\frac{\exp[-t^2u/(2\nu)]}{\sqrt{2\pi}} du
\, .
\end{displaymath}

Now I plug in

\begin{displaymath}f_U(u)= \frac{1}{2\Gamma(\nu/2)}(u/2)^{(\nu-2)/2} e^{-u/2}
\end{displaymath}

to get

\begin{displaymath}f_T(t) = \int_0^\infty \frac{1}{2\sqrt{\pi\nu}\Gamma(\nu/2)}
(u/2)^{(\nu-1)/2} \exp[-u(1+t^2/\nu)/2] \, du \, .
\end{displaymath}

Make the substitution $y=u(1+t^2/\nu)/2$, so that $dy=(1+t^2/\nu)\,du/2$ and $(u/2)^{(\nu-1)/2}= [y/(1+t^2/\nu)]^{(\nu-1)/2}$, to get

\begin{displaymath}f_T(t) = \frac{1}{\sqrt{\pi\nu}\Gamma(\nu/2)}(1+t^2/\nu)^{-(\nu+1)/2}
\int_0^\infty y^{(\nu-1)/2} e^{-y} dy
\end{displaymath}

or

\begin{displaymath}f_T(t)= \frac{\Gamma((\nu+1)/2)}{\sqrt{\pi\nu}\Gamma(\nu/2)(1+t^2/\nu)^{(\nu+1)/2}} \, .
\end{displaymath}
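
Again the formula is easy to check by simulating $Z/\sqrt{U/\nu}$ directly; a minimal Python sketch (numpy and scipy assumed):

\begin{verbatim}
# Compare a histogram of Z / sqrt(U/nu) with the t_nu density just derived.
import numpy as np
from scipy.special import gamma

rng = np.random.default_rng(2)
nu, reps = 4, 200000
Z = rng.standard_normal(reps)
U = (rng.standard_normal((reps, nu)) ** 2).sum(axis=1)   # chi^2 with nu df
T = Z / np.sqrt(U / nu)

def t_density(t, nu):
    # the formula just derived
    return gamma((nu + 1) / 2) / (np.sqrt(np.pi * nu) * gamma(nu / 2)
                                  * (1 + t ** 2 / nu) ** ((nu + 1) / 2))

hist, edges = np.histogram(T, bins=80, range=(-6, 6), density=True)
mids = 0.5 * (edges[:-1] + edges[1:])
print("max abs difference:", np.abs(hist - t_density(mids, nu)).max())
\end{verbatim}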

Expectation, moments

In elementary courses we give two definitions of expected values:

Def'n: If X has density f then

\begin{displaymath}E(g(X)) = \int g(x)f(x)\, dx \,.
\end{displaymath}

Def'n: If X has discrete density f then

\begin{displaymath}E(g(X)) = \sum_x g(x)f(x) \,.
\end{displaymath}

Now if Y=g(X) for smooth, strictly increasing g then

\begin{displaymath}E(Y) = \int y f_Y(y)\, dy = \int g(x) f_Y(g(x)) g^\prime(x) \, dx
= \int g(x) f_X(x) \, dx = E(g(X))
\end{displaymath}

by the change of variables formula for integration, since $f_X(x) = f_Y(g(x))g^\prime(x)$ for such g. This is good because otherwise we might have two different values for $E(e^X)$.
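
For example, with $X$ standard normal and $Y=e^X$ (lognormal), both recipes give $E(e^X)=e^{1/2}$; a small numerical sketch (numpy and scipy assumed):

\begin{verbatim}
# E(e^X) computed two ways: integrate e^x f_X(x) dx, or y f_Y(y) dy.
import numpy as np
from scipy import integrate, stats

# finite limits are used because the tails beyond them are negligible
via_x = integrate.quad(lambda x: np.exp(x) * stats.norm.pdf(x), -12, 12)[0]
via_y = integrate.quad(lambda y: y * stats.lognorm.pdf(y, s=1.0), 0, 1000)[0]
print(via_x, via_y, np.exp(0.5))     # all three agree
\end{verbatim}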

In general, there are random variables which are neither absolutely continuous nor discrete. Here's how probabilists define E in general.

Def'n: A random variable X is simple if we can write

\begin{displaymath}X(\omega)= \sum_1^n a_i 1(\omega\in A_i)
\end{displaymath}

for some constants $a_1,\ldots,a_n$ and events $A_i$.

Def'n: For a simple rv X we define

\begin{displaymath}E(X) = \sum a_i P(A_i)
\end{displaymath}

For positive random variables which are not simple we extend our definition by approximation:

Def'n: If $X \ge 0$ then

\begin{displaymath}E(X) = \sup\{E(Y): 0 \le Y \le X, Y \mbox{ simple}\}
\end{displaymath}

Def'n: We call X integrable if

\begin{displaymath}E(\vert X\vert) < \infty \, .
\end{displaymath}

In this case we define

\begin{displaymath}E(X) = E(\max(X,0)) -E(\max(-X,0))
\end{displaymath}
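
The approximation idea behind these definitions can be seen numerically: for $X\ge 0$, rounding $X$ down to the nearest multiple of $2^{-n}$ (and capping at $n$) gives simple variables below $X$ whose expectations increase to $E(X)$. A minimal Python sketch (numpy assumed; the exponential example and the Monte Carlo averaging are illustrative only):

\begin{verbatim}
# Simple-function approximations X_n = min(floor(2^n X)/2^n, n) of X >= 0.
import numpy as np

rng = np.random.default_rng(3)
x = rng.exponential(1.0, size=200000)   # stand-in sample for X, E(X) = 1

for n in [1, 2, 4, 8]:
    xn = np.minimum(np.floor(2 ** n * x) / 2 ** n, n)   # a simple rv <= X
    print(n, xn.mean())                 # increases toward E(X) = 1
\end{verbatim}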

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X \ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $X= \lim X_n$ (which has to exist) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ and there is a random variable X such that $X_n \to X$ (technical details of this convergence later in the course) and a random variable Y such that $Y_n \to Y$ with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

This is often used with all Yn the same random variable Y.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\lim\sup X_n) \le \lim\sup E(X_n)
\end{displaymath}
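
As a small numerical illustration of monotone convergence (Python, numpy assumed; the exponential example is arbitrary): with $X$ exponential and $X_n=\min(X,n)$ we have $0\le X_1\le X_2\le\cdots\to X$, and the expectations rise to $E(X)$.

\begin{verbatim}
# Monotone convergence illustrated with X_n = min(X, n), X exponential(1).
import numpy as np

rng = np.random.default_rng(4)
x = rng.exponential(1.0, size=500000)
for n in [1, 2, 4, 8, 16]:
    print(n, np.minimum(x, n).mean())   # approaches E(X) = 1 from below
\end{verbatim}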

Theorem: With this definition of E, if X has density f(x) (even in $R^p$, say) and Y=g(X) then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(This could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

This works for instance even if X has a density but Y doesn't.

Def'n: The $r^{\rm th}$ moment (about the origin) of a real random variable X is $\mu_r^\prime=E(X^r)$ (provided it exists). We generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

We call $\sigma^2 = \mu_2$ the variance.

Def'n: For an $R^p$ valued random vector X we define $\mu_X = E(X)$ to be the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ($p \times p$) variance-covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
\end{displaymath}

which exists provided each component $X_i$ has a finite second moment.

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
&\le E\left[\frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
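
A quick numerical check of the inequality (Python, numpy assumed; the exponential example and the choices of $t$ and $r$ are arbitrary):

\begin{verbatim}
# Compare P(|X - mu| >= t) with the bound E|X - mu|^r / t^r.
import numpy as np

rng = np.random.default_rng(5)
x = rng.exponential(1.0, size=1_000_000)   # mu = E(X) = 1
mu, t = 1.0, 3.0
lhs = np.mean(np.abs(x - mu) >= t)
for r in [1, 2, 4]:
    bound = np.mean(np.abs(x - mu) ** r) / t ** r
    print(r, lhs, bound)                   # lhs never exceeds the bound
\end{verbatim}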

Example moments: If Z is standard normal then
\begin{align*}E(Z) & = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}
\\
&= \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
\\
& = 0
\end{align*}
and (integrating by parts)
\begin{align*}E(Z^r) &= \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}
\\
&= \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
+ (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}
\end{align*}
so that

\begin{displaymath}\mu_r = (r-1)\mu_{r-2}
\end{displaymath}

for $r \ge 2$. Remembering that $\mu_1=0$ and

\begin{displaymath}\mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
\end{displaymath}

we find that

\begin{displaymath}\mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even}
\end{array}\right.
\end{displaymath}
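
These moments are easy to confirm numerically; a small sketch (numpy and scipy assumed) compares $\int z^r\phi(z)\,dz$ with $(r-1)(r-3)\cdots 1$ for even $r$ and $0$ for odd $r$:

\begin{verbatim}
# Standard normal moments versus the formula above.
import numpy as np
from scipy import integrate, stats

def mu_r(r):
    # numerical integral of z^r phi(z); the tails beyond |z| = 12 are negligible
    return integrate.quad(lambda z: z ** r * stats.norm.pdf(z), -12, 12)[0]

for r in range(0, 9):
    formula = float(np.prod(np.arange(r - 1, 0, -2))) if r % 2 == 0 else 0.0
    print(r, round(mu_r(r), 6), formula)
\end{verbatim}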

If now $X\sim N(\mu,\sigma^2)$, that is, $X\sim \sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and

\begin{displaymath}\mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r)
\end{displaymath}

In particular, we see that our choice of notation $N(\mu,\sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; $\sigma^2$ is indeed the variance.

Moments and independence

Theorem: If $X_1,\ldots,X_p$ are independent and each $X_i$ is integrable then $X=X_1\cdots X_p$ is integrable and

\begin{displaymath}E(X_1\cdots X_p) = E(X_1) \cdots E(X_p)
\end{displaymath}

Proof: Suppose each Xi is simple: $X_i = \sum_j x_{ij} 1(X_i
=x_{ij})$ where the xij are the possible values of Xi. Then
\begin{align*}E(X_1\cdots X_p) & = \sum_{j_1\ldots j_p} x_{1j_1}\cdots x_{pj_p}
P(X_1 = x_{1j_1},\ldots,X_p = x_{pj_p})
\\
& = \sum_{j_1\ldots j_p} x_{1j_1}\cdots x_{pj_p} P(X_1 = x_{1j_1})\cdots P(X_p = x_{pj_p})
\\
&= \left[\sum_{j_1} x_{1j_1} P(X_1 = x_{1j_1})\right]\cdots
\left[\sum_{j_p} x_{pj_p} P(X_p = x_{pj_p})\right]
\\
&= \prod E(X_i)
\end{align*}
For general $X_i \ge 0$ we create a sequence of simple approximations by rounding $X_i$ down to the nearest multiple of $2^{-n}$ (to a maximum of $n$), and then apply the case just done together with the monotone convergence theorem. The general case uses the fact that we can write each $X_i$ as the difference of its positive and negative parts:

\begin{displaymath}X_i = \max(X_i,0) -\max(-X_i,0)
\end{displaymath}
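
A Monte Carlo check of the conclusion (Python, numpy assumed; the three distributions are arbitrary):

\begin{verbatim}
# For independent X_1, X_2, X_3: E(X_1 X_2 X_3) = E(X_1) E(X_2) E(X_3).
import numpy as np

rng = np.random.default_rng(6)
reps = 1_000_000
x1 = rng.exponential(2.0, reps)     # E(X_1) = 2
x2 = rng.uniform(0.0, 1.0, reps)    # E(X_2) = 1/2
x3 = rng.normal(3.0, 1.0, reps)     # E(X_3) = 3
print((x1 * x2 * x3).mean(), x1.mean() * x2.mean() * x3.mean(), 2 * 0.5 * 3)
\end{verbatim}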

Moment Generating Functions

Def'n: The moment generating function of a real valued X is

\begin{displaymath}M_X(t) = E(e^{tX})
\end{displaymath}

defined for those real t for which the expected value is finite.

Def'n: The moment generating function of $X\in R^p$ is

\begin{displaymath}M_X(u) = E[\exp(u^tX)]
\end{displaymath}

defined for those vectors u for which the expected value is finite.

The mgf has the following formal connection to moments:
\begin{align*}M_X(t) & = \sum_{k=0}^\infty E[(tX)^k]/k!
\\
& = \sum_{k=0}^\infty \mu_k^\prime t^k/k!
\end{align*}
It is thus sometimes possible to find the power series expansion of $M_X$ and read off the moments of X from the coefficients of the powers $t^k/k!$.

Theorem: If M is finite for all $t \in [-\epsilon,\epsilon]$ for some $\epsilon > 0$ then

1.
Every moment of X is finite.

2.
M is $C^\infty$ (in fact M is analytic).

3.
$\mu_k^\prime = \frac{d^k}{dt^k} M_X(0)$.

The proof, and many other facts about mgfs, rely on techniques of complex variables.
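
As an illustration of the moment/mgf connection (Python, numpy assumed; the standard normal example is arbitrary): the mgf of $Z\sim N(0,1)$ is $e^{t^2/2}$, and the truncated series $\sum_k \mu_k^\prime t^k/k!$ built from the moments found earlier reproduces it for moderate $t$.

\begin{verbatim}
# The series sum_k E(Z^k) t^k / k! versus the N(0,1) mgf exp(t^2/2).
import numpy as np
from math import factorial

def normal_moment(k):
    # E(Z^k): zero for odd k, (k-1)(k-3)...1 for even k
    return 0.0 if k % 2 else float(np.prod(np.arange(k - 1, 0, -2)))

for t in [0.1, 0.5, 1.0]:
    series = sum(normal_moment(k) * t ** k / factorial(k) for k in range(13))
    print(t, series, np.exp(t ** 2 / 2))   # the series matches the mgf
\end{verbatim}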





Richard Lockhart
1998-09-21