
STAT 450 Lecture 10

Reading for Today's Lecture: ?

Goals of Today's Lecture:

Expectation, moments

We give two definitions of expected values:

Def'n If X has density f then

\begin{displaymath}E(g(X)) = \int g(x)f(x)\, dx \,.
\end{displaymath}

Def'n: If X has discrete density f then

\begin{displaymath}E(g(X)) = \sum_x g(x)f(x) \,.
\end{displaymath}
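For example, if X has the Exponential(1) density $f(x)=e^{-x}$ on $(0,\infty)$ then integrating by parts twice gives

\begin{displaymath}E(X^2) = \int_0^\infty x^2 e^{-x}\, dx = 2 \,,
\end{displaymath}

while if X is Bernoulli(p), so that f(1)=p and f(0)=1-p, the second definition gives $E(X) = 0\cdot(1-p) + 1\cdot p = p$.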

Now if Y=g(X) for a smooth, strictly increasing g and X has density $f_X$ then
\begin{align*}E(Y) & = \int y f_Y(y) \, dy
\\
& = \int g(x) f_Y(g(x)) g^\prime(x) \, dx
\\
& = \int g(x) f_X(x) \, dx
\\
& = E(g(X))
\end{align*}
by the change of variables formula for integration (substitute $y=g(x)$ and recall that $f_Y(g(x)) g^\prime(x) = f_X(x)$). This is good because otherwise we might have two different values for $E(e^X)$.
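As a concrete check, take X uniform on (0,1) and $Y=e^X$. Directly from the definition, $E(e^X) = \int_0^1 e^x \, dx = e-1$. On the other hand Y has density $f_Y(y)=1/y$ on $(1,e)$, so

\begin{displaymath}E(Y) = \int_1^e y \cdot \frac{1}{y} \, dy = e - 1 \,,
\end{displaymath}

the same value, as the change of variables argument promises.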

In general, there are random variables which are neither absolutely continuous nor discrete. Look at my STAT 801 web pages to see how $\text{E}$ is defined in general.

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X \ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.
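Notice that the third fact follows from the first two: if $P(X \ge Y)=1$ then $X-Y \ge 0$ with probability 1, so positivity and linearity give

\begin{displaymath}E(X) - E(Y) = E(X-Y) \ge 0 \,.
\end{displaymath}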

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $X= \lim X_n$ (the limit exists automatically because the sequence is non-decreasing, though it may be infinite) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ and there is a random variable X such that $X_n \to X$ (technical details of this convergence later in the course) and a random variable Y such that $Y_n \to Y$ with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

This is often used with all $Y_n$ equal to the same random variable Y.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\liminf X_n) \le \liminf E(X_n)
\end{displaymath}
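To see why some hypothesis such as domination is needed, let U be uniform on (0,1) and put $X_n = n 1(U \le 1/n)$. Then $X_n \to 0$ with probability 1 but

\begin{displaymath}E(X_n) = n P(U \le 1/n) = 1 \not\to 0 = E(\lim X_n) \,.
\end{displaymath}

No integrable Y dominates all the $X_n$ at once (check that $E(\sup_n X_n) = \infty$), and Fatou's Lemma gives only $0 = E(\liminf X_n) \le \liminf E(X_n) = 1$.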

Theorem: With this definition of E, if X has density f(x) (even on $\mathbb{R}^p$, say) and Y=g(X) then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(This could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

This works for instance even if X has a density but Y doesn't.
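For example, if X is standard normal and $Y = g(X) = 1(X > 0)$ then Y is discrete, taking only the values 0 and 1, even though X has a density; the theorem still gives

\begin{displaymath}E(Y) = \int_{-\infty}^\infty 1(x > 0) f(x) \, dx = P(X > 0) = \frac{1}{2} \,.
\end{displaymath}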

Def'n: The $r^{\rm th}$ moment (about the origin) of a real random variable X is $\mu_r^\prime=E(X^r)$ (provided it exists). We generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

We call $\sigma^2 = \mu_2$ the variance.
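Expanding the square and using linearity gives the usual computational formula for the variance:

\begin{displaymath}\sigma^2 = E[(X-\mu)^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 \,.
\end{displaymath}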

Def'n: For an $\mathbb{R}^p$-valued random vector X we define $\mu_X = E(X) $ to be the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ($p \times p$) variance-covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
\end{displaymath}

which exists provided each component $X_i$ has a finite second moment.
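Entry $(i,j)$ of this matrix is the covariance of $X_i$ and $X_j$, so the diagonal entries are the individual variances:

\begin{displaymath}\left[\text{Var}(X)\right]_{ij} = E\left[(X_i - \mu_i)(X_j-\mu_j)\right] = \text{Cov}(X_i,X_j) \,.
\end{displaymath}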

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
& \le E\left[ \frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
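Taking r=2 and $t = k\sigma$ gives Chebyshev's inequality:

\begin{displaymath}P(\vert X-\mu\vert \ge k\sigma) \le \frac{E[(X-\mu)^2]}{k^2\sigma^2} = \frac{1}{k^2} \,,
\end{displaymath}

so, for instance, any random variable with a finite variance is at least 3 standard deviations from its mean with probability at most 1/9.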

Example moments: If Z is standard normal then
\begin{align*}E(Z) & = \int_{-\infty}^\infty z e^{-z^2/2}\, dz /\sqrt{2\pi}
\\
&= \left. \frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
\\
& = 0
\end{align*}
and (integrating by parts)
\begin{align*}E(Z^r) &= \int_{-\infty}^\infty z^r e^{-z^2/2}\, dz /\sqrt{2\pi}
\\
&= \left. \frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
+ (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2}\, dz /\sqrt{2\pi}
\end{align*}
so that

\begin{displaymath}\mu_r = (r-1)\mu_{r-2}
\end{displaymath}

for $r \ge 2$. Remembering that $\mu_1=0$ and

\begin{displaymath}\mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
\end{displaymath}

we find that

\begin{displaymath}\mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even}
\end{array}\right.
\end{displaymath}
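For example, starting from $\mu_0=1$ the recursion gives

\begin{displaymath}\mu_2 = 1, \quad \mu_4 = 3 \cdot 1 = 3, \quad \mu_6 = 5 \cdot 3 \cdot 1 = 15 \,.
\end{displaymath}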

If now $X\sim N(\mu,\sigma^2)$, that is, X has the same distribution as $\sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and

\begin{displaymath}\mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r)
\end{displaymath}

In particular, $\text{Var}(X) = \mu_2(X) = \sigma^2 E(Z^2) = \sigma^2$, so our choice of notation $N(\mu,\sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; $\sigma^2$ is indeed the variance.

If $X \sim MVN(\mu, \Sigma)$ then we can write $X=AZ+\mu$ with Z standard multivariate normal and $AA^T = \Sigma$. Hence
\begin{align*}\text{E}(X) & = A \text{E}(Z) + \mu
\\
& = A \cdot 0 + \mu
\\
& = \mu
\end{align*}
Moreover
\begin{align*}\text{Var}(X) & = \text{E}\left[(X-\mu)(X - \mu)^T \right]
\\
& = \text{E}\left[ (AZ)(AZ)^T\right]
\\
& = A \text{E}\left[ZZ^T\right]A^T
\end{align*}
To compute $\text{E}\left[ZZ^T\right]$ look at entry $(i,j)$ of $ZZ^T$, which is $Z_iZ_j$. Then
\begin{align*}\text{E}(Z_iZ_j) & = \begin{cases}
\text{E}(Z_i^2) & i=j
\\
\text{E}(Z_i)\text{E}(Z_j) & i \neq j
\end{cases}
\\
& = \begin{cases}
1 & i=j
\\
0 & i \neq j
\end{cases}\end{align*}

So $\text{E}(ZZ^T) = I_{n \times n}$ (the factorization of $\text{E}(Z_iZ_j)$ for $i \neq j$ uses the independence of the coordinates of Z; see the theorem below) and

\begin{displaymath}\text{Var}(X) = AA^T = \Sigma \, .
\end{displaymath}
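For example, in the bivariate case (p=2, with illustrative constants $\sigma_1, \sigma_2 > 0$ and $\vert\rho\vert \le 1$ which are not fixed anywhere above) the choice

\begin{displaymath}A = \left( \begin{array}{cc}
\sigma_1 & 0
\\
\rho\sigma_2 & \sqrt{1-\rho^2}\,\sigma_2
\end{array} \right)
\end{displaymath}

gives

\begin{displaymath}\text{Var}(X) = AA^T = \left( \begin{array}{cc}
\sigma_1^2 & \rho\sigma_1\sigma_2
\\
\rho\sigma_1\sigma_2 & \sigma_2^2
\end{array} \right) \,,
\end{displaymath}

so $\rho$ is the correlation of $X_1$ and $X_2$.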

Moments and independence

Theorem: If $X_1,\ldots,X_p$ are independent and each $X_i$ is integrable then $X=X_1\cdots X_p$ is integrable and

\begin{displaymath}E(X_1\cdots X_p) = E(X_1) \cdots E(X_p)
\end{displaymath}
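In particular, if X and Y are independent with finite variances then

\begin{displaymath}\text{Cov}(X,Y) = E(XY) - E(X)E(Y) = 0 \,;
\end{displaymath}

this is the fact used above to see that $\text{E}(Z_iZ_j) = 0$ for $i \neq j$.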



Richard Lockhart
1999-09-30