
STAT 450 Lecture 10

Reading for Today's Lecture: ?

Goals of Today's Lecture:

Expectation, moments

We give two definitions of expected values:

Def'n If X has density f then

\begin{displaymath}E(g(X)) = \int g(x)f(x)\, dx \,.
\end{displaymath}

Def'n: If X has discrete density f then

\begin{displaymath}E(g(X)) = \sum_x g(x)f(x) \,.
\end{displaymath}
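For example, if X has the Exponential(1) density $f(x)=e^{-x}$ on $(0,\infty)$ then integrating by parts twice gives

\begin{displaymath}E(X^2) = \int_0^\infty x^2 e^{-x}\, dx = 2 \,,
\end{displaymath}

while if X is Bernoulli(p), so that f(1)=p and f(0)=1-p, the second definition gives $E(X) = 0\cdot(1-p) + 1\cdot p = p$.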

Now if Y=g(X) for a smooth, strictly increasing g and X has density $f_X$ then
\begin{align*}E(Y) & = \int y f_Y(y) \, dy
\\
& = \int g(x) f_Y(g(x)) g^\prime(x) \, dx
\\
& = \int g(x) f_X(x) \, dx
\\
& = E(g(X))
\end{align*}
by the change of variables formula for integration (substitute $y=g(x)$ and recall that $f_Y(g(x)) g^\prime(x) = f_X(x)$). This is good because otherwise we might have two different values for $E(e^X)$.
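As a concrete check, take X uniform on (0,1) and $Y=e^X$. Directly from the definition, $E(e^X) = \int_0^1 e^x \, dx = e-1$. On the other hand Y has density $f_Y(y)=1/y$ on $(1,e)$, so

\begin{displaymath}E(Y) = \int_1^e y \cdot \frac{1}{y} \, dy = e - 1 \,,
\end{displaymath}

the same value, as the change of variables argument promises.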

In general, there are random variables which are neither absolutely continuous nor discrete. Look at my STAT 801 web pages to see how $\text{E}$ is defined in general.

Facts: E is a linear, monotone, positive operator:

1.
Linear: E(aX+bY) = aE(X)+bE(Y) provided X and Y are integrable.

2.
Positive: $P(X \ge 0) = 1$ implies $E(X) \ge 0$.

3.
Monotone: $P(X \ge Y)=1$ and X, Y integrable implies $E(X) \ge E(Y)$.
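Notice that the third fact follows from the first two: if $P(X \ge Y)=1$ then $X-Y \ge 0$ with probability 1, so positivity and linearity give

\begin{displaymath}E(X) - E(Y) = E(X-Y) \ge 0 \,.
\end{displaymath}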

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $X= \lim X_n$ (the limit exists automatically because the sequence is non-decreasing, though it may be infinite) then

\begin{displaymath}E(X) = \lim_{n\to \infty} E(X_n)
\end{displaymath}

Dominated Convergence: If $\vert X_n\vert \le Y_n$ and there is a random variable X such that $X_n \to X$ (technical details of this convergence later in the course) and a random variable Y such that $Y_n \to Y$ with $E(Y_n) \to E(Y) < \infty$ then

\begin{displaymath}E(X_n) \to E(X)
\end{displaymath}

This is often used with all $Y_n$ equal to the same random variable Y.

Fatou's Lemma: If $X_n \ge 0$ then

\begin{displaymath}E(\liminf X_n) \le \liminf E(X_n)
\end{displaymath}
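To see why some hypothesis such as domination is needed, let U be uniform on (0,1) and put $X_n = n 1(U \le 1/n)$. Then $X_n \to 0$ with probability 1 but

\begin{displaymath}E(X_n) = n P(U \le 1/n) = 1 \not\to 0 = E(\lim X_n) \,.
\end{displaymath}

No integrable Y dominates all the $X_n$ at once (check that $E(\sup_n X_n) = \infty$), and Fatou's Lemma gives only $0 = E(\liminf X_n) \le \liminf E(X_n) = 1$.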

Theorem: With this definition of E, if X has density f(x) (even on $\mathbb{R}^p$, say) and Y=g(X) then

\begin{displaymath}E(Y) = \int g(x) f(x) dx \, .
\end{displaymath}

(This could be a multiple integral.) If X has pmf f then

\begin{displaymath}E(Y) =\sum_x g(x) f(x) \, .
\end{displaymath}

This works for instance even if X has a density but Y doesn't.
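For example, if X is standard normal and $Y = g(X) = 1(X > 0)$ then Y is discrete, taking only the values 0 and 1, even though X has a density; the theorem still gives

\begin{displaymath}E(Y) = \int_{-\infty}^\infty 1(x > 0) f(x) \, dx = P(X > 0) = \frac{1}{2} \,.
\end{displaymath}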

Def'n: The $r^{\rm th}$ moment (about the origin) of a real random variable X is $\mu_r^\prime=E(X^r)$ (provided it exists). We generally use $\mu$ for E(X). The $r^{\rm th}$ central moment is

\begin{displaymath}\mu_r = E[(X-\mu)^r]
\end{displaymath}

We call $\sigma^2 = \mu_2$ the variance.
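Expanding the square and using linearity gives the usual computational formula for the variance:

\begin{displaymath}\sigma^2 = E[(X-\mu)^2] = E(X^2) - 2\mu E(X) + \mu^2 = E(X^2) - \mu^2 \,.
\end{displaymath}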

Def'n: For an $\mathbb{R}^p$-valued random vector X we define $\mu_X = E(X) $ to be the vector whose $i^{\rm th}$ entry is $E(X_i)$ (provided all entries exist).

Def'n: The ($p \times p$) variance-covariance matrix of X is

\begin{displaymath}Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
\end{displaymath}

which exists provided each component $X_i$ has a finite second moment.
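Entry $(i,j)$ of this matrix is the covariance of $X_i$ and $X_j$, so the diagonal entries are the individual variances:

\begin{displaymath}\left[\text{Var}(X)\right]_{ij} = E\left[(X_i - \mu_i)(X_j-\mu_j)\right] = \text{Cov}(X_i,X_j) \,.
\end{displaymath}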

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems. Here is one version of Markov's inequality (one case is Chebyshev's inequality):
\begin{align*}P(\vert X-\mu\vert \ge t ) &= E[1(\vert X-\mu\vert \ge t)]
\\
& \le E\left[ \frac{\vert X-\mu\vert^r}{t^r} 1(\vert X-\mu\vert \ge t)\right]
\\
& \le \frac{E[\vert X-\mu\vert^r]}{t^r}
\end{align*}
The intuition is that if moments are small then large deviations from average are unlikely.
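Taking r=2 and $t = k\sigma$ gives Chebyshev's inequality:

\begin{displaymath}P(\vert X-\mu\vert \ge k\sigma) \le \frac{E[(X-\mu)^2]}{k^2\sigma^2} = \frac{1}{k^2} \,,
\end{displaymath}

so, for instance, any random variable with a finite variance is at least 3 standard deviations from its mean with probability at most 1/9.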

Example moments: If Z is standard normal then
\begin{align*}E(Z) & = \int_{-\infty}^\infty z e^{-z^2/2}\, dz /\sqrt{2\pi}
\\
&= \left. \frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
\\
& = 0
\end{align*}
and (integrating by parts)
\begin{align*}E(Z^r) &= \int_{-\infty}^\infty z^r e^{-z^2/2}\, dz /\sqrt{2\pi}
\\
&= \left. \frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty
+ (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2}\, dz /\sqrt{2\pi}
\end{align*}
so that

\begin{displaymath}\mu_r = (r-1)\mu_{r-2}
\end{displaymath}

for $r \ge 2$. Remembering that $\mu_1=0$ and

\begin{displaymath}\mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
\end{displaymath}

we find that

\begin{displaymath}\mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even}
\end{array}\right.
\end{displaymath}
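For example, starting from $\mu_0=1$ the recursion gives

\begin{displaymath}\mu_2 = 1, \quad \mu_4 = 3 \cdot 1 = 3, \quad \mu_6 = 5 \cdot 3 \cdot 1 = 15 \,.
\end{displaymath}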

If now $X\sim N(\mu,\sigma^2)$, that is, X has the same distribution as $\sigma Z + \mu$, then $E(X) = \sigma E(Z) + \mu = \mu$ and

\begin{displaymath}\mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r)
\end{displaymath}

In particular, $\text{Var}(X) = \mu_2(X) = \sigma^2 E(Z^2) = \sigma^2$, so our choice of notation $N(\mu,\sigma^2)$ for the distribution of $\sigma Z + \mu$ is justified; $\sigma^2$ is indeed the variance.

If $X \sim MVN(\mu, \Sigma)$ then we can write $X=AZ+\mu$ with Z standard multivariate normal and $AA^T = \Sigma$. Hence
\begin{align*}\text{E}(X) & = A \text{E}(Z) + \mu
\\
& = A \cdot 0 + \mu
\\
& = \mu
\end{align*}
Moreover
\begin{align*}\text{Var}(X) & = \text{E}\left[(X-\mu)(X - \mu)^T \right]
\\
& = \text{E}\left[ (AZ)(AZ)^T\right]
\\
& = A \text{E}\left[ZZ^T\right]A^T
\end{align*}
To compute $\text{E}\left[ZZ^T\right]$ look at entry $(i,j)$ of $ZZ^T$, which is $Z_iZ_j$. Then
\begin{align*}\text{E}(Z_iZ_j) & = \begin{cases}
\text{E}(Z_i^2) & i=j
\\
\text{E}(Z_i)\text{E}(Z_j) & i \neq j
\end{cases}
\\
& = \begin{cases}
1 & i=j
\\
0 & i \neq j
\end{cases}\end{align*}

So $\text{E}(ZZ^T) = I_{n \times n}$ (the factorization of $\text{E}(Z_iZ_j)$ for $i \neq j$ uses the independence of the coordinates of Z; see the theorem below) and

\begin{displaymath}\text{Var}(X) = AA^T = \Sigma \, .
\end{displaymath}
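For example, in the bivariate case (p=2, with illustrative constants $\sigma_1, \sigma_2 > 0$ and $\vert\rho\vert \le 1$ which are not fixed anywhere above) the choice

\begin{displaymath}A = \left( \begin{array}{cc}
\sigma_1 & 0
\\
\rho\sigma_2 & \sqrt{1-\rho^2}\,\sigma_2
\end{array} \right)
\end{displaymath}

gives

\begin{displaymath}\text{Var}(X) = AA^T = \left( \begin{array}{cc}
\sigma_1^2 & \rho\sigma_1\sigma_2
\\
\rho\sigma_1\sigma_2 & \sigma_2^2
\end{array} \right) \,,
\end{displaymath}

so $\rho$ is the correlation of $X_1$ and $X_2$.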

Moments and independence

Theorem: If $X_1,\ldots,X_p$ are independent and each $X_i$ is integrable then $X=X_1\cdots X_p$ is integrable and

\begin{displaymath}E(X_1\cdots X_p) = E(X_1) \cdots E(X_p)
\end{displaymath}
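In particular, if X and Y are independent with finite variances then

\begin{displaymath}\text{Cov}(X,Y) = E(XY) - E(X)E(Y) = 0 \,;
\end{displaymath}

this is the fact used above to see that $\text{E}(Z_iZ_j) = 0$ for $i \neq j$.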



Richard Lockhart
1999-09-30