
STAT 801: Mathematical Statistics

Expectation, moments

Two elementary definitions of expected values:

Defn: If $ X$ has density $ f$ then

$\displaystyle E(g(X)) = \int g(x)f(x)\, dx \,.
$

Defn: If $ X$ has discrete density $ f$ then

$\displaystyle E(g(X)) = \sum_x g(x)f(x) \,.
$

FACT: If $ Y=g(X)$ for a smooth, strictly increasing $ g$ then

$\displaystyle E(Y)$ $\displaystyle = \int y f_Y(y) \, dy$
  $\displaystyle = \int g(x) f_Y(g(x)) g^\prime(x) \, dx$
  $\displaystyle = E(g(X))$

by the change of variables formula for integration. This is good because otherwise we might have two different values for $ E(e^X)$.
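
As a quick numerical check of this consistency (a sketch of mine, not part of the original notes; it assumes Python with numpy and scipy available), take $ X$ standard normal and $ g(x)=e^x$, so $ Y=e^X$ is lognormal. Integrating $ g(x)f_X(x)$ and integrating $ y f_Y(y)$ should both give $ E(e^X)=e^{1/2}\approx 1.6487$.

    # Sketch: E(g(X)) computed from the density of X agrees with E(Y)
    # computed from the density of Y when Y = g(X) = exp(X), X standard normal.
    import numpy as np
    from scipy import integrate

    def f_X(x):
        # standard normal density
        return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    def f_Y(y):
        # density of Y = exp(X): lognormal with log-mean 0 and log-sd 1
        return np.exp(-np.log(y)**2 / 2) / (y * np.sqrt(2 * np.pi))

    e1, _ = integrate.quad(lambda x: np.exp(x) * f_X(x), -np.inf, np.inf)
    e2, _ = integrate.quad(lambda y: y * f_Y(y), 0, np.inf)
    print(e1, e2, np.exp(0.5))   # all approximately 1.6487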

In general, there are random variables which are neither absolutely continuous nor discrete. Here's how probabilists define $ E$ in general.

Defn: RV $ X$ is simple if we can write

$\displaystyle X(\omega)= \sum_1^n a_i 1(\omega\in A_i)
$

for some constants $ a_1,\ldots,a_n$ and events $ A_i$.

Defn: For a simple rv $ X$ define

$\displaystyle E(X) = \sum a_i P(A_i)
$

For non-negative random variables which are not simple we extend the definition by approximation:

Defn: If $ X \ge 0$ then

$\displaystyle E(X) = \sup\{E(Y): 0 \le Y \le X,\ Y \mbox{ simple}\}
$

Defn: We call $ X$ integrable if

$\displaystyle E(\vert X\vert) < \infty \, .
$

In this case we define

$\displaystyle E(X) = E(\max(X,0)) -E(\max(-X,0)) \, .
$
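
To see the approximation by simple random variables in action, here is a small Monte Carlo sketch (my illustration, assuming Python with numpy): for $ X$ exponential with mean 1, round $ X$ down to the nearest multiple of $ 2^{-n}$ and cap at $ n$. Each rounded variable is simple, lies between 0 and $ X$, and its expectation increases toward $ E(X)=1$.

    # Sketch: approximate E(X) for X >= 0 by expectations of simple rvs
    # obtained by rounding X down to multiples of 2^-n and capping at n.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=10**6)   # X >= 0 with E(X) = 1

    for n in range(1, 7):
        x_n = np.minimum(np.floor(x * 2**n) / 2**n, n)   # simple, 0 <= X_n <= X
        print(n, x_n.mean())   # increases toward E(X) = 1 (up to Monte Carlo error)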

Facts: $ E$ is a linear, monotone, positive operator:

  1. Linear: $ E(aX+bY) = aE(X)+bE(Y)$ provided $ X$ and $ Y$ are integrable.

  2. Positive: $ P(X \ge 0) = 1$ implies $ E(X) \ge 0$.

  3. Monotone: $ P(X \ge Y)=1$ and $ X$, $ Y$ integrable implies $ E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $ X= \lim X_n$ (which has to exist) then

$\displaystyle E(X) = \lim_{n\to \infty} E(X_n) \, .
$

Dominated Convergence: If $ \vert X_n\vert \le Y_n$ for all $ n$, there is a rv $ X$ such that $ X_n \to X$ (technical details of this convergence come later in the course), and there is a rv $ Y$ such that $ Y_n \to Y$ with $ E(Y_n) \to E(Y) < \infty$, then

$\displaystyle E(X_n) \to E(X) \, .
$

Often used with all $ Y_n$ the same rv $ Y$.

Fatou's Lemma: If $ X_n \ge 0$ then

$\displaystyle E(\lim\sup X_n) \le \lim\sup E(X_n) \, .
$
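
A standard example (added as an illustration, assuming Python with numpy) shows that Fatou's inequality can be strict and that the domination hypothesis above cannot simply be dropped: with $ U$ uniform on $ (0,1)$ and $ X_n = n 1(U \le 1/n)$, we have $ X_n \to 0$ almost surely, so $ E(\limsup X_n)=0$, while $ E(X_n)=1$ for every $ n$.

    # Sketch: X_n = n * 1(U <= 1/n) has E(X_n) = 1 for all n, yet X_n -> 0 a.s.
    import numpy as np

    rng = np.random.default_rng(1)
    u = rng.uniform(size=10**6)

    for n in [1, 10, 100, 1000]:
        x_n = n * (u <= 1.0 / n)    # equals n with probability 1/n, else 0
        print(n, x_n.mean())        # approximately 1 for every n

    # For any fixed draw of U, X_n(U) = 0 once n > 1/U, so lim X_n = 0 and
    # E(limsup X_n) = 0 < 1 = limsup E(X_n): Fatou's inequality is strict here.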

Theorem: With this definition of $ E$ if $ X$ has density $ f(x)$ (even in $ R^p$ say) and $ Y=g(X)$ then

$\displaystyle E(Y) = \int g(x) f(x) dx \, .
$

(Could be a multiple integral.) If $ X$ has pmf $ f$ then

$\displaystyle E(Y) =\sum_x g(x) f(x) \, .
$

The first conclusion holds, e.g., even if $ X$ has a density but $ Y$ doesn't.

Defn: The $ r^{\rm th}$ moment (about the origin) of a real rv $ X$ is $ \mu_r^\prime=E(X^r)$ (provided it exists). We generally use $ \mu$ for $ E(X)$.

Defn: The $ r^{\rm th}$ central moment is

$\displaystyle \mu_r = E[(X-\mu)^r] \, .
$

We call $ \sigma^2 = \mu_2$ the variance.

Defn: For an $ R^p$ valued random vector $ X$

$\displaystyle \mu_X = E(X)
$

is the vector whose $ i^{\rm th}$ entry is $ E(X_i)$ (provided all entries exist).

Defn: The ($ p \times p$) variance-covariance matrix of $ X$ is

$\displaystyle Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
$

which exists provided each component $ X_i$ has a finite second moment.
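
As a small sketch (mine, assuming Python with numpy), the definition can be applied directly to a sample: averaging the outer products $ (X-\bar X)(X-\bar X)^t$ over the sample essentially reproduces what numpy.cov computes.

    # Sketch: variance-covariance matrix as the average of the outer products
    # (X - mu)(X - mu)^t, compared with numpy's built-in covariance estimate.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 10**5
    A = np.array([[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.2, 0.3, 1.0]])
    x = rng.normal(size=(n, 3)) @ A.T            # correlated columns

    centered = x - x.mean(axis=0)
    var_hat = centered.T @ centered / n          # Monte Carlo version of E[(X-mu)(X-mu)^t]

    print(np.round(var_hat, 3))
    print(np.round(np.cov(x, rowvar=False), 3))  # nearly identical (divisor n-1)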

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems.

Example: Markov's inequality

$\displaystyle P(\vert X-\mu\vert \ge t )$ $\displaystyle = E[1(\vert X-\mu\vert \ge t)]$    
  $\displaystyle \le E\left[\frac{\vert X-\mu\vert^r}{t^r}1(\vert X-\mu\vert \ge t)\right]$    
  $\displaystyle \le \frac{E[\vert X-\mu\vert^r]}{t^r}$    

Intuition: if moments are small then large deviations from average are unlikely.

Special Case: Chebyshev's inequality

$\displaystyle P(\vert X-\mu\vert \ge t ) \le \frac{{\rm Var}(X)}{t^2} \, .
$
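
A quick simulation (a sketch, assuming Python with numpy) comparing the estimated tail probability $ P(\vert X-\mu\vert \ge t)$ with the Chebyshev bound $ {\rm Var}(X)/t^2$ for an exponential variable with mean 1 (so $ \mu=1$ and $ {\rm Var}(X)=1$):

    # Sketch: compare P(|X - mu| >= t) with the Chebyshev bound Var(X)/t^2
    # for X exponential with mean 1 (so mu = 1 and Var(X) = 1).
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.exponential(scale=1.0, size=10**6)
    mu, var = 1.0, 1.0

    for t in [1.0, 2.0, 3.0, 4.0]:
        tail = np.mean(np.abs(x - mu) >= t)   # Monte Carlo tail probability
        print(t, tail, var / t**2)            # the bound holds but is often crude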

Example moments: If $ Z$ is standard normal then

$\displaystyle E(Z)$ $\displaystyle = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}$    
  $\displaystyle = \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty$    
  $\displaystyle = 0$    

and (integrating by parts)

$\displaystyle E(Z^r)$ $\displaystyle = \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}$
  $\displaystyle = \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty + (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}$

so that

$\displaystyle \mu_r = (r-1)\mu_{r-2}
$

for $ r \ge 2$. Remembering that $ \mu_1=0$ and

$\displaystyle \mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
$

we find that

$\displaystyle \mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even} \, .
\end{array}\right.
$

If now $ X\sim N(\mu,\sigma^2)$, that is, $ X$ has the same distribution as $ \sigma Z + \mu$, then $ E(X) = \sigma E(Z) + \mu = \mu$ and

$\displaystyle \mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r) \, .
$

In particular, we see that our choice of notation $ N(\mu,\sigma^2)$ for the distribution of $ \sigma Z + \mu$ is justified; $ \mu_2(X) = \sigma^2 E(Z^2) = \sigma^2$, so $ \sigma^2$ is indeed the variance.
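
The recursion $ \mu_r = (r-1)\mu_{r-2}$ above can be checked numerically (a sketch, assuming Python with scipy; scipy.stats.norm.moment returns $ E(Z^r)$):

    # Sketch: moments of the standard normal via the recursion
    # mu_r = (r - 1) mu_{r-2}, checked against scipy's moment routine.
    from scipy import stats

    mu = {0: 1.0, 1: 0.0}
    for r in range(2, 9):
        mu[r] = (r - 1) * mu[r - 2]

    for r in range(1, 9):
        print(r, mu[r], stats.norm.moment(r))   # 0, 1, 0, 3, 0, 15, 0, 105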

Similarly, for $ X\sim MVN(\mu,\Sigma)$ we have $ X=AZ+\mu$ with $ Z\sim MVN(0,I)$ and $ AA^t=\Sigma$, so that

$\displaystyle E(X) = \mu
$

and

$\displaystyle {\rm Var}(X)$ $\displaystyle = E\left\{(X-\mu)(X-\mu)^t\right\}$    
  $\displaystyle = E\left\{ AZ (AZ)^t\right\}$    
  $\displaystyle = A E(ZZ^t) A^t$    
  $\displaystyle = AIA^t = \Sigma \, .$    

Note the use of the easy calculations $ E(Z)=0$ and

$\displaystyle {\rm Var}(Z) = E(ZZ^t) =I \, .
$
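
A simulation sketch of this calculation (mine, assuming Python with numpy): generate rows of $ Z\sim MVN(0,I)$, set $ X=AZ+\mu$, and compare the sample variance matrix of $ X$ with $ \Sigma=AA^t$.

    # Sketch: Var(AZ + mu) = A A^t when Z ~ MVN(0, I), checked by simulation.
    import numpy as np

    rng = np.random.default_rng(4)
    A = np.array([[2.0, 0.0],
                  [1.0, 1.5]])
    mu = np.array([1.0, -1.0])
    Sigma = A @ A.T

    z = rng.normal(size=(10**6, 2))   # rows are independent MVN(0, I) draws
    x = z @ A.T + mu                  # rows are MVN(mu, Sigma) draws

    print(np.round(Sigma, 3))
    print(np.round(np.cov(x, rowvar=False), 3))   # agrees up to Monte Carlo error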

Moments and independence

Theorem: If $ X_1,\ldots,X_p$ are independent and each $ X_i$ is integrable then $ X=X_1\cdots X_p$ is integrable and

$\displaystyle E(X_1\cdots X_p) = E(X_1) \cdots E(X_p) \, .
$

Proof: Suppose each $ X_i$ is simple:

$\displaystyle X_i = \sum_j x_{ij} 1(X_i =x_{ij})$

where the $ x_{ij}$ are the possible values of $ X_i$. Then

$\displaystyle E(X_1\cdots X_p)$ $\displaystyle = \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} E\left[1(X_1=x_{1j_1}) \cdots 1(X_p = x_{pj_p})\right]$
  $\displaystyle = \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1=x_{1j_1},\ldots, X_p = x_{pj_p})$
  $\displaystyle = \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1=x_{1j_1}) \cdots P(X_p = x_{pj_p})$
  $\displaystyle = \sum_{j_1} x_{1j_1} P(X_1=x_{1j_1}) \times \cdots \times \sum_{j_p} x_{pj_p} P(X_p = x_{pj_p})$
  $\displaystyle = \prod_i E(X_i) \, .$

General case with each $ X_i\ge 0$:

Let $ X_{in}$ be $ X_i$ rounded down to the nearest multiple of $ 2^{-n}$, truncated at a maximum value of $ n$.

That is: if

$\displaystyle \frac{k}{2^n} \le X_i < \frac{k+1}{2^n}
$

then $ X_{in} = k/2^n$ for $ k=0,\ldots,n2^n$. For $ X_i>n$ put $ X_{in}=n$.

Apply case just done:

$\displaystyle E(\prod X_{in}) = \prod E(X_{in}) \, .
$

Since $ X_{in} \uparrow X_i$ and hence $ \prod_i X_{in} \uparrow \prod_i X_i$ as $ n\to\infty$, monotone convergence applies to both sides.

For the general case write each $ X_i$ as the difference of its positive and negative parts:

$\displaystyle X_i = \max(X_i,0) -\max(-X_i,0) \, .
$

Expanding the product gives a sum of $ 2^p$ terms, each a product of independent non-negative random variables; apply the positive case to each term and use linearity.
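
A Monte Carlo sanity check of the theorem (a sketch, assuming Python with numpy): for independent $ X_1, X_2$ the sample mean of the product matches the product of the sample means, while for dependent variables it generally does not.

    # Sketch: E(X1 X2) = E(X1) E(X2) for independent X1, X2, but not in general.
    import numpy as np

    rng = np.random.default_rng(5)
    n = 10**6

    x1 = rng.exponential(scale=2.0, size=n)   # E(X1) = 2
    x2 = rng.normal(loc=3.0, size=n)          # E(X2) = 3, independent of X1
    print(np.mean(x1 * x2), np.mean(x1) * np.mean(x2))   # both near 6

    x3 = x1 + rng.normal(size=n)              # dependent on X1, E(X3) = 2
    print(np.mean(x1 * x3), np.mean(x1) * np.mean(x3))   # about 8 versus about 4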


Richard Lockhart
2001-01-20