STAT 450: Statistical Theory

Expectation, moments

Two elementary definitions of expected values:

Defn: If $ X$ has density $ f$ then

$\displaystyle E(g(X)) = \int g(x)f(x)\, dx \,.
$

Defn: If $ X$ has discrete density $ f$ then

$\displaystyle E(g(X)) = \sum_x g(x)f(x) \,.
$
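
Numerical illustration (a minimal Python sketch, assuming $ X\sim N(0,1)$ for the continuous case and $ X\sim$ Poisson(3) for the discrete case, with $ g(x)=x^2$; both answers are known in closed form):

import numpy as np
from scipy import integrate, stats

g = lambda x: x**2

# Continuous case: X ~ N(0,1), so E(g(X)) = integral of g(x) f(x) dx = 1
f = stats.norm(0, 1).pdf
val, _ = integrate.quad(lambda x: g(x) * f(x), -np.inf, np.inf)
print(val)

# Discrete case: X ~ Poisson(3), so E(g(X)) = sum over x of g(x) f(x) = 3 + 3**2 = 12
x = np.arange(0, 200)                  # truncate the infinite sum far in the tail
print(np.sum(g(x) * stats.poisson(3.0).pmf(x)))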

FACT: If $ Y=g(X)$ for a smooth, strictly increasing $ g$ then

$\displaystyle E(Y)$ $\displaystyle = \int y f_Y(y) \, dy$    
  $\displaystyle = \int g(x) f_Y(g(x)) g^\prime(x) \, dx$    
  $\displaystyle = \int g(x) f_X(x) \, dx = E(g(X))$    

by the change of variables formula for integration (substitute $ y=g(x)$) together with the change of variables formula for densities, $ f_Y(g(x)) g^\prime(x) = f_X(x)$. This is reassuring: otherwise we might get two different values for $ E(e^X)$, one computed from the density of $ X$ and one from the density of $ Y=e^X$.
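
Numerical check (a minimal sketch, taking $ X\sim N(0,1)$ so that $ Y=e^X$ is lognormal and $ E(e^X)=e^{1/2}$):

import numpy as np
from scipy import integrate, stats

# E(e^X) computed from the density of X ~ N(0,1)
lhs, _ = integrate.quad(lambda x: np.exp(x) * stats.norm.pdf(x), -np.inf, np.inf)

# E(Y) computed from the density of Y = e^X, which is lognormal with s = 1
rhs, _ = integrate.quad(lambda y: y * stats.lognorm.pdf(y, s=1.0), 0, np.inf)

print(lhs, rhs, np.exp(0.5))   # all three are about 1.6487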

In general there are random variables which are neither absolutely continuous nor discrete; see STAT 801 for the general definition of $ E$.

Defn: We call $ X$ integrable if

$\displaystyle E(\vert X\vert) < \infty \, .
$
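
Example: a standard Cauchy random variable is not integrable. A quick simulation sketch shows that its running averages never settle down, unlike those of an integrable variable:

import numpy as np

rng = np.random.default_rng(0)
sizes = [10**2, 10**4, 10**6]

# Standard Cauchy: E(|X|) is infinite, so running averages never stabilize
cauchy = rng.standard_cauchy(max(sizes))
print([cauchy[:m].mean() for m in sizes])

# Standard normal: integrable, so running averages settle near E(X) = 0
normal = rng.standard_normal(max(sizes))
print([normal[:m].mean() for m in sizes])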

Facts: $ E$ is a linear, monotone, positive operator:

  1. Linear: $ E(aX+bY) = aE(X)+bE(Y)$ provided $ X$ and $ Y$ are integrable.

  2. Positive: $ P(X \ge 0) = 1$ implies $ E(X) \ge 0$.

  3. Monotone: $ P(X \ge Y)=1$ and $ X$, $ Y$ integrable implies $ E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $ X= \lim X_n$ (the limit automatically exists, though it may be infinite) then

$\displaystyle E(X) = \lim_{n\to \infty} E(X_n) \, .
$

Dominated Convergence: If $ \vert X_n\vert \le Y_n$, if there is a random variable $ X$ such that $ X_n \to X$ (technical details of this notion of convergence come later in the course), and if there is a random variable $ Y$ such that $ Y_n \to Y$ with $ E(Y_n) \to E(Y) < \infty$, then

$\displaystyle E(X_n) \to E(X) \, .
$

Often used with all $ Y_n$ the same rv $ Y$.
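
Numerical illustration of Monotone Convergence (a sketch with $ X\sim$ Exponential(1) and $ X_n = \min(X,n)$, so $ 0 \le X_1 \le X_2 \le \cdots$ and $ X_n \uparrow X$):

import numpy as np
from scipy import integrate

# X ~ Exponential(1), X_n = min(X, n): 0 <= X_1 <= X_2 <= ... and X_n increases to X
f = lambda x: np.exp(-x)              # Exponential(1) density on (0, infinity)
for n in [1, 2, 4, 8]:
    e_xn, _ = integrate.quad(lambda x: min(x, n) * f(x), 0, np.inf)
    print(n, e_xn)                    # increases towards E(X) = 1

Since $ \vert X_n\vert \le X$ with $ E(X) < \infty$, the same sequence also illustrates Dominated Convergence.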

Theorem: With this definition of $ E$, if $ X$ has density $ f(x)$ (even on $ R^p$, say) and $ Y=g(X)$, then

$\displaystyle E(Y) = \int g(x) f(x) dx \, .
$

(Could be a multiple integral.) If $ X$ has pmf $ f$ then

$\displaystyle E(Y) =\sum_x g(x) f(x) \, .
$

The first conclusion works, for example, even if $ X$ has a density but $ Y$ doesn't.

Defn: The $ r^{\rm th}$ moment (about the origin) of a real rv $ X$ is $ \mu_r^\prime=E(X^r)$ (provided it exists). We generally use $ \mu$ for $ E(X)$.

Defn: The $ r^{\rm th}$ central moment is

$\displaystyle \mu_r = E[(X-\mu)^r] \, .
$

We call $ \sigma^2 = \mu_2$ the variance.
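
Numerical check of these definitions (a sketch using $ X\sim$ Exponential(1), for which $ \mu=1$, $ \mu_2=1$ and $ \mu_3=2$):

import numpy as np
from scipy import integrate

# Central moments of X ~ Exponential(1): mu = 1, mu_2 = 1, mu_3 = 2
f = lambda x: np.exp(-x)                                  # density of Exponential(1)
mu, _ = integrate.quad(lambda x: x * f(x), 0, np.inf)     # mu = E(X) = 1
for r in [2, 3]:
    m_r, _ = integrate.quad(lambda x: (x - mu)**r * f(x), 0, np.inf)
    print(r, m_r)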

Defn: For an $ R^p$ valued random vector $ X$

$\displaystyle \mu_X = E(X)
$

is the vector whose $ i^{\rm th}$ entry is $ E(X_i)$ (provided all entries exist).

Defn: The ( $ p \times p$) variance covariance matrix of $ X$ is

$\displaystyle Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
$

which exists provided each component $ X_i$ has a finite second moment.
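
Illustration (a sketch computing $ \mu_X$ and $ {\rm Var}(X)$ directly from a hypothetical joint pmf on four points in $ R^2$):

import numpy as np

# A hypothetical joint pmf for a 2-dimensional discrete X on four support points
points = np.array([[0.0, 0.0], [0.0, 1.0], [1.0, 0.0], [1.0, 1.0]])
probs = np.array([0.4, 0.1, 0.1, 0.4])

mu = (probs[:, None] * points).sum(axis=0)      # entrywise means E(X_i)
centred = points - mu
var = (probs[:, None, None] * centred[:, :, None] * centred[:, None, :]).sum(axis=0)
print(mu)    # [0.5 0.5]
print(var)   # [[0.25 0.15], [0.15 0.25]]: the matrix E[(X - mu)(X - mu)^t]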

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems.

Example: Markov's inequality

$\displaystyle P(\vert X-\mu\vert \ge t )$ $\displaystyle = E[1(\vert X-\mu\vert \ge t)]$    
  $\displaystyle \le E\left[\frac{\vert X-\mu\vert^r}{t^r}1(\vert X-\mu\vert \ge t)\right]$    
  $\displaystyle \le \frac{E[\vert X-\mu\vert^r]}{t^r}$    

Intuition: if moments are small then large deviations from average are unlikely.

Special Case: Chebyshev's inequality

$\displaystyle P(\vert X-\mu\vert \ge t ) \le \frac{{\rm Var}(X)}{t^2} \, .
$
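
Numerical check (a sketch taking $ X\sim N(0,1)$, where the tail probability is known exactly):

from scipy import stats

mu, sigma = 0.0, 1.0
for t in [1.0, 2.0, 3.0]:
    exact = 2 * stats.norm.sf(t, loc=mu, scale=sigma)   # P(|X - mu| >= t) for X ~ N(mu, sigma^2)
    bound = sigma**2 / t**2                             # the Chebyshev bound
    print(t, exact, bound)                              # exact tail is below the bound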

Example moments: If $ Z$ is standard normal then

$\displaystyle E(Z)$ $\displaystyle = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}$    
  $\displaystyle = \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty$    
  $\displaystyle = 0$    

and (integrating by parts)

$\displaystyle E(Z^r)$ $\displaystyle = \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}$    
  $\displaystyle = \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty + (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}$    

so that, since the boundary term vanishes,

$\displaystyle \mu_r = (r-1)\mu_{r-2}
$

for $ r \ge 2$. Remembering that $ \mu_1=0$ and

$\displaystyle \mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
$

we find that

$\displaystyle \mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even} \, .
\end{array}\right.
$
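
Numerical check of these moments (a sketch computing $ E(Z^r)$ by quadrature for $ r=1,\ldots,6$; the values should be $ 0,1,0,3,0,15$):

import numpy as np
from scipy import integrate, stats

# Moments of the standard normal: 0, 1, 0, 3, 0, 15 for r = 1, ..., 6
for r in range(1, 7):
    m, _ = integrate.quad(lambda z: z**r * stats.norm.pdf(z), -np.inf, np.inf)
    print(r, m)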

If now $ X\sim N(\mu,\sigma^2)$, that is, $ X$ has the same distribution as $ \sigma Z + \mu$, then $ E(X) = \sigma E(Z) + \mu = \mu$ and

$\displaystyle \mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r) \, .
$

In particular, we see that our choice of notation $ N(\mu,\sigma^2)$ for the distribution of $ \sigma Z + \mu$ is justified; taking $ r=2$ gives $ {\rm Var}(X) = \sigma^2 E(Z^2) = \sigma^2$, so $ \sigma^2$ is indeed the variance.

Similarly, for $ X\sim MVN(\mu,\Sigma)$ we have $ X=AZ+\mu$ with $ Z\sim MVN(0,I)$ and $ AA^t = \Sigma$, so

$\displaystyle E(X) = \mu
$

and

$\displaystyle {\rm Var}(X)$ $\displaystyle = E\left\{(X-\mu)(X-\mu)^t\right\}$    
  $\displaystyle = E\left\{ AZ (AZ)^t\right\}$    
  $\displaystyle = A E(ZZ^t) A^t$    
  $\displaystyle = AIA^t = \Sigma \, .$    

Note the use of the easy calculations $ E(Z)=0$ and

$\displaystyle {\rm Var}(Z) = E(ZZ^t) =I \, .
$
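
Simulation check (a sketch with a hypothetical $ 2\times 2$ matrix $ A$; the sample variance-covariance matrix of $ AZ+\mu$ should be close to $ AA^t=\Sigma$):

import numpy as np

rng = np.random.default_rng(1)
A = np.array([[2.0, 0.0], [1.0, 1.0]])     # a hypothetical A with Sigma = A A^t
mu = np.array([1.0, -1.0])
Sigma = A @ A.T

Z = rng.standard_normal((100_000, 2))      # rows are independent MVN(0, I) vectors
X = Z @ A.T + mu                           # each row is A z + mu

print(Sigma)                               # [[4. 2.], [2. 2.]]
print(np.cov(X, rowvar=False))             # sample variance-covariance, close to Sigma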

Moments and independence

Theorem: If $ X_1,\ldots,X_p$ are independent and each $ X_i$ is integrable then $ X=X_1\cdots X_p$ is integrable and

$\displaystyle E(X_1\cdots X_p) = E(X_1) \cdots E(X_p) \, .
$
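
Monte Carlo check (a sketch with independent $ X_1\sim$ Exponential(1) and $ X_2\sim$ Uniform(0,1), so $ E(X_1)E(X_2)=1/2$):

import numpy as np

rng = np.random.default_rng(2)
x1 = rng.exponential(1.0, size=1_000_000)    # independent draws with E(X_1) = 1
x2 = rng.uniform(0.0, 1.0, size=1_000_000)   # independent of x1, E(X_2) = 1/2

print((x1 * x2).mean())           # approximately 0.5 = E(X_1) E(X_2)
print(x1.mean() * x2.mean())      # approximately 0.5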

Richard Lockhart
2002-09-16