
STAT 801: Mathematical Statistics

Expectation, moments

Two elementary definitions of expected values:

Defn: If $ X$ has density $ f$ then

$\displaystyle E(g(X)) = \int g(x)f(x)\, dx \,.
$

Defn: If $ X$ has discrete density $ f$ then

$\displaystyle E(g(X)) = \sum_x g(x)f(x) \,.
$

FACT: If $ Y=g(X)$ for a smooth, strictly increasing $ g$ then

$\displaystyle E(Y)$ $\displaystyle = \int y f_Y(y) \, dy$
  $\displaystyle = \int g(x) f_Y(g(x)) g^\prime(x) \, dx$
  $\displaystyle = E(g(X))$

by the change of variables formula for integration. This is good because otherwise we might have two different values for $ E(e^X)$.
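
As a quick numerical check of this consistency (a sketch of mine, not part of the original notes; it assumes Python with numpy and scipy available), take $ X$ standard normal and $ g(x)=e^x$, so $ Y=e^X$ is lognormal. Integrating $ g(x)f_X(x)$ and integrating $ y f_Y(y)$ should both give $ E(e^X)=e^{1/2}\approx 1.6487$.

    # Sketch: E(g(X)) computed from the density of X agrees with E(Y)
    # computed from the density of Y when Y = g(X) = exp(X), X standard normal.
    import numpy as np
    from scipy import integrate

    def f_X(x):
        # standard normal density
        return np.exp(-x**2 / 2) / np.sqrt(2 * np.pi)

    def f_Y(y):
        # density of Y = exp(X): lognormal with log-mean 0 and log-sd 1
        return np.exp(-np.log(y)**2 / 2) / (y * np.sqrt(2 * np.pi))

    e1, _ = integrate.quad(lambda x: np.exp(x) * f_X(x), -np.inf, np.inf)
    e2, _ = integrate.quad(lambda y: y * f_Y(y), 0, np.inf)
    print(e1, e2, np.exp(0.5))   # all approximately 1.6487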

In general, there are random variables which are neither absolutely continuous nor discrete. Here's how probabilists define $ E$ in general.

Defn: RV $ X$ is simple if we can write

$\displaystyle X(\omega)= \sum_1^n a_i 1(\omega\in A_i)
$

for some constants $ a_1,\ldots,a_n$ and events $ A_i$.

Defn: For a simple rv $ X$ define

$\displaystyle E(X) = \sum a_i P(A_i)
$

For non-negative random variables which are not simple we extend the definition by approximation:

Defn: If $ X \ge 0$ then

$\displaystyle E(X) = \sup\{E(Y): 0 \le Y \le X,\ Y \mbox{ simple}\}
$

Defn: We call $ X$ integrable if

$\displaystyle E(\vert X\vert) < \infty \, .
$

In this case we define

$\displaystyle E(X) = E(\max(X,0)) -E(\max(-X,0)) \, .
$
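
To see the approximation by simple random variables in action, here is a small Monte Carlo sketch (my illustration, assuming Python with numpy): for $ X$ exponential with mean 1, round $ X$ down to the nearest multiple of $ 2^{-n}$ and cap at $ n$. Each rounded variable is simple, lies between 0 and $ X$, and its expectation increases toward $ E(X)=1$.

    # Sketch: approximate E(X) for X >= 0 by expectations of simple rvs
    # obtained by rounding X down to multiples of 2^-n and capping at n.
    import numpy as np

    rng = np.random.default_rng(0)
    x = rng.exponential(scale=1.0, size=10**6)   # X >= 0 with E(X) = 1

    for n in range(1, 7):
        x_n = np.minimum(np.floor(x * 2**n) / 2**n, n)   # simple, 0 <= X_n <= X
        print(n, x_n.mean())   # increases toward E(X) = 1 (up to Monte Carlo error)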

Facts: $ E$ is a linear, monotone, positive operator:

  1. Linear: $ E(aX+bY) = aE(X)+bE(Y)$ provided $ X$ and $ Y$ are integrable.

  2. Positive: $ P(X \ge 0) = 1$ implies $ E(X) \ge 0$.

  3. Monotone: $ P(X \ge Y)=1$ and $ X$, $ Y$ integrable implies $ E(X) \ge E(Y)$.

Major technical theorems:

Monotone Convergence: If $ 0 \le X_1 \le X_2 \le \cdots$ and $ X= \lim X_n$ (which has to exist) then

$\displaystyle E(X) = \lim_{n\to \infty} E(X_n) \, .
$

Dominated Convergence: If $ \vert X_n\vert \le Y_n$ for all $ n$, there is a rv $ X$ such that $ X_n \to X$ (technical details of this convergence come later in the course), and there is a rv $ Y$ such that $ Y_n \to Y$ with $ E(Y_n) \to E(Y) < \infty$, then

$\displaystyle E(X_n) \to E(X) \, .
$

Often used with all $ Y_n$ the same rv $ Y$.

Fatou's Lemma: If $ X_n \ge 0$ then

$\displaystyle E(\lim\sup X_n) \le \lim\sup E(X_n) \, .
$
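
A standard example (added as an illustration, assuming Python with numpy) shows that Fatou's inequality can be strict and that the domination hypothesis above cannot simply be dropped: with $ U$ uniform on $ (0,1)$ and $ X_n = n 1(U \le 1/n)$, we have $ X_n \to 0$ almost surely, so $ E(\limsup X_n)=0$, while $ E(X_n)=1$ for every $ n$.

    # Sketch: X_n = n * 1(U <= 1/n) has E(X_n) = 1 for all n, yet X_n -> 0 a.s.
    import numpy as np

    rng = np.random.default_rng(1)
    u = rng.uniform(size=10**6)

    for n in [1, 10, 100, 1000]:
        x_n = n * (u <= 1.0 / n)    # equals n with probability 1/n, else 0
        print(n, x_n.mean())        # approximately 1 for every n

    # For any fixed draw of U, X_n(U) = 0 once n > 1/U, so lim X_n = 0 and
    # E(limsup X_n) = 0 < 1 = limsup E(X_n): Fatou's inequality is strict here.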

Theorem: With this definition of $ E$ if $ X$ has density $ f(x)$ (even in $ R^p$ say) and $ Y=g(X)$ then

$\displaystyle E(Y) = \int g(x) f(x) dx \, .
$

(Could be a multiple integral.) If $ X$ has pmf $ f$ then

$\displaystyle E(Y) =\sum_x g(x) f(x) \, .
$

The first conclusion holds, e.g., even if $ X$ has a density but $ Y$ doesn't.

Defn: The $ r^{\rm th}$ moment (about the origin) of a real rv $ X$ is $ \mu_r^\prime=E(X^r)$ (provided it exists). We generally use $ \mu$ for $ E(X)$.

Defn: The $ r^{\rm th}$ central moment is

$\displaystyle \mu_r = E[(X-\mu)^r] \, .
$

We call $ \sigma^2 = \mu_2$ the variance.

Defn: For an $ R^p$ valued random vector $ X$

$\displaystyle \mu_X = E(X)
$

is the vector whose $ i^{\rm th}$ entry is $ E(X_i)$ (provided all entries exist).

Defn: The ($ p \times p$) variance-covariance matrix of $ X$ is

$\displaystyle Var(X) = E\left[ (X-\mu)(X-\mu)^t \right]
$

which exists provided each component $ X_i$ has a finite second moment.
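
As a small sketch (mine, assuming Python with numpy), the definition can be applied directly to a sample: averaging the outer products $ (X-\bar X)(X-\bar X)^t$ over the sample essentially reproduces what numpy.cov computes.

    # Sketch: variance-covariance matrix as the average of the outer products
    # (X - mu)(X - mu)^t, compared with numpy's built-in covariance estimate.
    import numpy as np

    rng = np.random.default_rng(2)
    n = 10**5
    A = np.array([[1.0, 0.0, 0.0],
                  [0.5, 1.0, 0.0],
                  [0.2, 0.3, 1.0]])
    x = rng.normal(size=(n, 3)) @ A.T            # correlated columns

    centered = x - x.mean(axis=0)
    var_hat = centered.T @ centered / n          # Monte Carlo version of E[(X-mu)(X-mu)^t]

    print(np.round(var_hat, 3))
    print(np.round(np.cov(x, rowvar=False), 3))  # nearly identical (divisor n-1)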

Moments and probabilities of rare events are closely connected as will be seen in a number of important probability theorems.

Example: Markov's inequality

$\displaystyle P(\vert X-\mu\vert \ge t )$ $\displaystyle = E[1(\vert X-\mu\vert \ge t)]$    
  $\displaystyle \le E\left[\frac{\vert X-\mu\vert^r}{t^r}1(\vert X-\mu\vert \ge t)\right]$    
  $\displaystyle \le \frac{E[\vert X-\mu\vert^r]}{t^r}$    

Intuition: if moments are small then large deviations from average are unlikely.

Special Case: Chebyshev's inequality

$\displaystyle P(\vert X-\mu\vert \ge t ) \le \frac{{\rm Var}(X)}{t^2} \, .
$
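
A quick simulation (a sketch, assuming Python with numpy) comparing the estimated tail probability $ P(\vert X-\mu\vert \ge t)$ with the Chebyshev bound $ {\rm Var}(X)/t^2$ for an exponential variable with mean 1 (so $ \mu=1$ and $ {\rm Var}(X)=1$):

    # Sketch: compare P(|X - mu| >= t) with the Chebyshev bound Var(X)/t^2
    # for X exponential with mean 1 (so mu = 1 and Var(X) = 1).
    import numpy as np

    rng = np.random.default_rng(3)
    x = rng.exponential(scale=1.0, size=10**6)
    mu, var = 1.0, 1.0

    for t in [1.0, 2.0, 3.0, 4.0]:
        tail = np.mean(np.abs(x - mu) >= t)   # Monte Carlo tail probability
        print(t, tail, var / t**2)            # the bound holds but is often crude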

Example moments: If $ Z$ is standard normal then

$\displaystyle E(Z)$ $\displaystyle = \int_{-\infty}^\infty z e^{-z^2/2} dz /\sqrt{2\pi}$    
  $\displaystyle = \left.\frac{-e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty$    
  $\displaystyle = 0$    

and (integrating by parts)

$\displaystyle E(Z^r)$ $\displaystyle = \int_{-\infty}^\infty z^r e^{-z^2/2} dz /\sqrt{2\pi}$
  $\displaystyle = \left.\frac{-z^{r-1}e^{-z^2/2}}{\sqrt{2\pi}}\right\vert _{-\infty}^\infty + (r-1) \int_{-\infty}^\infty z^{r-2} e^{-z^2/2} dz /\sqrt{2\pi}$

so that

$\displaystyle \mu_r = (r-1)\mu_{r-2}
$

for $ r \ge 2$. Remembering that $ \mu_1=0$ and

$\displaystyle \mu_0 = \int_{-\infty}^\infty z^0 e^{-z^2/2} dz /\sqrt{2\pi}=1
$

we find that

$\displaystyle \mu_r = \left\{ \begin{array}{ll}
0 & \mbox{$r$ odd}
\\
(r-1)(r-3)\cdots 1 & \mbox{$r$ even} \, .
\end{array}\right.
$

If now $ X\sim N(\mu,\sigma^2)$, that is, $ X$ has the same distribution as $ \sigma Z + \mu$, then $ E(X) = \sigma E(Z) + \mu = \mu$ and

$\displaystyle \mu_r(X) = E[(X-\mu)^r] = \sigma^r E(Z^r) \, .
$

In particular, we see that our choice of notation $ N(\mu,\sigma^2)$ for the distribution of $ \sigma Z + \mu$ is justified; $ \mu_2(X) = \sigma^2 E(Z^2) = \sigma^2$, so $ \sigma^2$ is indeed the variance.
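
The recursion $ \mu_r = (r-1)\mu_{r-2}$ above can be checked numerically (a sketch, assuming Python with scipy; scipy.stats.norm.moment returns $ E(Z^r)$):

    # Sketch: moments of the standard normal via the recursion
    # mu_r = (r - 1) mu_{r-2}, checked against scipy's moment routine.
    from scipy import stats

    mu = {0: 1.0, 1: 0.0}
    for r in range(2, 9):
        mu[r] = (r - 1) * mu[r - 2]

    for r in range(1, 9):
        print(r, mu[r], stats.norm.moment(r))   # 0, 1, 0, 3, 0, 15, 0, 105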

Similarly, for $ X\sim MVN(\mu,\Sigma)$ we have $ X=AZ+\mu$ with $ Z\sim MVN(0,I)$ and $ AA^t=\Sigma$, so that

$\displaystyle E(X) = \mu
$

and

$\displaystyle {\rm Var}(X)$ $\displaystyle = E\left\{(X-\mu)(X-\mu)^t\right\}$    
  $\displaystyle = E\left\{ AZ (AZ)^t\right\}$    
  $\displaystyle = A E(ZZ^t) A^t$    
  $\displaystyle = AIA^t = \Sigma \, .$    

Note the use of the easy calculations $ E(Z)=0$ and

$\displaystyle {\rm Var}(Z) = E(ZZ^t) =I \, .
$
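
A simulation sketch of this calculation (mine, assuming Python with numpy): generate rows of $ Z\sim MVN(0,I)$, set $ X=AZ+\mu$, and compare the sample variance matrix of $ X$ with $ \Sigma=AA^t$.

    # Sketch: Var(AZ + mu) = A A^t when Z ~ MVN(0, I), checked by simulation.
    import numpy as np

    rng = np.random.default_rng(4)
    A = np.array([[2.0, 0.0],
                  [1.0, 1.5]])
    mu = np.array([1.0, -1.0])
    Sigma = A @ A.T

    z = rng.normal(size=(10**6, 2))   # rows are independent MVN(0, I) draws
    x = z @ A.T + mu                  # rows are MVN(mu, Sigma) draws

    print(np.round(Sigma, 3))
    print(np.round(np.cov(x, rowvar=False), 3))   # agrees up to Monte Carlo error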

Moments and independence

Theorem: If $ X_1,\ldots,X_p$ are independent and each $ X_i$ is integrable then $ X=X_1\cdots X_p$ is integrable and

$\displaystyle E(X_1\cdots X_p) = E(X_1) \cdots E(X_p) \, .
$

Proof: Suppose each $ X_i$ is simple:

$\displaystyle X_i = \sum_j x_{ij} 1(X_i =x_{ij})$

where the $ x_{ij}$ are the possible values of $ X_i$. Then

$\displaystyle E(X_1\cdots X_p)$ $\displaystyle = \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} E\left[1(X_1=x_{1j_1}) \cdots 1(X_p = x_{pj_p})\right]$
  $\displaystyle = \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1=x_{1j_1},\ldots, X_p = x_{pj_p})$
  $\displaystyle = \sum_{j_1,\ldots,j_p} x_{1j_1}\cdots x_{pj_p} P(X_1=x_{1j_1}) \cdots P(X_p = x_{pj_p})$
  $\displaystyle = \sum_{j_1} x_{1j_1} P(X_1=x_{1j_1}) \times \cdots \times \sum_{j_p} x_{pj_p} P(X_p = x_{pj_p})$
  $\displaystyle = \prod_i E(X_i) \, .$

General case with each $ X_i\ge 0$:

Let $ X_{in}$ be $ X_i$ rounded down to the nearest multiple of $ 2^{-n}$, truncated at a maximum value of $ n$.

That is: if

$\displaystyle \frac{k}{2^n} \le X_i < \frac{k+1}{2^n}
$

then $ X_{in} = k/2^n$ for $ k=0,\ldots,n2^n$. For $ X_i>n$ put $ X_{in}=n$.

Apply case just done:

$\displaystyle E(\prod X_{in}) = \prod E(X_{in}) \, .
$

Since $ X_{in} \uparrow X_i$ and hence $ \prod_i X_{in} \uparrow \prod_i X_i$ as $ n\to\infty$, monotone convergence applies to both sides.

For the general case write each $ X_i$ as the difference of its positive and negative parts:

$\displaystyle X_i = \max(X_i,0) -\max(-X_i,0) \, .
$

Expanding the product gives a sum of $ 2^p$ terms, each a product of independent non-negative random variables; apply the positive case to each term and use linearity.
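
A Monte Carlo sanity check of the theorem (a sketch, assuming Python with numpy): for independent $ X_1, X_2$ the sample mean of the product matches the product of the sample means, while for dependent variables it generally does not.

    # Sketch: E(X1 X2) = E(X1) E(X2) for independent X1, X2, but not in general.
    import numpy as np

    rng = np.random.default_rng(5)
    n = 10**6

    x1 = rng.exponential(scale=2.0, size=n)   # E(X1) = 2
    x2 = rng.normal(loc=3.0, size=n)          # E(X2) = 3, independent of X1
    print(np.mean(x1 * x2), np.mean(x1) * np.mean(x2))   # both near 6

    x3 = x1 + rng.normal(size=n)              # dependent on X1, E(X3) = 2
    print(np.mean(x1 * x3), np.mean(x1) * np.mean(x3))   # about 8 versus about 4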


Richard Lockhart
2001-01-20