
STAT 801: Mathematical Statistics

Convergence in Distribution

Undergraduate version of the central limit theorem: if $ X_1,\ldots,X_n$ are iid from a population with mean $ \mu$ and standard deviation $ \sigma$ then $ n^{1/2}(\bar{X}-\mu)/\sigma$ has approximately a standard normal distribution.

Similarly, a Binomial$ (n,p)$ random variable has approximately a $ N(np,np(1-p))$ distribution.

What is the precise meaning of statements like ``$ X$ and $ Y$ have approximately the same distribution''?

Desired meaning: $ X$ and $ Y$ have nearly the same cdf.

But care needed.

Q1) If $ n$ is a large number is the $ N(0,1/n)$ distribution close to the distribution of $ X\equiv 0$?

Q2) Is $ N(0,1/n)$ close to the $ N(1/n,1/n)$ distribution?

Q3) Is $ N(0,1/n)$ close to $ N(1/\sqrt{n},1/n)$ distribution?

Q4) If $ X_n\equiv 2^{-n}$ is the distribution of $ X_n$ close to that of $ X\equiv 0$?

The answers depend on how close ``close'' needs to be, so it is a matter of definition.

In practice the usual sort of approximation we want to make is to say that some random variable $ X$ has nearly some continuous distribution, like $ N(0,1)$.

So we want to know that probabilities like $ P(X>x)$ are nearly $ P(N(0,1) > x)$.

The real difficulty arises for discrete random variables or infinite-dimensional settings; these cases are not treated in this course.

Mathematicians' meaning of close:

Either they can provide an upper bound on the distance between the two things or they are talking about taking a limit.

In this course we take limits.

Definition: A sequence of random variables $ X_n$ converges in distribution to a random variable $ X$ if

$\displaystyle E(g(X_n)) \to E(g(X))
$

for every bounded continuous function $ g$.

Theorem 1   The following are equivalent:
  1. $ X_n$ converges in distribution to $ X$.
  2. $ P(X_n \le x) \to P(X \le x)$ for each $ x$ such that $ P(X=x)=0$, that is, at each continuity point of the cdf of $ X$.
  3. The limit of the characteristic functions of $ X_n$ is the characteristic function of $ X$:

    $\displaystyle E(e^{itX_n}) \to E(e^{itX})
$

    for every real $ t$.
These are all implied by

$\displaystyle M_{X_n}(t) \to M_X(t) < \infty
$

for all $ \vert t\vert \le \epsilon$ for some positive $ \epsilon$.
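
For instance, the Binomial approximation above can be checked numerically via criterion 3: the characteristic function of the standardized Binomial should approach $ e^{-t^2/2}$. Here is a minimal Python sketch (the values of $ n$, $ p$ and $ t$ are arbitrary choices for illustration):

    import numpy as np
    from scipy.stats import binom

    def std_binom_cf(t, n, p):
        # E(exp(itZ)) for Z = (X - np)/sqrt(np(1-p)), X ~ Binomial(n, p)
        k = np.arange(n + 1)
        z = (k - n * p) / np.sqrt(n * p * (1 - p))
        return np.sum(binom.pmf(k, n, p) * np.exp(1j * t * z))

    p, ts = 0.3, [0.5, 1.0, 2.0]
    for n in [10, 100, 1000]:
        errs = [abs(std_binom_cf(t, n, p) - np.exp(-t ** 2 / 2)) for t in ts]
        print(n, [round(e, 4) for e in errs])

The printed errors should shrink as $ n$ grows, in line with criterion 3.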

Now let's go back to the questions I asked:

[Figure: convergence1.ps]

[Figure: convergence3.ps]

[Figure: convergence5.ps]
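
The figures are not reproduced here, but the same comparison can be made numerically: the sketch below (with arbitrary choices of $ n$ and of evaluation points) computes the cdfs appearing in Q1--Q4 at a few points and compares them with the cdf of $ X\equiv 0$.

    import numpy as np
    from scipy.stats import norm

    xs = np.array([-0.5, -0.1, 0.0, 0.1, 0.5])
    target = (xs >= 0).astype(float)                    # cdf of X = 0
    for n in [4, 25, 400]:
        s = 1 / np.sqrt(n)                              # sd of N(., 1/n)
        q1 = norm.cdf(xs, loc=0, scale=s)               # Q1: N(0, 1/n)
        q2 = norm.cdf(xs, loc=1 / n, scale=s)           # Q2: N(1/n, 1/n)
        q3 = norm.cdf(xs, loc=1 / np.sqrt(n), scale=s)  # Q3: N(1/sqrt(n), 1/n)
        q4 = (xs >= 2.0 ** (-n)).astype(float)          # Q4: point mass at 2^(-n)
        print(n, q1.round(3), q2.round(3), q3.round(3), q4, target)

The output can be compared with the target cdf at each $ x$ with $ P(X=x)=0$, i.e. each $ x\neq 0$.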


Summary: to derive approximate distributions:

Show sequence of rvs $ X_n$ converges to some $ X$.

The limit distribution (i.e. the distribution of $ X$) should be non-trivial, say $ N(0,1)$.

Don't say: $ X_n$ is approximately $ N(1/n,1/n)$.

Do say: $ n^{1/2}X_n$ converges to $ N(0,1)$ in distribution.

The Central Limit Theorem

If $ X_1, X_2, \cdots$ are iid with mean 0 and variance 1 then $ n^{1/2}\bar{X}$ converges in distribution to $ N(0,1)$. That is,

$\displaystyle P(n^{1/2}\bar{X} \le x ) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} dy
\, .
$

Proof: As before

$\displaystyle E(e^{itn^{1/2}\bar{X}}) \to e^{-t^2/2}
$

This is the characteristic function of a $ N(0,1)$ random variable so we are done by our theorem.
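
A quick Monte Carlo check of the theorem in Python (the exponential population, the sample size and the number of replications are arbitrary choices for illustration):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(801)
    n, reps = 30, 100000
    x = rng.exponential(size=(reps, n)) - 1        # X_i = Exp(1) - 1: mean 0, variance 1
    t = np.sqrt(n) * x.mean(axis=1)                # n^(1/2) * Xbar, one value per replication
    for q in [-1.0, 0.0, 1.0, 2.0]:
        print(q, round((t <= q).mean(), 4), round(norm.cdf(q), 4))

Each line compares the simulated value of $ P(n^{1/2}\bar{X}\le x)$ with $ \Phi(x)$.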

Edgeworth expansions

In fact if $ \gamma=E(X^3)$ then, keeping one more term in the expansion of the characteristic function of $ X$,

$\displaystyle \phi(t) \approx 1 -t^2/2 -i\gamma t^3/6 + \cdots
$

Then

$\displaystyle \log(\phi(t)) =\log(1+u)
$

where

$\displaystyle u=-t^2/2 -i \gamma t^3/6 + \cdots
$

Use $ \log(1+u) = u-u^2/2 + \cdots$ to get

$\displaystyle \log(\phi(t)) \approx [-t^2/2 -i\gamma t^3/6 +\cdots]
- [-t^2/2 -i\gamma t^3/6 +\cdots]^2/2 +\cdots
$

which, since the squared term contributes only terms of degree 4 and higher in $ t$, gives

$\displaystyle \log(\phi(t)) \approx -t^2/2 -i\gamma t^3/6 + \cdots
$

Now apply this calculation to $ T=n^{1/2}\bar{X}$:

$\displaystyle \log(\phi_T(t)) \approx -t^2/2 -i E(T^3) t^3/6 + \cdots
$

Remember $ E(T^3) = \gamma/\sqrt{n}$ and exponentiate to get

$\displaystyle \phi_T(t) \approx e^{-t^2/2} \exp\{-i\gamma t^3/(6\sqrt{n}) + \cdots\}
$

You can do a Taylor expansion of the second exponential around 0, because its argument is small (of order $ n^{-1/2}$), and get

$\displaystyle \phi_T(t) \approx e^{-t^2/2} (1-i\gamma t^3/(6\sqrt{n}))
$

neglecting higher order terms. This approximation to the characteristic function of $ T$ can be inverted to get an Edgeworth approximation to the density (or distribution) of $ T$ which looks like

$\displaystyle f_T(x) \approx \frac{1}{\sqrt{2\pi}} e^{-x^2/2} [1+\gamma
(x^3-3x)/(6\sqrt{n}) + \cdots]
$

Remarks:

  1. The error using the central limit theorem to approximate a density or a probability is proportional to $ n^{-1/2}$

  2. This is improved to $ n^{-1}$ for symmetric densities for which $ \gamma=0$.

  3. These expansions are asymptotic. This means that the series indicated by $ \cdots$ usually does not converge. When $ n=25$, say, it may help to take the second term, but the approximation may get worse if you include the third or fourth term or more.

  4. You can integrate the expansion above for the density to get an approximation for the cdf.
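
The expansion can be checked directly when the exact density is available. For $ X_i$ iid Exp(1)$ {}-1$ (so $ \gamma=2$) the sum of the unshifted variables is Gamma, so the density of $ T=n^{1/2}\bar{X}$ is known exactly. The following Python sketch (with an arbitrary choice of $ n$ and of evaluation points) compares the absolute errors of the plain normal density and the Edgeworth-corrected density; in line with Remark 3 the correction usually, though not uniformly, helps.

    import numpy as np
    from scipy.stats import gamma, norm

    n, g = 10, 2.0                 # g = E(X^3) = 2 for X = Exp(1) - 1
    xs = np.array([-2.0, -1.0, 0.0, 1.0, 2.0])
    # T = sqrt(n)*(Ybar - 1) with Y_i ~ Exp(1); sum(Y_i) ~ Gamma(n, 1), so
    # the exact density of T is sqrt(n) * f_Gamma(n + sqrt(n) x)
    exact = np.sqrt(n) * gamma.pdf(n + np.sqrt(n) * xs, a=n)
    clt = norm.pdf(xs)
    edge = norm.pdf(xs) * (1 + g * (xs ** 3 - 3 * xs) / (6 * np.sqrt(n)))
    print(np.abs(clt - exact).round(4))    # error of the normal approximation
    print(np.abs(edge - exact).round(4))   # error of the Edgeworth approximation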

Multivariate convergence in distribution

Definition: $ X_n\in R^p$ converges in distribution to $ X\in R^p$ if

$\displaystyle E(g(X_n)) \to E(g(X))
$

for each bounded continuous real valued function $ g$ on $ R^p$.

This is equivalent to either of

Cramér Wold Device: $ a^tX_n$ converges in distribution to $ a^t X$ for each $ a \in R^p$

or

Convergence of characteristic functions:

$\displaystyle E(e^{ia^tX_n}) \to E(e^{ia^tX})
$

for each $ a \in R^p$.

Extensions of the CLT


  1. If $ Y_1,Y_2,\cdots$ are iid in $ R^p$ with mean $ \mu$ and variance-covariance matrix $ \Sigma$ then $ n^{1/2}(\bar{Y}-\mu) $ converges in distribution to $ MVN(0,\Sigma)$; a simulation sketch follows this list.


  2. Lyapunov CLT: for each $ n$ let $ X_{n1},\ldots,X_{nn}$ be independent rvs with

    $\displaystyle E(X_{ni}) =0, \qquad {\rm Var}\Big(\sum_i X_{ni}\Big) = 1, \qquad \sum_i E(\vert X_{ni}\vert^3) \to 0.
$

    Then $ \sum_i X_{ni}$ converges in distribution to $ N(0,1)$.


  3. Lindeberg CLT: the first two conditions of the Lyapunov CLT hold and

    $\displaystyle \sum E(X_{ni}^2 1(\vert X_{ni}\vert > \epsilon)) \to 0
$

    for each $ \epsilon > 0$. Then $ \sum_i X_{ni}$ converges in distribution to $ N(0,1)$. (Lyapunov's condition implies Lindeberg's.)


  4. Non-independent rvs: $ m$-dependent CLT, martingale CLT, CLT for mixing processes.


  5. Not sums: Slutsky's theorem, $ \delta$ method.
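
Before turning to Slutsky's theorem, here is a small Python sketch of item 1 combined with the Cramér-Wold device: for iid vectors $ Y_i$ the projection $ a^t n^{1/2}(\bar{Y}-\mu)$ should be approximately $ N(0, a^t\Sigma a)$. The bivariate population (independent Exp(1) and Poisson(2) coordinates), the direction $ a$ and the sample sizes are arbitrary choices.

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(801)
    n, reps = 50, 20000
    mu = np.array([1.0, 2.0])                       # means of Exp(1) and Poisson(2)
    sigma = np.diag([1.0, 2.0])                     # variance-covariance matrix
    y = np.stack([rng.exponential(size=(reps, n)),
                  rng.poisson(2.0, size=(reps, n))], axis=2)
    z = np.sqrt(n) * (y.mean(axis=1) - mu)          # sqrt(n)*(Ybar - mu), one row per replication
    a = np.array([1.0, -1.0])                       # an arbitrary direction for Cramer-Wold
    proj = z @ a
    v = a @ sigma @ a                               # a^t Sigma a = 3
    for q in [-1.0, 0.0, 1.0]:
        print(q, round((proj <= q * np.sqrt(v)).mean(), 4), round(norm.cdf(q), 4))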

Slutsky's Theorem: If $ X_n$ converges in distribution to $ X$ and $ Y_n$ converges in distribution (or in probability) to $ c$, a constant, then $ X_n+Y_n$ converges in distribution to $ X+c$. More generally, if $ f(x,y)$ is continuous then $ f(X_n,Y_n) \Rightarrow f(X,c)$.

Warning: the hypothesis that the limit of $ Y_n$ be constant is essential.
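
A standard application is the studentized mean. If the $ X_i$ are iid with mean 0 and variance $ \sigma^2$ then $ n^{1/2}\bar{X} \Rightarrow N(0,\sigma^2)$ and the sample standard deviation converges in probability to $ \sigma$, so Slutsky's theorem with $ f(x,y)=x/y$ gives $ n^{1/2}\bar{X}/s \Rightarrow N(0,1)$. A Python sketch (uniform population and sample size chosen arbitrarily):

    import numpy as np
    from scipy.stats import norm

    rng = np.random.default_rng(801)
    n, reps = 30, 100000
    x = rng.uniform(-1, 1, size=(reps, n))          # mean 0, variance 1/3
    # X_n := sqrt(n)*Xbar => N(0, 1/3); Y_n := s -> 1/sqrt(3) in probability
    t = np.sqrt(n) * x.mean(axis=1) / x.std(axis=1, ddof=1)
    for q in [-2.0, -1.0, 0.0, 1.0, 2.0]:
        print(q, round((t <= q).mean(), 4), round(norm.cdf(q), 4))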

Definition: We say $ Y_n$ converges to $ Y$ in probability if

$\displaystyle P(\vert Y_n-Y\vert > \epsilon) \to 0
$

for each $ \epsilon > 0$.

In fact, for $ Y$ constant, convergence in distribution and convergence in probability are the same. In general convergence in probability implies convergence in distribution. Both of these are weaker than almost sure convergence:

Definition: We say $ Y_n$ converges to $ Y$ almost surely if

$\displaystyle P(\{\omega\in \Omega: \lim_{n \to \infty} Y_n(\omega) = Y(\omega) \}) = 1
\, .
$

The delta method: Suppose:

  1. There are random variables $ Y_n$, a constant $ y$, and constants $ a_n \to \infty$ such that $ a_n(Y_n-y)$ converges in distribution to a random variable $ X$.
  2. The function $ f$ is differentiable at $ y$.

Then $ a_n(f(Y_n)-f(y))$ converges in distribution to $ f^\prime(y) X$.

If $ X_n\in R^p$ and $ f: R^p\mapsto R^q$ then $ f^\prime$ is $ q\times p$ matrix of first derivatives of components of $ f$.

Example: Suppose $ X_1,\ldots,X_n$ are a sample from a population with mean $ \mu$, variance $ \sigma^2$, and third and fourth central moments $ \mu_3$ and $ \mu_4$. Then

$\displaystyle n^{1/2}(s^2-\sigma^2) \Rightarrow N(0,\mu_4-\sigma^4)
$

where $ \Rightarrow $ is notation for convergence in distribution. For simplicity I define $ s^2 = \overline{X^2} -{\bar{X}}^2$.

Take $ Y_n =(\overline{X^2},\bar{X})$. Then $ Y_n$ converges in probability (by the law of large numbers) to $ y=(\mu^2+\sigma^2,\mu)$. Take $ a_n = n^{1/2}$. Then

$\displaystyle n^{1/2}(Y_n-y)
$

converges in distribution to $ MVN(0,\Sigma)$ with

$\displaystyle \Sigma = \left[\begin{array}{cc} \mu_4-\sigma^4 & \mu_3 -\mu(\mu^2+\sigma^2)\\
\mu_3-\mu(\mu^2+\sigma^2) & \sigma^2 \end{array} \right]
$

Define $ f(x_1,x_2) = x_1-x_2^2$. Then $ s^2 = f(Y_n)$. The gradient of $ f$ has components $ (1,-2x_2)$. This leads to

$\displaystyle n^{1/2}(s^2-\sigma^2) \approx
n^{1/2}[1, -2\mu]
\left[\begin{array}{c} \overline{X^2} - (\mu^2 + \sigma^2)\\
\bar{X} -\mu \end{array}\right]
$

which converges in distribution to $ (1,-2\mu) Y$ where $ Y\sim MVN(0,\Sigma)$. This rv is $ N(0,a^t \Sigma a)=N(0, \mu_4-\sigma^4)$ where $ a=(1,-2\mu)^t$.

Remark: In this sort of problem it is best to learn to recognize that the sample variance is unaffected by subtracting $ \mu$ from each $ X$. Thus there is no loss in assuming $ \mu=0$ which simplifies $ \Sigma$ and $ a$.

Special case: if the observations are $ N(\mu,\sigma^2)$ then $ \mu_3 =0$ and $ \mu_4=3\sigma^4$. Our calculation gives

$\displaystyle n^{1/2} (s^2-\sigma^2) \Rightarrow N(0,2\sigma^4)
$

You can divide through by $ \sigma^2$ and get

$\displaystyle n^{1/2}(\frac{s^2}{\sigma^2}-1) \Rightarrow N(0,2)
$

In fact, with $ s^2$ taken to be the usual sample variance (divisor $ n-1$), $ (n-1)s^2/\sigma^2$ has a $ \chi_{n-1}^2$ distribution and so the usual central limit theorem shows that

$\displaystyle (n-1)^{-1/2} [(n-1)s^2/\sigma^2 - (n-1)] \Rightarrow N(0,2)
$

(using mean of $ \chi^2_1$ is 1 and variance is 2). Factoring out $ n-1$ gives the assertion that

$\displaystyle (n-1)^{1/2}(s^2/\sigma^2-1) \Rightarrow N(0,2)
$

which is our $ \delta$ method calculation except for using $ n-1$ instead of $ n$. This difference is unimportant as can be checked using Slutsky's theorem.
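
The $ \delta$ method answer can also be checked by simulation. The sketch below (normal data, with $ n$, $ \sigma$ and the number of replications arbitrary) compares the empirical variance of $ n^{1/2}(s^2-\sigma^2)$ with $ 2\sigma^4$; agreement is up to Monte Carlo error and the $ n$ versus $ n-1$ effect just discussed.

    import numpy as np

    rng = np.random.default_rng(801)
    n, reps, sigma = 50, 50000, 2.0
    x = rng.normal(0.0, sigma, size=(reps, n))
    s2 = (x ** 2).mean(axis=1) - x.mean(axis=1) ** 2    # s^2 = mean(X^2) - Xbar^2
    stat = np.sqrt(n) * (s2 - sigma ** 2)
    print(round(stat.var(), 2), 2 * sigma ** 4)         # empirical variance vs 2*sigma^4 = 32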

Richard Lockhart
2001-01-26