Reading: Chapter 5 section 5

STAT 450: Statistical Theory

Convergence in Distribution

NOTE: This material may be covered later in the course. The next section of the course is labelled Inference.

STAT 270 version of the central limit theorem: if $ X_1,\ldots,X_n$ are iid from a population with mean $ \mu$ and standard deviation $ \sigma$, then $ n^{1/2}(\bar{X}-\mu)/\sigma$ has approximately a standard normal, $ N(0,1)$, distribution.

Also, a Binomial$ (n,p)$ random variable has approximately a $ N(np,np(1-p))$ distribution.
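The second statement is a special case of the first: a Binomial$ (n,p)$ count $ X$ is a sum of $ n$ iid Bernoulli$ (p)$ variables, each with mean $ p$ and standard deviation $ \sqrt{p(1-p)}$, so

$\displaystyle \frac{n^{1/2}(X/n-p)}{\sqrt{p(1-p)}} = \frac{X-np}{\sqrt{np(1-p)}}
$

has approximately a $ N(0,1)$ distribution.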

What is the precise meaning of statements like ``$ X$ and $ Y$ have approximately the same distribution''?

Desired meaning: $ X$ and $ Y$ have nearly the same cdf.

But care needed.

Q1) If $ n$ is a large number is the $ N(0,1/n)$ distribution close to the distribution of $ X\equiv 0$?

Q2) Is $ N(0,1/n)$ close to the $ N(1/n,1/n)$ distribution?

Q3) Is $ N(0,1/n)$ close to $ N(1/\sqrt{n},1/n)$ distribution?

Q4) If $ X_n\equiv 2^{-n}$ is the distribution of $ X_n$ close to that of $ X\equiv 0$?

The answers depend on how close ``close'' needs to be, so it is a matter of definition.

In practice the usual sort of approximation we want to make is to say that some random variable $ X$ has nearly a continuous distribution, such as $ N(0,1)$.

So: we want to know that probabilities like $ P(X>x)$ are nearly $ P(N(0,1) > x)$.

The real difficulty is the case of discrete random variables or infinite dimensions; neither is treated in this course.

Mathematicians' meaning of close:

Either they can provide an upper bound on the distance between the two distributions, or they are talking about taking a limit.

In this course we take limits.

Definition: A sequence of random variables $ X_n$ converges in distribution to a random variable $ X$ if

$\displaystyle E(g(X_n)) \to E(g(X))
$

for every bounded continuous function $ g$.

Theorem 1   The following are equivalent:
  1. $ X_n$ converges in distribution to $ X$.
  2. $ P(X_n \le x) \to P(X \le x)$ for each $ x$ such that $ P(X=x)=0$.
  3. The limit of the characteristic functions of $ X_n$ is the characteristic function of $ X$:

    $\displaystyle E(e^{itX_n}) \to E(e^{itX})
$

    for every real $ t$.
These are all implied by

$\displaystyle M_{X_n}(t) \to M_X(t) < \infty
$

for all $ \vert t\vert \le \epsilon$ for some positive $ \epsilon$.
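For example, the moment generating function criterion is simple to apply when the limit is a constant: if $ X_n \sim N(0,1/n)$ and $ X \equiv 0$, then

$\displaystyle M_{X_n}(t) = e^{t^2/(2n)} \to 1 = M_X(t)
$

for every real $ t$, so $ X_n$ converges in distribution to $ X$.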

Now let's go back to the questions I asked:

[Figures: convergence1.ps, convergence3.ps, convergence5.ps]
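The plots themselves are not reproduced here. The following sketch is an illustration only (it assumes numpy, scipy and matplotlib are available) and draws the kind of comparison Q1 asks about: the cdf of $ N(0,1/n)$ for several $ n$ against the cdf of $ X \equiv 0$, which jumps from 0 to 1 at $ x=0$.

    # Compare cdfs of N(0,1/n) with the cdf of X = 0 (a step function at 0).
    # Note the normal cdfs all equal 1/2 at x = 0; part 2 of the theorem only
    # requires convergence at points where the limit cdf is continuous, i.e. x != 0.
    import numpy as np
    from scipy.stats import norm
    import matplotlib.pyplot as plt

    x = np.linspace(-1, 1, 401)
    for n in (4, 16, 100):
        plt.plot(x, norm.cdf(x, scale=1 / np.sqrt(n)), label=f"N(0, 1/{n}) cdf")
    plt.step(x, (x >= 0).astype(float), where="post", color="black", label="cdf of X = 0")
    plt.xlabel("x")
    plt.legend()
    plt.show()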


Summary: to derive approximate distributions:

Show the sequence of rvs $ X_n$ converges in distribution to some $ X$.

The limit distribution (i.e. the distribution of $ X$) should be non-trivial, say $ N(0,1)$.

Don't say: $ X_n$ is approximately $ N(1/n,1/n)$.

Do say: $ n^{1/2}X_n$ converges to $ N(0,1)$ in distribution.
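To see the point of the last two statements: if $ X_n \sim N(1/n,1/n)$ (as in Q2), then $ X_n$ converges in distribution to the constant 0, a trivial limit, while $ n^{1/2}X_n \sim N(n^{-1/2},1)$ and

$\displaystyle P(n^{1/2}X_n \le x) = \Phi(x - n^{-1/2}) \to \Phi(x)
$

for every $ x$, so $ n^{1/2}X_n$ converges in distribution to $ N(0,1)$, a non-trivial limit which carries the useful approximation.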

The Central Limit Theorem

If $ X_1, X_2, \cdots$ are iid with mean 0 and variance 1 then $ n^{1/2}\bar{X}$ converges in distribution to $ N(0,1)$. That is,

$\displaystyle P(n^{1/2}\bar{X} \le x ) \to \frac{1}{\sqrt{2\pi}} \int_{-\infty}^x e^{-y^2/2} dy
\, .
$

Proof: As before

$\displaystyle E(e^{itn^{1/2}\bar{X}}) \to e^{-t^2/2}
$

This is the characteristic function of a $ N(0,1)$ random variable so we are done by our theorem.
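In more detail, the ``as before'' step is the usual characteristic function expansion: writing $ \phi$ for the characteristic function of $ X_1$, independence and the expansion $ \phi(s) = 1 - s^2/2 + o(s^2)$ (valid because $ E(X_1)=0$ and $ E(X_1^2)=1$) give

$\displaystyle E(e^{itn^{1/2}\bar{X}}) = \left[\phi\left(\frac{t}{\sqrt{n}}\right)\right]^n = \left[1 - \frac{t^2}{2n} + o\left(\frac{1}{n}\right)\right]^n \to e^{-t^2/2} .
$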

Multivariate convergence in distribution

Definition: $ X_n\in R^p$ converges in distribution to $ X\in R^p$ if

$\displaystyle E(g(X_n)) \to E(g(X))
$

for each bounded continuous real valued function $ g$ on $ R^p$.

This is equivalent to either of

Cramér-Wold device: $ a^tX_n$ converges in distribution to $ a^t X$ for each $ a \in R^p$

or

Convergence of characteristic functions:

$\displaystyle E(e^{ia^tX_n}) \to E(e^{ia^tX})
$

for each $ a \in R^p$.
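The two criteria are linked: for fixed $ a$, applying part 3 of Theorem 1 to the real random variable $ a^tX_n$ shows that the Cramér-Wold condition gives

$\displaystyle E(e^{it(a^tX_n)}) \to E(e^{it(a^tX)})
$

for every real $ t$; taking $ t=1$ yields the displayed convergence of the multivariate characteristic functions.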

Extensions of the CLT


  1. $ Y_1,Y_2,\cdots$ iid in $ R^p$ with mean $ \mu$ and variance-covariance matrix $ \Sigma$; then $ n^{1/2}(\bar{Y}-\mu) $ converges in distribution to $ MVN(0,\Sigma)$.


  2. Lyapunov CLT: for each $ n$ let $ X_{n1},\ldots,X_{nn}$ be independent rvs with

    $\displaystyle E(X_{ni}) = 0, \qquad Var\Bigl(\sum_i X_{ni}\Bigr) = 1, \qquad \sum_i E(\vert X_{ni}\vert^3) \to 0 .
$

    Then $ \sum_i X_{ni}$ converges in distribution to $ N(0,1)$. (A worked iid example follows this list.)


  3. Lindeberg CLT: the first two conditions of the Lyapunov CLT hold and

    $\displaystyle \sum E(X_{ni}^2 1(\vert X_{ni}\vert > \epsilon)) \to 0
$

    for each $ \epsilon > 0$. Then $ \sum_i X_{ni}$ converges in distribution to $ N(0,1)$. (Lyapunov's condition implies Lindeberg's.)


  4. Results that are not about sums: Slutsky's theorem and the $ \delta$ method.
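Worked example for the Lyapunov CLT (the one promised above): take the iid setting of the basic central limit theorem, assume in addition that $ E(\vert X_1\vert^3) < \infty$, and set $ X_{ni} = X_i/\sqrt{n}$. Then

$\displaystyle E(X_{ni}) = 0, \qquad Var\Bigl(\sum_i X_{ni}\Bigr) = \frac{1}{n}\sum_{i=1}^n Var(X_i) = 1, \qquad \sum_{i=1}^n E(\vert X_{ni}\vert^3) = \frac{E(\vert X_1\vert^3)}{\sqrt{n}} \to 0 ,
$

so the Lyapunov CLT recovers the conclusion that $ \sum_i X_{ni} = n^{1/2}\bar{X}$ converges in distribution to $ N(0,1)$.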

Slutsky's Theorem: If $ X_n$ converges in distribution to $ X$ and $ Y_n$ converges in distribution (or in probability) to $ c$, a constant, then $ X_n+Y_n$ converges in distribution to $ X+c$. More generally, if $ f(x,y)$ is continuous then $ f(X_n,Y_n) \Rightarrow f(X,c)$.

Warning: the hypothesis that the limit of $ Y_n$ be constant is essential.
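A standard example shows why: take $ X \sim N(0,1)$ and set $ X_n = X$ and $ Y_n = -X$ for every $ n$. Then $ X_n$ and $ Y_n$ each converge in distribution to a $ N(0,1)$ limit, but $ X_n + Y_n \equiv 0$; the distribution of the sum depends on the joint distribution of $ (X_n,Y_n)$, which convergence in distribution of the marginals does not control.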

Definition: We say $ Y_n$ converges to $ Y$ in probability if

$\displaystyle P(\vert Y_n-Y\vert > \epsilon) \to 0
$

for each $ \epsilon > 0$.

The fact is that, for $ Y$ constant, convergence in distribution and convergence in probability are the same. In general convergence in probability implies convergence in distribution. Both of these are weaker than almost sure convergence:

Definition: We say $ Y_n$ converges to $ Y$ almost surely if

$\displaystyle P(\{\omega\in \Omega: \lim_{n \to \infty} Y_n(\omega) = Y(\omega) \}) = 1
\, .
$

The delta method: Suppose:

  1. $ Y_n$ is a sequence of rvs and $ y$ is a constant;
  2. $ a_n$ is a sequence of constants with $ a_n \to \infty$ such that $ a_n(Y_n-y)$ converges in distribution to some rv $ X$;
  3. $ f$ is differentiable at $ y$.

Then $ a_n(f(Y_n)-f(y))$ converges in distribution to $ f^\prime(y) X$.

If $ Y_n\in R^p$ and $ f: R^p\mapsto R^q$ then $ f^\prime(y)$ is the $ q\times p$ matrix of first partial derivatives of the components of $ f$.

Example: Suppose $ X_1,\ldots,X_n$ are a sample from a population with mean $ \mu$, variance $ \sigma^2$, and third and fourth central moments $ \mu_3$ and $ \mu_4$. Then

$\displaystyle n^{1/2}(s^2-\sigma^2) \Rightarrow N(0,\mu_4-\sigma^4)
$

where $ \Rightarrow $ is notation for convergence in distribution. For simplicity I define $ s^2 = \overline{X^2} -{\bar{X}}^2$.

Take $ Y_n =(\overline{X^2},\bar{X})$. Then $ Y_n$ converges in probability to $ y=(\mu^2+\sigma^2,\mu)$ by the law of large numbers. Take $ a_n = n^{1/2}$. Then

$\displaystyle n^{1/2}(Y_n-y)
$

converges in distribution to $ MVN(0,\Sigma)$, where $ \Sigma$ is the variance-covariance matrix of $ (X_1^2,X_1)$:

$\displaystyle \Sigma = \left[\begin{array}{cc} \mu_4-\sigma^4+4\mu\mu_3+4\mu^2\sigma^2 & \mu_3 +2\mu\sigma^2\\
\mu_3+2\mu\sigma^2 & \sigma^2 \end{array} \right]
$

Define $ f(x_1,x_2) = x_1-x_2^2$. Then $ s^2 = f(Y_n)$. The gradient of $ f$ has components $ (1,-2x_2)$. This leads to

$\displaystyle n^{1/2}(s^2-\sigma^2) \approx
n^{1/2}\,[1, -2\mu]
\left[\begin{array}{c} \overline{X^2} - (\mu^2 + \sigma^2)\\ \bar{X} -\mu \end{array}\right]
$

which converges in distribution to $ (1,-2\mu) Y$ where $ Y \sim MVN(0,\Sigma)$. This rv is $ N(0,a^t \Sigma a)=N(0, \mu_4-\sigma^4)$ where $ a=(1,-2\mu)^t$.
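A quick simulation can be used to sanity check this limit. The sketch below is an illustration only, not part of the derivation; it assumes numpy and uses an exponential(1) population, for which $ \sigma^2=1$ and $ \mu_4=9$, so the limiting variance should be $ \mu_4-\sigma^4=8$.

    # Monte Carlo check of n^{1/2}(s^2 - sigma^2) being approximately N(0, mu_4 - sigma^4).
    import numpy as np

    rng = np.random.default_rng(0)
    n, reps = 2000, 5000
    x = rng.exponential(scale=1.0, size=(reps, n))
    s2 = x.var(axis=1)             # s^2 = mean(X^2) - mean(X)^2, as defined above
    z = np.sqrt(n) * (s2 - 1.0)    # sigma^2 = 1 for the exponential(1) population
    print(z.var())                 # should be near mu_4 - sigma^4 = 9 - 1 = 8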

Remark: In this sort of problem it is best to learn to recognize that the sample variance is unaffected by subtracting $ \mu$ from each $ X$. Thus there is no loss in assuming $ \mu=0$ which simplifies $ \Sigma$ and $ a$.

Special case: if the observations are $ N(\mu,\sigma^2)$ then $ \mu_3 =0$ and $ \mu_4=3\sigma^4$. Our calculation gives

$\displaystyle n^{1/2} (s^2-\sigma^2) \Rightarrow N(0,2\sigma^4)
$

You can divide through by $ \sigma^2$ and get

$\displaystyle n^{1/2}(\frac{s^2}{\sigma^2}-1) \Rightarrow N(0,2)
$

In fact (taking $ s^2$ here to be the usual sample variance with divisor $ n-1$) $ (n-1)s^2/\sigma^2$ has a $ \chi_{n-1}^2$ distribution and so the usual central limit theorem shows that

$\displaystyle (n-1)^{-1/2} [(n-1)s^2/\sigma^2 - (n-1)] \Rightarrow N(0,2)
$

(using mean of $ \chi^2_1$ is 1 and variance is 2). Factoring out $ n-1$ gives the assertion that

$\displaystyle (n-1)^{1/2}(s^2/\sigma^2-1) \Rightarrow N(0,2)
$

which is our $ \delta$ method calculation except for using $ n-1$ instead of $ n$. This difference is unimportant as can be checked using Slutsky's theorem.
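The main step in that Slutsky check is the scaling factor: since

$\displaystyle n^{1/2}\left(\frac{s^2}{\sigma^2}-1\right) = \left(\frac{n}{n-1}\right)^{1/2}(n-1)^{1/2}\left(\frac{s^2}{\sigma^2}-1\right)
$

and $ (n/(n-1))^{1/2} \to 1$, Slutsky's theorem (with $ f(x,y)=xy$) shows that the two scalings have the same limit in distribution.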

Richard Lockhart
2002-10-06