
**Reading for Today's Lecture**: ?

**Goals of Today's Lecture**:

- State Slutsky's Theorem.
- Introduce the delta method.
- Discuss Monte Carlo techniques.

**Central Limit Theorems**

1. If $X_1, X_2, \ldots$ are iid in *R*^{p} with mean $\mu$ and variance-covariance matrix $\Sigma$, then $n^{1/2}(\bar{X} - \mu)$ converges in distribution to $MVN(0, \Sigma)$.

2. If for each *n* we have a set of independent mean 0 random variables $X_{n1}, \ldots, X_{nn}$ with $E(X_{ni}) = 0$, $\sum_i \mathrm{Var}(X_{ni}) = 1$ and $\sum_i E(|X_{ni}|^3) \to 0$, then $\sum_i X_{ni}$ converges in distribution, as $n \to \infty$, to standard normal. This is the Lyapunov central limit theorem.

3. As in the Lyapunov CLT but replace the third moment condition with
$$\sum_i E\big[X_{ni}^2 1(|X_{ni}| > \epsilon)\big] \to 0$$
for each $\epsilon > 0$; then again $\sum_i X_{ni}$ converges in distribution to *N*(0,1). This is the Lindeberg central limit theorem. (Lyapunov's condition implies Lindeberg's.)

4. There are extensions to random variables which are not independent. Examples include the *m*-dependent central limit theorem, the martingale central limit theorem, and the central limit theorem for mixing processes.

5. Many important random variables are not sums of independent random variables. We handle these with Slutsky's theorem and the delta method.

**Slutsky's Theorem**: If *X*_{n} converges in distribution to *X* and *Y*_{n} converges in distribution (or in probability) to *c*, a constant, then *X*_{n}+*Y*_{n} converges in distribution to *X*+*c*. More generally, if *f*(*x*,*y*) is continuous then *f*(*X*_{n}, *Y*_{n}) converges in distribution to *f*(*X*, *c*).

Warning: the hypothesis that the limit of *Y*_{n} be constant is essential.
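To see the theorem in action, here is a small simulation sketch (Python with NumPy; the exponential population and all the constants are my own illustration). The studentized mean $\sqrt{n}(\bar{X}-\mu)/s$ equals $[\sqrt{n}(\bar{X}-\mu)/\sigma]\cdot(\sigma/s)$; since $s \to \sigma$ in probability, Slutsky's theorem says the limit is still *N*(0,1).

```python
import numpy as np

rng = np.random.default_rng(0)
n, reps = 200, 20000

# Exponential(1) population: mu = 1, sigma = 1
x = rng.exponential(1.0, size=(reps, n))
xbar = x.mean(axis=1)
s = x.std(axis=1, ddof=1)

# Studentized mean: sqrt(n)(xbar - mu)/s.
# Slutsky: s -> sigma in probability, so the limit is still N(0,1).
t = np.sqrt(n) * (xbar - 1.0) / s

print(t.mean(), t.var())  # both should be near 0 and 1
```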

**Definition**: We say *Y*_{n} converges to *Y* in probability if
$$P(|Y_n - Y| > \epsilon) \to 0$$
for each $\epsilon > 0$.

The fact is that for *Y* constant, convergence in distribution and convergence in probability are the same. In general convergence in probability implies convergence in distribution. Both of these are weaker than almost sure convergence:

**Definition**: We say *Y*_{n} converges to *Y* almost surely if
$$P\Big(\lim_{n\to\infty} Y_n = Y\Big) = 1.$$
**The delta method**:
Suppose a sequence *Y*_{n} of random variables converges to some constant *y* and that if we define *X*_{n} = *a*_{n}(*Y*_{n}-*y*) then *X*_{n} converges in distribution to some random variable *X*. Suppose that *f* is a differentiable function on the range of *Y*_{n}. Then *a*_{n}(*f*(*Y*_{n})-*f*(*y*)) converges in distribution to $f'(y)X$. If *X*_{n} is in *R*^{p} and *f* maps *R*^{p} to *R*^{q} then $f'(y)$ is the $q \times p$ matrix of first derivatives of the components of *f*.

**Example**: Suppose $X_1, \ldots, X_n$ are a sample from a population with mean $\mu$, variance $\sigma^2$, and third and fourth central moments $\mu_3$ and $\mu_4$. Then
$$n^{1/2}(s^2 - \sigma^2) \Rightarrow N(0, \mu_4 - \sigma^4)$$
where $\Rightarrow$ is notation for convergence in distribution. For simplicity I define $s^2 = \overline{X^2} - \bar{X}^2$ (that is, with divisor *n*).

We take *Y*_{n} to be the vector with components $(\overline{X^2}, \bar{X})$. Then *Y*_{n} converges to $y = (\mu^2 + \sigma^2, \mu)$. Take *a*_{n} = *n*^{1/2}. Then
$$X_n = n^{1/2}(Y_n - y)$$
converges in distribution to $MVN(0, \Sigma)$ with
$$\Sigma = \begin{pmatrix} \mathrm{Var}(X_1^2) & \mathrm{Cov}(X_1^2, X_1) \\ \mathrm{Cov}(X_1^2, X_1) & \sigma^2 \end{pmatrix}.$$
Define
$$f(a, b) = a - b^2,$$
so that $f(Y_n) = s^2$, which converges in distribution to $f'(y)X$, which is $N(0, a^T \Sigma a)$ where $a^T = f'(y) = (1, -2\mu)$. This boils down to $N(0, \mu_4 - \sigma^4)$.

Remark: In this sort of problem it is best to learn to recognize that the sample variance is unaffected by subtracting $\mu$ from each *X*. Thus there is no loss in assuming $\mu = 0$, which simplifies $\Sigma$ and *a*.

Special case: if the observations are $N(\mu, \sigma^2)$ then $\mu_3 = 0$ and $\mu_4 = 3\sigma^4$. Our calculation has
$$n^{1/2}(s^2 - \sigma^2) \Rightarrow N(0, 2\sigma^4).$$
You can divide through by $\sigma^2$ and get
$$n^{1/2}(s^2/\sigma^2 - 1) \Rightarrow N(0, 2).$$
In fact $n s^2/\sigma^2$ has a $\chi^2_{n-1}$ distribution and so the usual central limit theorem shows that
$$(n-1)^{-1/2}\big(n s^2/\sigma^2 - (n-1)\big) \Rightarrow N(0, 2)$$
(using mean of a $\chi^2_1$ is 1 and its variance is 2). Factoring out $n-1$ gives
$$(n-1)^{1/2}\left(\frac{n s^2}{(n-1)\sigma^2} - 1\right) \Rightarrow N(0, 2)$$
which is our delta method calculation except for using $n-1$ in place of $n$.
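The limit can be checked by simulation; here is a sketch in Python with NumPy (the sample size, number of replications, and $\sigma$ are arbitrary choices of mine). For normal data the delta method predicts that $n^{1/2}(s^2 - \sigma^2)$ has approximate variance $2\sigma^4$.

```python
import numpy as np

rng = np.random.default_rng(1)
n, reps = 500, 20000
sigma = 2.0

x = rng.normal(0.0, sigma, size=(reps, n))
# s^2 with divisor n, matching the definition in the text
s2 = x.var(axis=1, ddof=0)

z = np.sqrt(n) * (s2 - sigma**2)
# Delta method prediction for normal data: Var = mu_4 - sigma^4 = 2 sigma^4
print(z.var(), 2 * sigma**4)
```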

**Monte Carlo**

The last method of distribution theory that I will review is Monte Carlo simulation. Suppose you have some random variables $X_1, \ldots, X_n$ whose joint distribution is specified and a statistic $T(X_1, \ldots, X_n)$ whose distribution you want to know. To compute something like *P*(*T* > *t*) for some specific value of *t* we appeal to the limiting relative frequency interpretation of probability: *P*(*T* > *t*) is the limit of the proportion of trials, in a long sequence of trials, in which the event *T* > *t* occurs. We use a (pseudo) random number generator to generate a sample $X_1, \ldots, X_n$ and then calculate the statistic, getting *T*_{1}. Then we generate a new sample (independent of our first, say) and calculate *T*_{2}. We repeat this a large number of times, say *N*, and just count up how many of the *T*_{k} are larger than *t*. If there are *M* such *T*_{k} we estimate *P*(*T* > *t*) by *M*/*N*.

The quantity *M* has a Binomial(*N*, *p* = *P*(*T* > *t*)) distribution. The standard error of *M*/*N* is then $\sqrt{p(1-p)/N}$, which is estimated by $\sqrt{M(N-M)/N^3}$. This permits us to gauge the accuracy of our study.

Notice that the standard deviation of *M*/*N* is proportional to $N^{-1/2}$, so that to improve the accuracy by a factor of 2 requires 4 times as many samples. This makes Monte Carlo a relatively time consuming method of calculation. There are a number of tricks to make the method more accurate (though they only change the constant of proportionality; the SE is still inversely proportional to the square root of the sample size).
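The whole recipe fits in a few lines of Python with NumPy. The statistic here, the maximum of five Exponential(1) variables, is just an illustration I chose because *P*(*T* > *t*) has a closed form to compare against.

```python
import numpy as np

rng = np.random.default_rng(2)
N = 100000          # number of Monte Carlo replications
n, t = 5, 3.0       # sample size and threshold

# T = max of n Exponential(1) variables; P(T > t) = 1 - (1 - e^{-t})^n
x = rng.exponential(1.0, size=(N, n))
T = x.max(axis=1)

M = np.sum(T > t)
phat = M / N
se = np.sqrt(M * (N - M) / N**3)   # estimated standard error of M/N

exact = 1 - (1 - np.exp(-t))**n
print(phat, se, exact)
```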

Most computer languages have a facility for generating pseudo uniform
random numbers, that is, variables *U* which have (approximately of course) a
Uniform[0,1] distribution. Other distributions are generated by
transformation:

**Exponential**: $X = -\log(U)$ has an exponential distribution:
$$P(X > x) = P(-\log U > x) = P(U < e^{-x}) = e^{-x}.$$

Random uniforms generated on the computer sometimes have only 6 or 7 digits or so of detail. This can make the tail of your distribution grainy. If *U* were actually a multiple of 10^{-6}, for instance, then the largest possible value of *X* is $-\log(10^{-6}) = 6\log(10) \approx 13.8$.
This problem can be ameliorated by
the following algorithm:

- Generate *U*, a Uniform[0,1] variable.
- Pick a small $\epsilon$, like $10^{-3}$ say. If $U > \epsilon$ take $Y = -\log(U)$.
- If $U \le \epsilon$, remember that the conditional distribution of *Y*-*y* given *Y* > *y* is exponential. You use this by generating a new $U'$ and computing $Y' = -\log(U')$. Then take $Y = -\log(\epsilon) + Y'$. The resulting *Y* has an exponential distribution. You should check this by computing *P*(*Y* > *y*).
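The inversion step with this tail refinement might look as follows in Python (a sketch; the function name and `eps`, which plays the role of the small $\epsilon$, are my own):

```python
import numpy as np

rng = np.random.default_rng(3)

def rexp_refined(rng, eps=1e-3):
    """Exponential(1) by inversion, recursing into the tail
    (via memorylessness) so that graininess of U does not
    truncate large values."""
    y = 0.0
    u = rng.uniform()
    while u <= eps:            # extreme tail:
        y += -np.log(eps)      # shift by -log(eps) and
        u = rng.uniform()      # draw a fresh uniform
    return y - np.log(u)

sample = np.array([rexp_refined(rng) for _ in range(50000)])
print(sample.mean(), sample.var())   # Exponential(1): mean 1, variance 1
```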

**Normal**: In general if *F* is a continuous cdf and *U* is Uniform[0,1] then *Y* = *F*^{-1}(*U*) has cdf *F* because
$$P(Y \le y) = P(F^{-1}(U) \le y) = P(U \le F(y)) = F(y).$$
This is almost the technique we used above for the exponential distribution. For the normal distribution $F = \Phi$ ($\Phi$ is a common notation for the standard normal cdf) there is no closed form for *F*^{-1}. You could use a numerical algorithm to compute *F*^{-1} or you could use the following Box-Muller trick. Generate *U*_{1}, *U*_{2}, two independent Uniform[0,1] variables. Define
$$Y_1 = \sqrt{-2\log(U_1)}\,\cos(2\pi U_2)$$
and
$$Y_2 = \sqrt{-2\log(U_1)}\,\sin(2\pi U_2).$$
Then you can check using the change of variables formula that *Y*_{1} and *Y*_{2} are independent *N*(0,1) variables.
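The Box-Muller transformation in Python (NumPy; sample size is arbitrary):

```python
import numpy as np

rng = np.random.default_rng(4)
N = 50000

u1 = rng.uniform(size=N)
u2 = rng.uniform(size=N)

r = np.sqrt(-2 * np.log(u1))     # radius
y1 = r * np.cos(2 * np.pi * u2)
y2 = r * np.sin(2 * np.pi * u2)

# y1 and y2 should behave like independent N(0,1) variables
print(y1.mean(), y1.var(), np.corrcoef(y1, y2)[0, 1])
```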

**Acceptance Rejection**

If you can't easily calculate *F*^{-1} but you know *f* you can try the acceptance rejection method. Find a density *g* and a constant *c* such that $f(x) \le c\,g(x)$ for each *x* and *G*^{-1} is computable, or you otherwise know how to generate observations $W_1, W_2, \ldots$ independently from *g*. Generate *W*_{1}. Compute
$$p_1 = \frac{f(W_1)}{c\,g(W_1)}.$$
Generate a Uniform[0,1] random variable *U*_{1} independent of all the *W*s and let *Y* = *W*_{1} if $U_1 \le p_1$. Otherwise get a new *W* and a new *U* and repeat until you find a pair with $U_i \le f(W_i)/(c\,g(W_i))$. You make *Y* be the last *W* you generated. This *Y* has density *f*.
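A concrete sketch in Python (my own choice of example): sampling the half-normal density $f(x) = \sqrt{2/\pi}\,e^{-x^2/2}$, $x \ge 0$, using the Exponential(1) density as *g*, for which $f(x)/g(x)$ is maximized at $x = 1$, giving $c = \sqrt{2e/\pi} \approx 1.32$.

```python
import numpy as np

rng = np.random.default_rng(5)

# Target: half-normal f(x) = sqrt(2/pi) exp(-x^2/2), x >= 0.
# Envelope: g(x) = exp(-x) (Exponential(1)), with f <= c g for
# c = sqrt(2e/pi), the maximum of f/g (attained at x = 1).
c = np.sqrt(2 * np.e / np.pi)

def rhalfnormal(rng):
    while True:
        w = rng.exponential(1.0)                            # W ~ g
        p = np.sqrt(2 / np.pi) * np.exp(w - w**2 / 2) / c   # f(W)/(c g(W))
        if rng.uniform() <= p:                              # accept w.p. p
            return w

sample = np.array([rhalfnormal(rng) for _ in range(20000)])
# Half-normal: mean sqrt(2/pi) ~ 0.798, variance 1 - 2/pi ~ 0.363
print(sample.mean(), sample.var())
```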

**Markov Monte Carlo**

In the last 10 years the following tactic has become popular, particularly for generating multivariate observations. If $W_1, W_2, \ldots$ is an (ergodic) Markov chain with stationary transitions and the stationary initial distribution of *W* has density *f*, then you can get random variables which have the marginal density *f* by starting off the Markov chain and letting it run for a long time. The marginal distributions of the *W*_{i} converge to *f*. So you can estimate things like
$$P(W \in A) = \int_A f(w)\,dw$$
by computing the fraction of the *W*_{i} which land in *A*.

There are now many versions of this technique, including Gibbs sampling and the Metropolis-Hastings algorithm. (The technique was invented in the 1950s by physicists: Metropolis et al. One of the authors of the paper was Edward Teller, "father of the hydrogen bomb".)
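A minimal random-walk Metropolis sketch in Python (the target, step scale, chain length, and burn-in are all illustrative choices of mine, not a general-purpose implementation):

```python
import numpy as np

rng = np.random.default_rng(6)

def metropolis(logf, x0, steps, scale, rng):
    """Random-walk Metropolis: propose x' = x + scale*Z with Z ~ N(0,1),
    accept with probability min(1, f(x')/f(x))."""
    x = x0
    out = np.empty(steps)
    for i in range(steps):
        prop = x + scale * rng.normal()
        if np.log(rng.uniform()) < logf(prop) - logf(x):
            x = prop
        out[i] = x
    return out

# Target: standard normal, f(w) proportional to exp(-w^2/2)
chain = metropolis(lambda w: -w**2 / 2, 0.0, 60000, 2.4, rng)
chain = chain[10000:]                     # discard burn-in
print(chain.mean(), chain.var())          # near 0 and 1
print(np.mean(np.abs(chain) < 1.96))      # near P(|W| < 1.96) ~ 0.95
```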

**Importance Sampling**

If you want to compute
$$\theta = E\big(T(X)\big) = \int T(x) f(x)\,dx,$$
you can generate observations $X_1, \ldots, X_N$ from a different density *g*. Then
$$\hat\theta = \frac{1}{N}\sum_{i=1}^N T(X_i)\frac{f(X_i)}{g(X_i)}$$
has the right mean because
$$E_g\left[T(X)\frac{f(X)}{g(X)}\right] = \int T(x)\frac{f(x)}{g(x)}\,g(x)\,dx = \theta.$$
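A sketch of the idea in Python (my own example): estimating the rare-event probability $P(Z > 3)$ for *Z* standard normal by sampling from a shifted normal *g* so the event is no longer rare, and reweighting by *f*/*g*.

```python
import numpy as np

rng = np.random.default_rng(7)
N = 100000

def phi(x, mu=0.0):
    """Normal density with mean mu and unit variance."""
    return np.exp(-(x - mu)**2 / 2) / np.sqrt(2 * np.pi)

# Want theta = P(Z > 3) = E_f[1(X > 3)] with f = N(0,1).
# Sample from g = N(3,1) instead and reweight by f(X)/g(X).
x = rng.normal(3.0, 1.0, size=N)
theta_hat = np.mean((x > 3.0) * phi(x) / phi(x, mu=3.0))

print(theta_hat)   # P(Z > 3) is about 0.00135
```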

**Variance reduction**

Consider the problem of estimating the distribution of the sample mean for a Cauchy random variable. The Cauchy density is
$$f(x) = \frac{1}{\pi(1 + x^2)}.$$
We generate uniforms $U_1, \ldots, U_n$ and then define $X_i = \tan\big(\pi(U_i - 1/2)\big)$. Then we compute $T = \bar{X}$. Now to estimate
$$p = P(T > t)$$
we use
$$\hat{p} = \frac{1}{N}\sum_{i=1}^N 1(T_i > t)$$
after generating *N* independent samples, each of size *n*.

We can improve this estimate by remembering that $-X_i$ also has a Cauchy distribution. Take $S_i = -T_i$. Remember that $S_i$ has the same distribution as $T_i$. Then we try (for *t* > 0)
$$\tilde{p} = \frac{1}{2N}\sum_{i=1}^N \big[1(T_i > t) + 1(S_i > t)\big],$$
which is the average of two estimates like $\hat{p}$. The variance of $\tilde{p}$ is
$$(4N)^{-1}\,\mathrm{Var}\big(1(T_i > t) + 1(S_i > t)\big) = (4N)^{-1}\,\mathrm{Var}\big(1(|T_i| > t)\big)$$
(for *t* > 0 the two indicators cannot both equal 1), which is
$$(4N)^{-1}\,2p(1 - 2p) = \frac{p(1 - 2p)}{2N}$$
since $P(|T_i| > t) = 2p$ by symmetry. Notice that, compared with the variance $p(1-p)/N$ of $\hat{p}$, the variance has an extra 2 in the denominator and the numerator is also smaller, particularly for small *t*, where *p* is close to 1/2.
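The antithetic Cauchy calculation in Python (NumPy; sample sizes and *t* are arbitrary). A convenient check: the mean of *n* standard Cauchy variables is again standard Cauchy, so $P(T > t) = 1/2 - \arctan(t)/\pi$ exactly.

```python
import numpy as np

rng = np.random.default_rng(8)
N, n, t = 20000, 10, 2.0

u = rng.uniform(size=(N, n))
x = np.tan(np.pi * (u - 0.5))      # standard Cauchy by inversion
T = x.mean(axis=1)                 # sample mean of each Cauchy sample
S = -T                             # antithetic copy: -T has the same law

p_naive = np.mean(T > t)
p_anti = 0.5 * (np.mean(T > t) + np.mean(S > t))

# Exact: the mean of n standard Cauchys is again standard Cauchy
exact = 0.5 - np.arctan(t) / np.pi
print(p_naive, p_anti, exact)
```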

**Regression estimates**

Suppose we want to compute
$$\theta = E(T)$$
where *T* is some statistic, and suppose we can also compute, from the same simulated data, another statistic *W* whose mean *E*(*W*) is known. Notice that $E\big(T - c(W - E(W))\big) = \theta$ for any *c* and
$$\mathrm{Var}\big(T - c(W - E(W))\big) = \mathrm{Var}(T) - 2c\,\mathrm{Cov}(T, W) + c^2\,\mathrm{Var}(W).$$
The value of *c* minimizing this variance is
$$c = \frac{\mathrm{Cov}(T, W)}{\mathrm{Var}(W)}$$
and this value can be estimated by regressing the $T_i$ on the $W_i$.
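A sketch of this regression (control variate) idea in Python; the particular choice of *T* and *W* is my own illustration: estimating $\theta = E(e^U)$ for *U* uniform, using $W = U$ with known mean 1/2.

```python
import numpy as np

rng = np.random.default_rng(9)
N = 100000

u = rng.uniform(size=N)
T = np.exp(u)       # want theta = E(e^U) = e - 1
W = u               # correlated statistic with known mean E(W) = 1/2

# Optimal coefficient c = Cov(T, W)/Var(W),
# i.e. the slope from regressing T on W
c = np.cov(T, W)[0, 1] / np.var(W)
theta_reg = np.mean(T - c * (W - 0.5))

print(np.mean(T), theta_reg, np.e - 1)
```

Because *T* and *W* are highly correlated here, the regression-adjusted estimate has a much smaller standard error than the plain average of the $T_i$.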

2000-02-02