
STAT 801: Mathematical Statistics

Moment Generating Functions

Def'n: The moment generating function of a real valued random variable $ X$ is

$\displaystyle M_X(t) = E(e^{tX})
$

defined for those real $ t$ for which the expected value is finite.

Def'n: The moment generating function of $ X\in R^p$ is

$\displaystyle M_X(u) = E[e^{u^tX}]
$

defined for those vectors $ u$ for which the expected value is finite.

Formal connection to moments:

$\displaystyle M_X(t)$ $\displaystyle = \sum_{k=0}^\infty E[(tX)^k]/k!$    
$\displaystyle = \sum_{k=0}^\infty \mu_k^\prime t^k/k!$    

Sometimes can find power series expansion of $ M_X$ and read off the moments of $ X$ from the coefficients of $ t^k/k!$.
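
For instance, if $ X$ has an Exponential(1) distribution then, for $ t<1$,

$\displaystyle M_X(t) = \int_0^\infty e^{tx} e^{-x} dx = \frac{1}{1-t} = \sum_{k=0}^\infty t^k
= \sum_{k=0}^\infty k!\, t^k/k!
$

so $ \mu_k^\prime = k!$ for every $ k$.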

Theorem: If $ M$ is finite for all $ t \in [-\epsilon,\epsilon]$ for some $ \epsilon > 0$ then

  1. Every moment of $ X$ is finite.

  2. $ M$ is $ C^\infty$ (in fact $ M$ is analytic).

  3. $ \mu_k^\prime = \left.\frac{d^k}{dt^k} M_X(t)\right\vert_{t=0}$.

Note: $ C^\infty$ means has continuous derivatives of all orders. Analytic means has convergent power series expansion in neighbourhood of each $ t\in(-\epsilon,\epsilon)$.

The proof, and many other facts about mgfs, rely on techniques of complex variables.

MGFs and Sums

If $ X_1,\ldots,X_p$ are independent and $ Y=\sum X_i$ then the moment generating function of $ Y$ is the product of those of the individual $ X_i$:

$\displaystyle E(e^{tY}) = \prod_i E(e^{tX_i})
$

or $ M_Y = \prod M_{X_i}$.

Note: also true for multivariate $ X_i$.
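
As a quick numerical illustration of the product property, here is a minimal sketch (assuming numpy is available; the two distributions are chosen only for convenience):

```python
# Sketch: check M_Y(t) = prod_i M_{X_i}(t) for Y = X_1 + X_2 with
# X_1 ~ Exponential(1) and X_2 ~ N(0,1), via Monte Carlo estimates of the mgfs.
import numpy as np

rng = np.random.default_rng(0)
t = 0.3                                    # any t < 1 keeps both mgfs finite
x1 = rng.exponential(1.0, size=500_000)
x2 = rng.standard_normal(500_000)

lhs = np.exp(t * (x1 + x2)).mean()         # Monte Carlo estimate of M_Y(t)
rhs = np.exp(t * x1).mean() * np.exp(t * x2).mean()
exact = np.exp(t**2 / 2) / (1 - t)         # e^{t^2/2} * 1/(1-t)
print(lhs, rhs, exact)                     # all three agree to Monte Carlo error
```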

Problem: the power series expansion of $ M_Y$ is not a simple function of the expansions of the individual $ M_{X_i}$.

Related fact: first 3 moments (meaning $ \mu$, $ \sigma^2$ and $ \mu_3$) of $ Y$ are sums of those of the $ X_i$:

$\displaystyle E(Y) =$ $\displaystyle \sum E(X_i)$    
$\displaystyle {\rm Var}(Y) =$ $\displaystyle \sum {\rm Var}(X_i)$    
$\displaystyle E[(Y-E(Y))^3] =$ $\displaystyle \sum E[(X_i-E(X_i))^3]$    

but

$\displaystyle E[(Y-E(Y))^4] =
\sum \{E[(X_i-E(X_i))^4] -3E^2[(X_i-E(X_i))^2]\}
+ 3 \left\{\sum E[(X_i-E(X_i))^2]\right\}^2\, .
$

It is possible, however, to replace the moments by other objects called cumulants which do add up properly. The way to define them relies on the observation that the log of the mgf of $ Y$ is the sum of the logs of the mgfs of the $ X_i$. We define the cumulant generating function of a variable $ X$ by

$\displaystyle K_X(t) = \log(M_X(t))
$

Then

$\displaystyle K_Y(t) = \sum K_{X_i}(t)
$

The mgfs are all positive, so the cumulant generating functions are defined wherever the mgfs are. This means we can give a power series expansion of $ K_Y$:

$\displaystyle K_Y(t) = \sum_{r=1}^\infty \kappa_r t^r/r!
$

We call the $ \kappa_r$ the cumulants of $ Y$ and observe

$\displaystyle \kappa_r(Y) = \sum \kappa_r(X_i)
$

To see the relation between cumulants and moments proceed as follows: the cumulant generating function is

$\displaystyle K(t)$ $\displaystyle = \log(M(t))$    
  $\displaystyle = \log( 1 + [\mu t +\mu_2^\prime t^2/2 + \mu_3^\prime t^3/3! + \cdots])$

To compute the power series expansion we think of the quantity in $ [\ldots]$ as $ x$ and expand

$\displaystyle \log(1+x) = x-x^2/2+x^3/3-x^4/4 \cdots
$

When you stick in the power series

$\displaystyle x=\mu t +\mu_2^\prime t^2/2 + \mu_3^\prime t^3/3! + \cdots
$

you have to expand out the powers of $ x$ and collect together like terms. For instance,

$\displaystyle x^2$ $\displaystyle = \mu^2 t^2 + \mu\mu_2^\prime t^3$    
  $\displaystyle \qquad + [2\mu_3^\prime \mu/3! +(\mu_2^\prime)^2/4]t^4 + \cdots$    
$\displaystyle x^3$ $\displaystyle = \mu^3 t^3 + 3\mu_2^\prime \mu^2 t^4/2 + \cdots$    
$\displaystyle x^4$ $\displaystyle = \mu^4 t^4 + \cdots$    

Now gather up the terms. The power $ t^1$ occurs only in $ x$ with coefficient $ \mu$. The power $ t^2$ occurs in $ x$ and in $ x^2$ and so on. Putting these together gives

$\displaystyle K(t) =
\mu t + [\mu_2^\prime -\mu^2]t^2/2
+ [\mu_3^\prime - 3\mu\mu_2^\prime + 2\mu^3]t^3/3!
+ [\mu_4^\prime -4\mu_3^\prime \mu -3(\mu_2^\prime)^2 + 12
\mu_2^\prime \mu^2 -6\mu^4]t^4/4! + \cdots
$

Comparing coefficients of $ t^r/r!$ we see that

$\displaystyle \kappa_1$ $\displaystyle = \mu$    
$\displaystyle \kappa_2$ $\displaystyle = \mu_2^\prime -\mu^2=\sigma^2$    
$\displaystyle \kappa_3$ $\displaystyle = \mu_3^\prime -3\mu\mu_2^\prime +2\mu^3=E[(X-\mu)^3]$    
$\displaystyle \kappa_4$ $\displaystyle = \mu_4^\prime -4\mu_3^\prime \mu -3(\mu_2^\prime)^2 + 12 \mu_2^\prime \mu^2 -6\mu^4$    
  $\displaystyle = E[(X-\mu)^4]-3\sigma^4$    
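
These identities can be checked mechanically; the following sketch (assuming the sympy package is available) expands $ \log M(t)$ symbolically and reads off the coefficients of $ t^r/r!$:

```python
# Sketch: expand log(M(t)) with M(t) = 1 + m1 t + m2 t^2/2 + m3 t^3/3! + m4 t^4/4!
# (m_k standing for mu_k') and read off kappa_r as r! times the coefficient of t^r.
import sympy as sp

t, m1, m2, m3, m4 = sp.symbols('t m1 m2 m3 m4')

M = 1 + m1*t + m2*t**2/2 + m3*t**3/6 + m4*t**4/24
K = sp.log(M).series(t, 0, 5).removeO()

for r in range(1, 5):
    print(f"kappa_{r} =", sp.expand(K.coeff(t, r) * sp.factorial(r)))
# kappa_1 = m1
# kappa_2 = m2 - m1**2
# kappa_3 = m3 - 3*m1*m2 + 2*m1**3
# kappa_4 = m4 - 4*m1*m3 - 3*m2**2 + 12*m1**2*m2 - 6*m1**4
```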

Check the book by Kendall and Stuart (or the newer version, Kendall's Advanced Theory of Statistics by Stuart and Ord) for formulas for larger orders $ r$.

Example: If $ X_1,\ldots,X_p$ are independent and $ X_i$ has a $ N(\mu_i,\sigma^2_i)$ distribution then

$\displaystyle M_{X_i}(t) =$ $\displaystyle \int_{-\infty}^\infty e^{tx} e^{-\frac{1}{2}(x-\mu_i)^2/\sigma_i^2} dx/(\sqrt{2\pi}\sigma_i)$    
$\displaystyle =$ $\displaystyle \int_{-\infty}^\infty e^{t(\sigma_i z + \mu_i)} e^{-z^2/2} dz/\sqrt{2\pi}$    
$\displaystyle =$ $\displaystyle e^{t\mu_i} \int_{-\infty}^\infty e^{-(z-t\sigma_i)^2/2+t^2\sigma_i^2/2} dz/\sqrt{2\pi}$    
$\displaystyle =$ $\displaystyle e^{\sigma_i^2t^2/2+t\mu_i}$    

This makes the cumulant generating function

$\displaystyle K_{X_i}(t) = \log(M_{X_i}(t)) = \sigma_i^2t^2/2+\mu_i t
$

and the cumulants are $ \kappa_1=\mu_i$, $ \kappa_2=\sigma_i^2$ and every other cumulant is 0. The cumulant generating function for $ Y=\sum X_i$ is

$\displaystyle K_Y(t) = \sum \sigma_i^2 t^2/2 + t \sum \mu_i
$

which is the cumulant generating function of $ N(\sum \mu_i,\sum\sigma_i^2)$.

Example: I am having you derive the moment and cumulant generating function and all the moments of a Gamma rv. Suppose that $ Z_1,\ldots,Z_\nu$ are independent $ N(0,1)$ rvs. Then we have defined $ S_\nu = \sum_1^\nu Z_i^2$ to have a $ \chi^2_\nu$ distribution. It is easy to check that $ S_1=Z_1^2$ has density

$\displaystyle (u/2)^{-1/2} e^{-u/2}/(2\sqrt{\pi})
$

and then the mgf of $ S_1$ is

$\displaystyle (1-2t)^{-1/2}
$

It follows that

$\displaystyle M_{S_\nu}(t) = (1-2t)^{-\nu/2}
$

which you will show in homework is the moment generating function of a Gamma$ (\nu/2,2)$ rv. This shows that the $ \chi^2_\nu$ distribution has the Gamma$ (\nu/2,2)$ density which is

$\displaystyle (u/2)^{(\nu-2)/2}e^{-u/2} / (2\Gamma(\nu/2)) \, .
$
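
Here is a quick numerical check of this identification (a sketch assuming numpy and scipy are available):

```python
# Sketch: integrate e^{tu} against the Gamma(nu/2, scale=2) density and compare
# with the claimed chi-squared mgf (1 - 2t)^{-nu/2}, valid for t < 1/2.
import numpy as np
from scipy import integrate, stats

nu, t = 7, 0.2
integrand = lambda u: np.exp(t*u) * stats.gamma.pdf(u, nu/2, scale=2)
val, _ = integrate.quad(integrand, 0, np.inf)
print(val, (1 - 2*t)**(-nu/2))   # the two numbers agree
```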

Example: The Cauchy density is

$\displaystyle \frac{1}{\pi(1+x^2)}\, ;
$

the corresponding moment generating function is

$\displaystyle M(t) = \int_{-\infty}^\infty \frac{e^{tx}}{\pi(1+x^2)} dx
$

which is $ +\infty$ except for $ t=0$ where we get 1. This mgf is exactly the mgf of every $ t$ distribution so it is not much use for distinguishing such distributions. The problem is that these distributions do not have infinitely many finite moments.

This observation has led to the development of a substitute for the mgf which is defined for every distribution, namely, the characteristic function.

Characteristic Functions

Definition: The characteristic function of a real rv $ X$ is

$\displaystyle \phi_X(t) = E(e^{itX})$

where $ i=\sqrt{-1}$ is the imaginary unit.

Aside on complex arithmetic.

Complex numbers: add $ i=\sqrt{-1}$ to the real numbers.

Require all the usual rules of algebra to work.

So: if $ i$ and any real numbers $ a$ and $ b$ are to be complex numbers then so must be $ a+bi$.

Multiplication: If we multiply a complex number $ a+bi$ with $ a$ and $ b$ real by another such number, say $ c+di$ then the usual rules of arithmetic (associative, commutative and distributive laws) require

$\displaystyle (a+bi)(c+di)=$ $\displaystyle ac + adi+bci+bdi^2$    
$\displaystyle =$ $\displaystyle ac +bd(-1) +(ad+bc)i$    
$\displaystyle =$ $\displaystyle (ac-bd) +(ad+bc)i$    

so this is precisely how we define multiplication.

Addition: follow usual rules to get

$\displaystyle (a+bi)+(c+di) = (a+c)+(b+d)i \, .
$

Additive inverses: $ -(a+bi) = -a +(-b)i
$.

Multiplicative inverses:

$\displaystyle \frac{1}{a+bi}$ $\displaystyle = \frac{1}{a+bi}\frac{a-bi}{a-bi}$    
  $\displaystyle = \frac{ a-bi}{a^2-abi+abi-b^2i^2}$    
  $\displaystyle = \frac{a-bi}{a^2+b^2}$    

Division:

$\displaystyle \frac{a+bi}{c+di}$ $\displaystyle = \frac{a+bi}{c+di}\frac{c-di}{c-di}$    
  $\displaystyle = \frac{(ac+bd)+(bc-ad)i}{c^2+d^2}$

Notice: usual rules of arithmetic don't require any more numbers than

$\displaystyle x+yi$

where $ x$ and $ y$ are real.

Now look at transcendental functions. For real $ x$ we know $ e^x = \sum x^k/k!$ so our insistence on the usual rules working means

$\displaystyle e^{x+iy} = e^x e^{iy}
$

and we need to know how to compute $ e^{iy}$. Remember in what follows that $ i^2=-1$ so $ i^3=-i$, $ i^4=1$, $ i^5=i$ and so on. Then

$\displaystyle e^{iy} =$ $\displaystyle \sum_0^\infty \frac{(iy)^k}{k!}$    
$\displaystyle =$ $\displaystyle 1 + iy + (iy)^2/2 +(iy)^3/6 + \cdots$    
$\displaystyle =$ $\displaystyle 1 - y^2/2 + y^4/4! - y^6/6!+ \cdots$    
  $\displaystyle + iy -iy^3/3! +iy^5/5! + \cdots$    
$\displaystyle =$ $\displaystyle \cos(y) +i\sin(y)$    

We can thus write

$\displaystyle e^{x+iy} = e^x(\cos(y)+i\sin(y))
$
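
A one-line numerical confirmation, using Python's built-in complex arithmetic:

```python
# Check e^{x+iy} = e^x (cos y + i sin y) at an arbitrary point.
import cmath, math

x, y = 0.7, 2.1
print(cmath.exp(complex(x, y)))
print(math.exp(x) * complex(math.cos(y), math.sin(y)))   # same number
```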

Identify $ x+yi$ with the corresponding point $ (x,y)$ in the plane. Picture the complex numbers as forming a plane.

Now every point in the plane can be written in polar co-ordinates as $ (r\cos\theta, r\sin\theta)$ and comparing this with our formula for the exponential we see we can write

$\displaystyle x+iy = \sqrt{x^2+y^2}\, e^{i\theta}
$

for an angle $ \theta\in[0,2\pi)$.

Multiplication revisited: $ x+iy=re^{i\theta}$, $ x^\prime +iy^\prime
= r^\prime e^{i\theta^\prime}$.

$\displaystyle (x+iy)(x^\prime +iy^\prime) = re^{i\theta}r^\prime e^{i\theta^\prime}
=rr^\prime e^{i(\theta+\theta^\prime)} \, .
$

We will need from time to time a couple of other definitions:

Definition: The modulus of $ x+iy$ is

$\displaystyle \vert x+iy\vert = \sqrt{x^2+y^2}
$

Definition: The complex conjugate of $ x+iy$ is $ \overline{x+iy} = x-iy$.

Some identities: $ z=x+iy=re^{i\theta}$ and $ z^\prime = x^\prime+iy^\prime
=r^\prime e^{i\theta^\prime}$.

$\displaystyle z\overline{z} = x^2+y^2=r^2=\vert z\vert^2
$

$\displaystyle \frac{z^\prime}{z} = \frac{z^\prime\overline{z}}{\vert z\vert^2} = \frac{r^\prime}{r}
e^{i(\theta^\prime-\theta)}
$

Notes on calculus with complex variables. Essentially the usual rules apply so, for example,

$\displaystyle \frac{d}{dt} e^{it} = ie^{it}
$

We will (mostly) be doing only integrals over the real line; the theory of integrals along paths in the complex plane is a very important part of mathematics, however.

FACT: (not used explicitly in this course). If $ f:{\Bbb C}\mapsto{\Bbb C}$ is differentiable then $ f$ is analytic (has a power series expansion).

End of Aside

Characteristic Functions

Definition: The characteristic function of a real rv $ X$ is

$\displaystyle \phi_X(t) = E(e^{itX})$

where $ i=\sqrt{-1}$ is the imaginary unit.

Since

$\displaystyle e^{itX} = \cos(tX) + i \sin(tX)
$

we find that

$\displaystyle \phi_X(t) = E(\cos(tX)) + i E(\sin(tX))
$

Since the trigonometric functions are bounded by 1 the expected values must be finite for all $ t$ and this is precisely the reason for using characteristic rather than moment generating functions in probability theory courses.

Theorem 1   For any two real rvs $ X$ and $ Y$ the following are equivalent:
  1. $ X$ and $ Y$ have the same distribution, that is, for any (Borel) set $ A$ we have

    $\displaystyle P(X\in A) = P( Y \in A)
$

  2. $ F_X(t) = F_Y(t) $ for all $ t$.

  3. $ \phi_X(t)=E(e^{itX}) = E(e^{itY}) = \phi_Y(t)$ for all real $ t$.

Moreover, all of these are implied if there is a positive $ \epsilon$ such that for all $ \vert t\vert \le \epsilon$

$\displaystyle M_X(t)=M_Y(t) < \infty\,.
$


Inversion

Previous theorem is non-constructive characterization. Can get from $ \phi_X$ to $ F_X$ or $ f_X$ by inversion. See homework for basic inversion formula:

If $ X$ is a random variable taking only integer values then for each integer $ k$

$\displaystyle P(X=k)$ $\displaystyle = \frac{1}{2\pi} \int_0^{2\pi} \phi_X(t) e^{-itk} dt$    
  $\displaystyle = \frac{1}{2\pi} \int_{-\pi}^{\pi} \phi_X(t) e^{-itk} dt \, .$    

The proof proceeds from the formula

$\displaystyle \phi_X(t) = \sum_k e^{ikt} P(X=k) \, .
$
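
To see the formula in action, here is a sketch (assuming numpy and scipy; the Poisson(3) example is chosen only for illustration) which recovers a pmf from its characteristic function by numerical integration:

```python
# Sketch: recover the Poisson(3) pmf from its characteristic function
# phi(t) = exp(lam*(e^{it} - 1)) via P(X=k) = (1/2pi) int_{-pi}^{pi} phi(t) e^{-itk} dt.
import numpy as np
from scipy import integrate, stats

lam = 3.0
phi = lambda t: np.exp(lam * (np.exp(1j*t) - 1))

def pmf_from_cf(k):
    real_part = lambda t: (phi(t) * np.exp(-1j*t*k)).real
    val, _ = integrate.quad(real_part, -np.pi, np.pi)
    return val / (2*np.pi)

for k in range(5):
    print(k, pmf_from_cf(k), stats.poisson.pmf(k, lam))   # columns agree
```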

Now suppose that $ X$ has a continuous bounded density $ f$. Define

$\displaystyle X_n = [nX]/n
$

where $ [a]$ denotes the integer part of $ a$ (the result of rounding down to the nearest integer). We have

$\displaystyle P(k/n \le X < (k+1)/n) = P([nX]=k)
= \frac{1}{2\pi} \int_{-\pi}^{\pi} \phi_{[nX]}(t) e^{-itk} dt \, .
$

Make the substitution $ t=u/n$, and get

$\displaystyle n P(k/n \le X < (k+1)/n) =\frac{1}{2\pi}
\int_{-n\pi}^{n\pi} \phi_{[nX]}(u/n)e^{-iuk/n} du \, .
$

Now, as $ n\to\infty$ we have

$\displaystyle \phi_{[nX]}(u/n) = E(e^{iu[nX]/n}) \to E(e^{iuX})
$

(by the dominated convergence theorem - the dominating random variable is just the constant 1). The range of integration converges to the whole real line and if $ k/n \to x$ we see that the left hand side converges to the density $ f(x)$ while the right hand side converges to

$\displaystyle \frac{1}{2\pi} \int_{-\infty}^\infty \phi_X(u) e^{-iux} du
$

which gives the inversion formula

$\displaystyle f_X(x) = \frac{1}{2\pi} \int_{-\infty}^\infty \phi_X(u) e^{-iux} du \, .
$
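
As a check on this formula, here is a sketch (assuming numpy and scipy) which recovers the standard normal density from its characteristic function $ e^{-u^2/2}$:

```python
# Sketch: recover the N(0,1) density from phi(u) = exp(-u^2/2) via
# f(x) = (1/2pi) int phi(u) e^{-iux} du, and compare with the exact density.
import numpy as np
from scipy import integrate, stats

def density_from_cf(x):
    real_part = lambda u: (np.exp(-u**2/2) * np.exp(-1j*u*x)).real
    val, _ = integrate.quad(real_part, -np.inf, np.inf)
    return val / (2*np.pi)

for x in (0.0, 1.0, 2.0):
    print(x, density_from_cf(x), stats.norm.pdf(x))   # columns agree
```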

Many other such formulas are available to compute things like $ F(b) - F(a)$ and so on.

All such formulas are sometimes referred to as Fourier inversion formulas; the characteristic function itself is sometimes called the Fourier transform of the distribution or cdf or density of $ X$.

Inversion of the Moment Generating Function

MGF and characteristic function related formally:

$\displaystyle M_X(it) = \phi_X(t) \, .
$

When $ M_X$ exists this relationship is not merely formal; the methods of complex variables mean there is a ``nice'' (analytic) function which is $ E(e^{zX})$ for any complex $ z=x+iy$ for which $ M_X(x)$ is finite.

SO: there is an inversion formula for $ M_X$ using a complex contour integral:

If $ z_1$ and $ z_2$ are two points in the complex plane and $ C$ a path between these two points we can define the path integral

$\displaystyle \int_C f(z) dz
$

by the methods of line integration.

Do algebra with such integrals via usual theorems of calculus. The Fourier inversion formula was

$\displaystyle 2\pi f(x) = \int_{-\infty}^\infty \phi(t) e^{-itx} dt
$

so replacing $ \phi$ by $ M$ we get

$\displaystyle 2 \pi f(x) = \int_{-\infty}^\infty M(it) e^{-itx} dt \, .
$

If we just substitute $ z=it$ then we find

$\displaystyle 2\pi i f(x) = \int_C M(z) e^{-zx} dz
$

where the path $ C$ is the imaginary axis. Methods of complex integration permit us to replace $ C$ by any other path which starts and ends at the same place. Sometimes can choose path to make it easy to do the integral approximately; this is what saddlepoint approximations are. Inversion formula is called the inverse Laplace transform; the mgf is also called the Laplace transform of the distribution or cdf or density.

Applications of Inversion

1): Numerical calculations

Example: Many statistics have a distribution which is approximately that of

$\displaystyle T= \sum \lambda_j Z_j^2
$

where the $ Z_j$ are iid $ N(0,1)$. In this case

$\displaystyle E(e^{itT})$ $\displaystyle = \prod E(e^{it\lambda_j Z_j^2})$    
  $\displaystyle = \prod (1-2it\lambda_j)^{-1/2} \, .$    

Imhof ( Biometrika, 1961) gives a simplification of the Fourier inversion formula for

$\displaystyle F_T(x) - F_T(0)
$

which can be evaluated numerically:

$\displaystyle F_T(x) - F_T(0) = \int_0^x f_T(y) dy
= \int_0^x \frac{1}{2\pi} \int_{-\infty}^\infty
\prod (1-2it\lambda_j)^{-1/2} e^{-ity} dt\, dy \, .
$

Multiply

$\displaystyle \phi(t) = \left[\frac{1}{\prod(1-2it\lambda_j)}\right]^{1/2}
$

top and bottom by the complex conjugate of the denominator:

$\displaystyle \phi(t) =
\left[\frac{\prod(1+2it\lambda_j)}{\prod(1+4t^2\lambda_j^2)}\right]^{1/2}
\, .
$

The complex number $ 1+2it\lambda_j$ is $ r_j e^{i\theta_j}$ where $ r_j = \sqrt{1+4t^2\lambda_j^2}$ and $ \tan(\theta_j) = 2t\lambda_j$. This allows us to rewrite

$\displaystyle \phi(t) = \left[\frac{\prod r_j e^{i\sum\theta_j}}{\prod
r_j^2}\right]^{1/2}
$

or

$\displaystyle \phi(t) =
\frac{
e^{i\sum\tan^{-1}(2t\lambda_j)/2}
}{
\prod(1+4t^2\lambda_j^2)^{1/4}
} \, .
$

Assemble this to give

$\displaystyle F_T(x) - F_T(0) =
\frac{1}{2\pi} \int_{-\infty}^\infty
\frac{
e^{i\theta(t)}
}{
\rho(t)
}
\int_0^x e^{-iyt}dy dt
$

where

$\displaystyle \theta(t) = \sum \tan^{-1}(2t\lambda_j) /2$

and $ \rho(t) = \prod(1+4t^2\lambda_j^2)^{1/4}$. But

$\displaystyle \int_0^x e^{-iyt}dy = \frac{e^{-ixt}-1}{-it} \, .
$

We can now collect up the real part of the resulting integral to derive the formula given by Imhof; I do not give the details here.
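
The numerical evaluation is straightforward; here is a rough sketch (assuming numpy and scipy; it integrates the expression above directly rather than Imhof's published formula, and the $ \lambda_j$ values and the truncation of the $ t$ range are arbitrary choices):

```python
# Sketch: compute F_T(x) - F_T(0) for T = sum_j lambda_j Z_j^2 by integrating
# the real part of e^{i theta(t)} / rho(t) * int_0^x e^{-iyt} dy over t,
# then compare with a Monte Carlo estimate.
import numpy as np
from scipy import integrate

lam = np.array([1.0, 0.5, 0.2])
theta = lambda t: np.arctan(2*t*lam).sum() / 2
rho = lambda t: np.prod((1 + 4*t**2*lam**2)**0.25)

def integrand(t, x):
    if abs(t) < 1e-10:
        return x                                  # limit of the inner integral as t -> 0
    inner = (1 - np.exp(-1j*x*t)) / (1j*t)        # int_0^x e^{-iyt} dy
    return (np.exp(1j*theta(t)) / rho(t) * inner).real

x = 2.0
val, _ = integrate.quad(integrand, -200, 200, args=(x,), limit=800)
print(val / (2*np.pi))                            # approximates P(0 < T <= x)

rng = np.random.default_rng(1)
Z = rng.standard_normal((200_000, lam.size))
print(((Z**2 * lam).sum(axis=1) <= x).mean())     # Monte Carlo check
```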

2): The central limit theorem (in some versions) can be deduced from the Fourier inversion formula: if $ X_1,\ldots,X_n$ are iid with mean 0 and variance 1 and $ T=n^{1/2}\bar{X}$ then with $ \phi$ denoting the characteristic function of a single $ X$ we have

$\displaystyle E(e^{itT})$ $\displaystyle = E(e^{in^{-1/2} t\sum X_j})$    
  $\displaystyle = \left[\phi(n^{-1/2}t)\right]^n$    
  $\displaystyle \approx \left[\phi(0) +\frac{t\phi^\prime(0)}{\sqrt{n}} + \frac{t^2\phi^{\prime\prime}(0)}{2n}+ o(n^{-1})\right]^n \, .$    

But now $ \phi(0) = 1$ and

$\displaystyle \phi^\prime(t) = \frac{d}{dt} E(e^{itX_1}) = iE(X_1e^{itX_1}) \, .
$

So $ \phi^\prime(0) = E(X_1) =0$. Similarly

$\displaystyle \phi^{\prime\prime}(t) = i^2 E(X_1^2e^{itX_1})
$

so that

$\displaystyle \phi^{\prime\prime}(0) = -E(X_1^2) =-1 \, .
$

It now follows that

$\displaystyle E(e^{itT})$ $\displaystyle \approx [1-t^2/(2n) + o(1/n)]^n$    
  $\displaystyle \to e^{-t^2/2} \, .$    

With care we can then apply the Fourier inversion formula and get

$\displaystyle f_T(x)$ $\displaystyle = \frac{1}{2\pi } \int_{-\infty}^\infty e^{-itx} [\phi(tn^{-1/2})]^n dt$    
  $\displaystyle \to \frac{1}{2\pi } \int_{-\infty}^\infty e^{-itx} e^{-t^2/2} dt$    
  $\displaystyle =\frac{1}{\sqrt{2\pi}} \phi_Z(-x)$    

where $ \phi_Z $ is the characteristic function of a standard normal variable $ Z$. Doing the integral we find

$\displaystyle \phi_Z(x) = \phi_Z(-x) = e^{-x^2/2}
$

so that

$\displaystyle f_T(x) \to \frac{1}{\sqrt{2\pi}} e^{-x^2/2}
$

which is the standard normal density.

This proof of the central limit theorem is not terribly general since it requires $ T$ to have a bounded continuous density. The central limit theorem itself is a statement about cdfs not densities and is

$\displaystyle P(T \le t) \to P(Z \le t) \, .
$
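
A small numerical illustration of the key step $ [\phi(n^{-1/2}t)]^n \to e^{-t^2/2}$ (assuming numpy; the uniform distribution is used only because its characteristic function has a simple closed form):

```python
# Sketch: for X uniform on (-sqrt(3), sqrt(3)) (mean 0, variance 1),
# phi(t) = sin(sqrt(3) t)/(sqrt(3) t); check that phi(t/sqrt(n))^n -> exp(-t^2/2).
import numpy as np

def phi(t):
    a = np.sqrt(3.0)
    return 1.0 if t == 0 else np.sin(a*t) / (a*t)

t = 1.5
for n in (1, 5, 25, 125):
    print(n, phi(t/np.sqrt(n))**n)
print("limit", np.exp(-t**2/2))
```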

3) Saddlepoint approximation from MGF inversion formula

$\displaystyle 2\pi i f(x) = \int_{-i\infty}^{i\infty} M(z) e^{-zx} dz
$

(limits of integration indicate a contour integral running up the imaginary axis.) Replace the contour (using complex variables) with the line $ Re(z)=c$. ($ Re(z)$ denotes the real part of $ z$, that is, $ x$ when $ z=x+iy$ with $ x$ and $ y$ real.) Must choose $ c$ so that $ M(c) < \infty$. Rewrite the inversion formula using the cumulant generating function $ K(t) = \log(M(t))$:

$\displaystyle 2\pi i f(x) = \int_{c-i\infty}^{c+i\infty} \exp(K(z)-zx) dz \, .
$

Along the contour in question we have $ z=c+iy$ so we can think of the integral as being

$\displaystyle i\int_{-\infty}^\infty \exp(K(c+iy)-(c+iy)x) dy \, .
$

Now do a Taylor expansion of the exponent:

$\displaystyle K(c+iy)-(c+iy)x =
K(c)-cx +iy(K^\prime(c)-x) -y^2 K^{\prime\prime}(c)/2+\cdots \, .
$

Ignore the higher order terms and select a $ c$ so that the first derivative

$\displaystyle K^\prime(c)-x
$

vanishes. Such a $ c$ is a saddlepoint. We get the formula

$\displaystyle 2\pi f(x) \approx \exp(K(c)-cx) \int_{-\infty}^\infty \exp(-y^2
K^{\prime\prime}(c)/2) dy
\, .
$

The integral is just a normal density calculation and gives $ \sqrt{2\pi/K^{\prime\prime}(c)}$. The saddlepoint approximation is

$\displaystyle f(x) \approx \frac{\exp(K(c)-cx)}{\sqrt{2\pi K^{\prime\prime}(c)}} \, .
$
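
Here is a sketch (assuming numpy and scipy; the Gamma case is used because its cumulant generating function is simple) which applies the saddlepoint formula and compares it with the exact density:

```python
# Sketch: saddlepoint approximation for a Gamma(alpha, scale=beta) density.
# Here K(t) = -alpha*log(1 - beta*t), so K'(c) = x gives c = 1/beta - alpha/x
# and K''(c) = x^2/alpha.
import numpy as np
from scipy import stats

alpha, beta = 3.5, 2.0

def saddlepoint_density(x):
    c = 1.0/beta - alpha/x
    K = -alpha * np.log(1 - beta*c)
    Kpp = x**2 / alpha
    return np.exp(K - c*x) / np.sqrt(2*np.pi*Kpp)

for x in (2.0, 7.0, 15.0):
    print(x, saddlepoint_density(x), stats.gamma.pdf(x, alpha, scale=beta))
# for the Gamma family the approximation is off only by a constant Stirling-type factor
```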

Essentially the same idea lies at the heart of the proof of Stirling's approximation to the factorial function:

$\displaystyle n! = \int_0^\infty \exp(n\log(x) -x) dx \, .
$

The exponent is maximized when $ x=n$. For $ n$ large we approximate $ f(x) = n\log(x) -x$ by

$\displaystyle f(x) \approx f(x_0) + (x-x_0) f^\prime(x_0) + (x-x_0)^2
f^{\prime\prime}(x_0)/2
$

and choose $ x_0 = n$ to make $ f^\prime(x_0) = 0$. Then

$\displaystyle n! \approx \int_0^\infty \exp[n\log(n) - n -
(x-n)^2/(2n)] dx \, .
$

Substitute $ y = (x-n)/\sqrt{n}$ to get the approximation

$\displaystyle n! \approx n^{1/2}n^n e^{-n} \int_{-\infty}^\infty e^{-y^2/2} dy
$

or

$\displaystyle n! \approx \sqrt{2\pi} n^{n+1/2}e^{-n} \, .
$

This tactic is called Laplace's method. Note that I am being very sloppy about the limits of integration; to do the thing properly you have to prove that the integral over $ x$ not near $ n$ is negligible.
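
A quick check of the resulting approximation:

```python
# Compare n! with Stirling's approximation sqrt(2*pi) * n^(n+1/2) * e^{-n}.
import math

for n in (5, 10, 20):
    approx = math.sqrt(2*math.pi) * n**(n + 0.5) * math.exp(-n)
    print(n, math.factorial(n), approx, math.factorial(n)/approx)   # ratio -> 1
```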




Richard Lockhart
2001-01-21