STAT 801: Mathematical Statistics
Moment Generating Functions
Def'n: The moment generating function of a real valued $X$ is
$$M_X(t) = E(e^{tX})$$
defined for those real $t$ for which the expected value is finite.
Def'n: The moment generating function of $X \in R^p$ is
$$M_X(u) = E[e^{u^T X}]$$
defined for those vectors $u$ for which the expected value is finite.
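Formal connection to moments:
$$M_X(t) = \sum_{k=0}^\infty E[(tX)^k]/k! = \sum_{k=0}^\infty \mu_k' t^k/k!$$
where $\mu_k' = E(X^k)$.
Sometimes can find power series expansion of $M_X$ and read off the moments of $X$ from the coefficients of $t^k/k!$.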
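Theorem: If $M$ is finite for all $t \in [-\epsilon,\epsilon]$ for some $\epsilon > 0$ then
- Every moment of $X$ is finite.
- $M$ is $C^\infty$ (in fact $M$ is analytic).
- $\mu_k' = \frac{d^k}{dt^k} M_X(0)$.
Note: $C^\infty$ means has continuous derivatives of all orders. Analytic
means has convergent power series expansion in neighbourhood of each
$t \in (-\epsilon,\epsilon)$.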
The proof, and many other facts about mgfs, rely on techniques
of complex variables.
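A small numerical sketch of the statement that moments are derivatives of the mgf at 0. It assumes the standard normal mgf $M(t) = e^{t^2/2}$ and approximates $M'(0)$ and $M''(0)$ by central finite differences; the function names and the step size `h` are illustrative choices, not part of the notes.

```python
import math

def mgf_std_normal(t):
    """MGF of a standard normal variable: M(t) = exp(t^2 / 2)."""
    return math.exp(t * t / 2.0)

def first_two_moments(mgf, h=1e-3):
    """Approximate M'(0) and M''(0) by central finite differences."""
    m1 = (mgf(h) - mgf(-h)) / (2.0 * h)
    m2 = (mgf(h) - 2.0 * mgf(0.0) + mgf(-h)) / (h * h)
    return m1, m2

mean, second_moment = first_two_moments(mgf_std_normal)
print(mean, second_moment)  # close to E(X) = 0 and E(X^2) = 1
```

Finite differences are only a check here; when a power series for $M$ is available, reading off coefficients is exact.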
MGFs and Sums
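If $X_1,\ldots,X_p$ are independent and $Y = \sum X_i$ then
the moment generating function of $Y$ is the product of those
of the individual $X_i$:
$$E(e^{tY}) = \prod_i E(e^{tX_i})$$
or $M_Y = \prod M_{X_i}$.
Note: also true for multivariate $X_i$.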
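The product rule $M_Y = \prod M_{X_i}$ can be checked by brute force for a small discrete case. The sketch below assumes two independent fair dice (an example chosen here, not from the notes), builds the distribution of the sum by enumeration, and compares the two sides at one value of $t$.

```python
import math
from itertools import product

def mgf(pmf, t):
    """MGF of a discrete distribution given as {value: probability}."""
    return sum(p * math.exp(t * x) for x, p in pmf.items())

die = {k: 1.0 / 6.0 for k in range(1, 7)}

# Distribution of Y = X1 + X2 for two independent dice, by enumeration.
total = {}
for (x1, p1), (x2, p2) in product(die.items(), die.items()):
    total[x1 + x2] = total.get(x1 + x2, 0.0) + p1 * p2

t = 0.3
lhs = mgf(total, t)      # E(e^{tY}) computed from the distribution of Y
rhs = mgf(die, t) ** 2   # product of the individual mgfs
print(lhs, rhs)          # the two values agree up to rounding
```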
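Problem: power series expansion of $M_Y$ not
nice function of expansions of individual $M_{X_i}$.
Related fact: first 3 moments
(meaning $\mu$, $\sigma^2$ and $\mu_3$) of $Y$ are sums of those
of the $X_i$:
$$E(Y) = \sum E(X_i)$$
$$Var(Y) = \sum Var(X_i)$$
$$E[(Y-E(Y))^3] = \sum E[(X_i-E(X_i))^3]$$
but
$$E[(Y-E(Y))^4] = \sum \left\{E[(X_i-E(X_i))^4] - 3E^2[(X_i-E(X_i))^2]\right\} + 3\left\{\sum E[(X_i-E(X_i))^2]\right\}^2$$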
It is possible, however, to replace the moments by other objects called
cumulants which do add up properly. The way to define them relies
on the observation that the log of the mgf of $Y$ is the sum of the logs
of the mgfs of the $X_i$. We define the cumulant generating function of a
variable $X$ by
$$K_X(t) = \log(M_X(t))$$
Then
$$K_Y(t) = \sum K_{X_i}(t)$$
The mgfs are all positive so that the cumulant generating
functions are defined wherever the mgfs are. This means we can
give a power series expansion of $K_Y$:
$$K_Y(t) = \sum_{r=1}^\infty \kappa_r t^r/r!$$
We call the $\kappa_r$ the cumulants of $Y$ and observe
$$\kappa_r(Y) = \sum \kappa_r(X_i)$$
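To see the relation between cumulants and moments proceed as follows:
the cumulant generating function is
$$K(t) = \log(M(t)) = \log(1 + [\mu t + \mu_2' t^2/2 + \mu_3' t^3/3! + \cdots])$$
To compute the power series expansion we think of the quantity in
$[\ldots]$ as $x$ and expand
$$\log(1+x) = x - x^2/2 + x^3/3 - x^4/4 + \cdots$$
When you stick in the power series
$$x = \mu t + \mu_2' t^2/2 + \mu_3' t^3/3! + \cdots$$
you have to expand out the powers of $x$ and collect together like terms.
For instance,
$$x^2 = \mu^2 t^2 + \mu\mu_2' t^3 + [2\mu_3'\mu/3! + (\mu_2')^2/4] t^4 + \cdots$$
$$x^3 = \mu^3 t^3 + 3\mu_2'\mu^2 t^4/2 + \cdots$$
$$x^4 = \mu^4 t^4 + \cdots$$
Now gather up the terms. The power $t^1$ occurs only in $x$ with
coefficient $\mu$. The power $t^2$ occurs in $x$ and in $x^2$ and so on.
Putting these together gives
$$K(t) = \mu t + [\mu_2' - \mu^2] t^2/2 + [\mu_3' - 3\mu\mu_2' + 2\mu^3] t^3/3! + [\mu_4' - 4\mu_3'\mu - 3(\mu_2')^2 + 12\mu_2'\mu^2 - 6\mu^4] t^4/4! + \cdots$$
Comparing coefficients to $t^r/r!$ we see that
$$\kappa_1 = \mu$$
$$\kappa_2 = \mu_2' - \mu^2 = \sigma^2$$
$$\kappa_3 = \mu_3' - 3\mu\mu_2' + 2\mu^3 = E[(X-\mu)^3]$$
$$\kappa_4 = \mu_4' - 4\mu_3'\mu - 3(\mu_2')^2 + 12\mu_2'\mu^2 - 6\mu^4 = E[(X-\mu)^4] - 3\sigma^4$$
Check the book by Kendall and Stuart (or the new version
called Kendall's Theory of Advanced Statistics by Stuart and Ord)
for formulas for larger orders $r$.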
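The first four cumulants can be computed from raw moments $\mu_k' = E(X^k)$ with the standard identities $\kappa_1 = \mu$, $\kappa_2 = \mu_2' - \mu^2$, $\kappa_3 = \mu_3' - 3\mu\mu_2' + 2\mu^3$, $\kappa_4 = \mu_4' - 4\mu_3'\mu - 3(\mu_2')^2 + 12\mu_2'\mu^2 - 6\mu^4$. The sketch below is a direct transcription; the function name is made up for illustration. It is checked on an Exponential(1) variable, whose raw moments are $k!$ and whose cumulants are $(r-1)!$ (because $K(t) = -\log(1-t)$), and on a standard normal.

```python
def cumulants_from_raw_moments(m1, m2, m3, m4):
    """First four cumulants from the raw moments E(X), E(X^2), E(X^3), E(X^4)."""
    k1 = m1
    k2 = m2 - m1**2
    k3 = m3 - 3 * m1 * m2 + 2 * m1**3
    k4 = m4 - 4 * m3 * m1 - 3 * m2**2 + 12 * m2 * m1**2 - 6 * m1**4
    return k1, k2, k3, k4

# Exponential(1): raw moments 1, 2, 6, 24.
print(cumulants_from_raw_moments(1, 2, 6, 24))  # (1, 1, 2, 6)

# Standard normal: raw moments 0, 1, 0, 3; cumulants beyond the second vanish.
print(cumulants_from_raw_moments(0, 1, 0, 3))   # (0, 1, 0, 0)
```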
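Example: If $X_1,\ldots,X_p$ are independent and $X_i$ has a
$N(\mu_i,\sigma_i^2)$ distribution then
$$M_{X_i}(t) = \int_{-\infty}^\infty e^{tx} e^{-(x-\mu_i)^2/(2\sigma_i^2)} dx/(\sqrt{2\pi}\sigma_i) = e^{\sigma_i^2 t^2/2 + t\mu_i}$$
This makes the cumulant generating function
$$K_{X_i}(t) = \log(M_{X_i}(t)) = \sigma_i^2 t^2/2 + \mu_i t$$
and the cumulants are $\kappa_1 = \mu_i$, $\kappa_2 = \sigma_i^2$ and every other
cumulant is 0. The cumulant generating function for $Y = \sum X_i$ is
$$K_Y(t) = \sum \sigma_i^2 t^2/2 + t \sum \mu_i$$
which is the cumulant generating function of $N(\sum \mu_i, \sum \sigma_i^2)$.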
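Example: I am having you derive the moment and cumulant generating function
and all the moments of a Gamma rv. Suppose that $Z_1,\ldots,Z_\nu$ are independent
$N(0,1)$ rvs. Then we have defined $S_\nu = \sum_1^\nu Z_i^2$ to have a
$\chi^2_\nu$ distribution. It is easy to check $S_1 = Z_1^2$ has density
$$(u/2)^{-1/2} e^{-u/2}/(2\sqrt{\pi})$$
and then the mgf of $S_1$ is
$$(1-2t)^{-1/2}$$
for $t < 1/2$. It follows that
$$M_{S_\nu}(t) = (1-2t)^{-\nu/2}$$
which you will show in homework is the moment generating function of
a Gamma$(\nu/2, 2)$ rv. This shows that the $\chi^2_\nu$ distribution has
the Gamma$(\nu/2, 2)$ density which is
$$(u/2)^{(\nu-2)/2} e^{-u/2}/(2\Gamma(\nu/2))$$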
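As a sanity check on the $\chi^2_\nu$ claim, the sketch below integrates $e^{tu}$ against the density $(u/2)^{(\nu-2)/2} e^{-u/2}/(2\Gamma(\nu/2))$ numerically and compares the result with $(1-2t)^{-\nu/2}$. The choices $\nu = 4$, $t = 0.1$, the truncation point and the midpoint rule are all assumptions made for illustration.

```python
import math

def chi2_density(u, nu):
    """Chi-squared density with nu degrees of freedom, in the Gamma(nu/2, 2) form."""
    return (u / 2.0) ** ((nu - 2) / 2.0) * math.exp(-u / 2.0) / (2.0 * math.gamma(nu / 2.0))

def mgf_numeric(t, nu, upper=200.0, steps=200_000):
    """Midpoint-rule approximation of the integral of e^{tu} f(u) over (0, upper)."""
    h = upper / steps
    return h * sum(math.exp(t * u) * chi2_density(u, nu)
                   for u in (h * (j + 0.5) for j in range(steps)))

nu, t = 4, 0.1
numeric = mgf_numeric(t, nu)
closed_form = (1.0 - 2.0 * t) ** (-nu / 2.0)
print(numeric, closed_form)  # both close to 1.5625
```

The truncation at `upper` is harmless only because $t < 1/2$ makes the integrand decay exponentially.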
Example: The Cauchy density is
$$\frac{1}{\pi(1+x^2)}$$
the corresponding moment generating function is
$$M(t) = \int_{-\infty}^\infty \frac{e^{tx}}{\pi(1+x^2)} dx$$
which is $+\infty$ except for $t=0$ where we get 1.
This mgf is exactly the mgf of every $t$ distribution
so it is not much use for distinguishing such distributions.
The problem is that these distributions do not have infinitely
many finite moments.
This observation has led to the development of a substitute for the mgf
which is defined for every distribution, namely, the characteristic function.
Characteristic Functions
Definition: The characteristic function of a real rv $X$ is
$$\phi_X(t) = E(e^{itX})$$
where $i = \sqrt{-1}$ is the imaginary unit.
Aside on complex arithmetic.
Complex numbers: add $i = \sqrt{-1}$ to the real numbers.
Require all the usual rules of algebra to work.
So: if $i$ and any real numbers $a$ and $b$ are to be complex numbers then so
must be $a+bi$.
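Multiplication:
If we multiply a complex number $a+bi$ with $a$ and $b$ real by
another such number, say $c+di$ then the usual rules of arithmetic
(associative, commutative and distributive laws) require
$$(a+bi)(c+di) = ac + adi + bci + bdi^2 = (ac-bd) + (ad+bc)i$$
so this is precisely how we define multiplication.
Addition: follow usual rules to get
$$(a+bi) + (c+di) = (a+c) + (b+d)i$$
Additive inverses: $-(a+bi) = -a + (-b)i$.
Multiplicative inverses:
$$\frac{1}{a+bi} = \frac{1}{a+bi} \cdot \frac{a-bi}{a-bi} = \frac{a-bi}{a^2+b^2}$$
Division:
$$\frac{a+bi}{c+di} = \frac{(a+bi)(c-di)}{(c+di)(c-di)} = \frac{ac+bd+(bc-ad)i}{c^2+d^2}$$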
Notice: usual rules of arithmetic don't require any more numbers
than $x+yi$ where $x$ and $y$ are real.
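Now look at transcendental functions. For real $x$ we know
$e^x = \sum x^k/k!$ so
our insistence on the usual rules working means
$$e^{x+iy} = e^x e^{iy}$$
and we need to know how to compute $e^{iy}$. Remember in what follows that
$i^2 = -1$ so $i^3 = -i$, $i^4 = 1$ and so on. Then
$$e^{iy} = \sum_{k=0}^\infty \frac{(iy)^k}{k!} = (1 - y^2/2 + y^4/4! - \cdots) + i(y - y^3/3! + y^5/5! - \cdots) = \cos(y) + i\sin(y)$$
We can thus write
$$e^{x+iy} = e^x(\cos(y) + i\sin(y))$$
Identify $x+yi$ with the corresponding point $(x,y)$ in the plane. Picture the complex numbers
as forming a plane.
Now every point in the plane can be written in polar co-ordinates as
$(r\cos\theta, r\sin\theta)$ and comparing this with our formula for the
exponential we see we can write
$$x+iy = \sqrt{x^2+y^2}\, e^{i\theta} = re^{i\theta}$$
for an angle $\theta \in [0, 2\pi)$.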
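Multiplication revisited: $x+iy = re^{i\theta}$, $x'+iy' = r'e^{i\theta'}$.
$$(x+iy)(x'+iy') = re^{i\theta} r'e^{i\theta'} = rr'e^{i(\theta+\theta')}$$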
We will need from time to time a couple of other definitions:
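Definition: The modulus of $x+iy$ is
$$|x+iy| = \sqrt{x^2+y^2}$$
Definition: The complex conjugate of $x+iy$ is $\overline{x+iy} = x-iy$.
Some identities: $z = x+iy = re^{i\theta}$ and $z' = x'+iy' = r'e^{i\theta'}$.
$$z\bar{z} = x^2+y^2 = r^2 = |z|^2$$
$$\frac{z'}{z} = \frac{z'\bar{z}}{|z|^2} = \frac{r'}{r} e^{i(\theta'-\theta)}$$
$$\overline{re^{i\theta}} = re^{-i\theta}$$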
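The rules of this aside can be tried out with Python's built-in complex type and the standard `cmath` module. The sketch checks Euler's formula $e^{iy} = \cos y + i\sin y$, multiplication in polar form, and $z\bar{z} = |z|^2$; the particular numbers are arbitrary. Note that `cmath.polar` returns an angle in $(-\pi, \pi]$ rather than $[0, 2\pi)$, which makes no difference to the identities.

```python
import cmath
import math

# Euler's formula: e^{iy} = cos(y) + i sin(y)
y = 0.7
euler_lhs = cmath.exp(1j * y)
euler_rhs = complex(math.cos(y), math.sin(y))

# Polar form: z = r e^{i theta}, w = r2 e^{i theta2}
z = 3 + 4j
w = 1 - 2j
r, theta = cmath.polar(z)
r2, theta2 = cmath.polar(w)

# Multiplying multiplies the moduli and adds the angles.
product_direct = z * w
product_polar = cmath.rect(r * r2, theta + theta2)

# z times its conjugate is the squared modulus.
modulus_identity = z * z.conjugate()

print(euler_lhs, euler_rhs)
print(product_direct, product_polar)   # both equal 11 - 2i up to rounding
print(modulus_identity, abs(z) ** 2)   # both equal 25
```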
Notes on calculus with complex variables. Essentially the usual rules apply so,
for example,
$$\frac{d}{dt} e^{it} = ie^{it}$$
We will (mostly) be doing only integrals over the real line; the theory of integrals
along paths in the complex plane is a very important part of mathematics, however.
FACT: (not used explicitly in course). If $f: C \to C$ is
differentiable then $f$ is analytic (has power series expansion).
End of Aside
Characteristic Functions
Definition: The characteristic function of a real rv $X$ is
$$\phi_X(t) = E(e^{itX})$$
where $i = \sqrt{-1}$ is the imaginary unit.
Since
$$e^{itX} = \cos(tX) + i\sin(tX)$$
we find that
$$\phi_X(t) = E(\cos(tX)) + iE(\sin(tX))$$
Since the trigonometric functions are bounded by 1 the expected values must
be finite for all $t$ and this is precisely the reason for using
characteristic rather than moment generating functions in probability
theory courses.
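A minimal sketch of the decomposition $\phi_X(t) = E(\cos(tX)) + iE(\sin(tX))$: the real and imaginary parts are computed as two ordinary real integrals against a density. The standard normal is used as the test case because its characteristic function is known to be $e^{-t^2/2}$; the integration range and step count are assumptions made for the example.

```python
import math

def std_normal_density(x):
    return math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)

def char_fn_numeric(t, density, lower=-12.0, upper=12.0, steps=24_000):
    """E(cos(tX)) + i E(sin(tX)) by the midpoint rule on [lower, upper]."""
    h = (upper - lower) / steps
    re = im = 0.0
    for j in range(steps):
        x = lower + h * (j + 0.5)
        fx = density(x)
        re += math.cos(t * x) * fx
        im += math.sin(t * x) * fx
    return complex(h * re, h * im)

t = 1.5
phi = char_fn_numeric(t, std_normal_density)
print(phi, math.exp(-t * t / 2.0))  # real part matches, imaginary part is near 0
```

Both integrands are bounded by the density itself, so the integrals exist for every $t$, whatever the tails of the distribution.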
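Theorem 1
For any two real rvs $X$ and $Y$ the following are
equivalent:
- $X$ and $Y$ have the same distribution, that is, for any (Borel)
set $A$ we have $P(X \in A) = P(Y \in A)$.
- $F_X(t) = F_Y(t)$ for all $t$.
- $\phi_X(u) = E(e^{iuX}) = E(e^{iuY}) = \phi_Y(u)$ for all real $u$.
Moreover, all of these are implied if there is a positive $\epsilon$ such
that for all $|t| \le \epsilon$
$$M_X(t) = M_Y(t) < \infty.$$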
Inversion
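Previous theorem is non-constructive characterization.
Can get from $\phi_X$ to $F_X$ or $f_X$ by inversion.
See homework for basic inversion formula:
If $X$ is a random variable taking only integer values then
for each integer $k$
$$P(X=k) = \frac{1}{2\pi}\int_0^{2\pi} \phi_X(t) e^{-itk} dt = \frac{1}{2\pi}\int_{-\pi}^{\pi} \phi_X(t) e^{-itk} dt$$
The proof proceeds from the formula
$$\phi_X(t) = \sum_k e^{ikt} P(X=k)$$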
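The integer-valued inversion formula $P(X=k) = \frac{1}{2\pi}\int_{-\pi}^{\pi}\phi_X(t)e^{-itk}dt$ is easy to try numerically. The sketch below assumes a Binomial(5, 0.3) variable, whose characteristic function is $(1-p+pe^{it})^n$, and recovers its probabilities with an equally spaced sum over one period; the helper names and grid size are illustrative.

```python
import cmath
import math

def binomial_cf(t, n, p):
    """Characteristic function of a Binomial(n, p) variable."""
    return (1.0 - p + p * cmath.exp(1j * t)) ** n

def pmf_from_cf(cf, k, grid=64):
    """(1/2pi) * integral over [-pi, pi] of cf(t) e^{-itk} dt, by an equally spaced sum."""
    h = 2.0 * math.pi / grid
    total = sum(cf(-math.pi + h * j) * cmath.exp(-1j * (-math.pi + h * j) * k)
                for j in range(grid))
    return (h * total / (2.0 * math.pi)).real

n, p = 5, 0.3
recovered = [pmf_from_cf(lambda t: binomial_cf(t, n, p), k) for k in range(n + 1)]
exact = [math.comb(n, k) * p**k * (1 - p) ** (n - k) for k in range(n + 1)]
print(recovered)
print(exact)
```

Because the integrand is a trigonometric polynomial, an equally spaced sum with more points than the largest frequency is exact up to rounding, so no fine grid is needed here.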
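Now suppose that $X$ has a continuous bounded density $f$. Define
$$X_n = [nX]/n$$
where $[a]$ denotes the integer part (rounding down to the next smallest
integer). We have
$$P(k/n \le X < (k+1)/n) = P([nX] = k) = \frac{1}{2\pi}\int_{-\pi}^{\pi} \phi_{[nX]}(t) e^{-itk} dt$$
Make the substitution $t = u/n$, and get
$$nP(k/n \le X < (k+1)/n) = \frac{1}{2\pi}\int_{-n\pi}^{n\pi} \phi_{[nX]}(u/n) e^{-iuk/n} du$$
Now, as $n \to \infty$ we have
$$\phi_{[nX]}(u/n) = E(e^{iu[nX]/n}) \to E(e^{iuX})$$
(by the dominated convergence theorem - the dominating random variable
is just the constant 1). The range of integration converges to the whole
real line and if $k/n \to x$ we see that the left hand side converges to
the density $f(x)$ while the right hand side converges to
$$\frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_X(u) e^{-iux} du$$
which gives the inversion formula
$$f_X(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \phi_X(u) e^{-iux} du$$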
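A numerical sketch of the density inversion formula $f_X(x) = \frac{1}{2\pi}\int \phi_X(u)e^{-iux}du$, applied to the standard normal characteristic function $e^{-u^2/2}$. The truncation of the integral to $[-12, 12]$ and the midpoint rule are assumptions that work here because this characteristic function decays very quickly; a heavy-tailed $\phi$ would need more care.

```python
import cmath
import math

def density_from_cf(cf, x, limit=12.0, steps=8000):
    """Approximate (1/2pi) * integral of cf(u) e^{-iux} du over [-limit, limit]."""
    h = 2.0 * limit / steps
    total = 0j
    for j in range(steps):
        u = -limit + h * (j + 0.5)
        total += cf(u) * cmath.exp(-1j * u * x)
    return (h * total / (2.0 * math.pi)).real

def normal_cf(u):
    """Characteristic function of a standard normal variable."""
    return math.exp(-u * u / 2.0)

x = 0.5
recovered_density = density_from_cf(normal_cf, x)
exact_density = math.exp(-x * x / 2.0) / math.sqrt(2.0 * math.pi)
print(recovered_density, exact_density)
```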
Many other such formulas are available to compute things like
$F(b) - F(a)$ and so on.
All such formulas are sometimes referred to as Fourier inversion formulas;
the characteristic function itself is sometimes called the Fourier
transform of the distribution or cdf or density of $X$.
Inversion of the Moment Generating Function
MGF and characteristic function related formally:
$$M_X(it) = \phi_X(t)$$
When $M_X$ exists this relationship is not merely formal; the
methods of complex variables mean there is a ``nice'' (analytic)
function which is $E(e^{zX})$ for any complex $z = x+iy$ for which
$M_X(x)$ is finite.
SO: there is an inversion formula for $M_X$ using a
complex contour integral:
If $z_1$ and $z_2$ are two
points in the complex plane and $C$ a path between these two points we
can define the path integral
$$\int_C f(z) dz$$
by the methods of line integration.
Do algebra with
such integrals via usual theorems of calculus.
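The Fourier inversion formula was
$$2\pi f(x) = \int_{-\infty}^{\infty} \phi(t) e^{-itx} dt$$
so replacing $\phi$ by $M$ we get
$$2\pi f(x) = \int_{-\infty}^{\infty} M(it) e^{-itx} dt$$
If we just substitute $z = it$ then we find
$$2\pi i f(x) = \int_C M(z) e^{-zx} dz$$
where the path $C$ is the imaginary axis.
Methods of complex integration permit us to replace $C$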
by any other path which starts and ends at the same place. Sometimes
can choose path to make it easy to do the integral
approximately; this is what saddlepoint approximations are.
Inversion formula is called the inverse Laplace transform; the mgf is
also called the Laplace transform of the distribution or cdf or
density.
Applications of Inversion
1): Numerical calculations
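Example: Many statistics have a distribution which is approximately that
of
$$T = \sum \lambda_j Z_j^2$$
where the $Z_j$ are iid $N(0,1)$. In this case
$$E(e^{itT}) = \prod E(e^{it\lambda_j Z_j^2}) = \prod (1 - 2it\lambda_j)^{-1/2}$$
Imhof (Biometrika, 1961) gives a simplification of the Fourier
inversion formula for $F_T(x) - F_T(0)$
which can be evaluated numerically:
$$F_T(x) - F_T(0) = \int_0^x f_T(y) dy = \int_0^x \frac{1}{2\pi}\int_{-\infty}^{\infty} \prod (1-2it\lambda_j)^{-1/2} e^{-ity} dt\, dy$$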
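Multiply $\phi$ top and bottom by the complex conjugate of the denominator:
$$\phi(t) = \prod \frac{(1+2it\lambda_j)^{1/2}}{(1+4t^2\lambda_j^2)^{1/2}}$$
The complex number $1+2it\lambda_j$ is $r_j e^{i\theta_j}$ where
$r_j = \sqrt{1+4t^2\lambda_j^2}$ and $\tan(\theta_j) = 2t\lambda_j$.
This allows us to rewrite
$$\phi(t) = \prod \frac{(r_j e^{i\theta_j})^{1/2}}{r_j}$$
or
$$\phi(t) = \frac{e^{i\sum \theta_j/2}}{\prod r_j^{1/2}}$$
Assemble this to give
$$F_T(x) - F_T(0) = \frac{1}{2\pi}\int_{-\infty}^{\infty} \frac{e^{i\theta(t)}}{\rho(t)} \int_0^x e^{-iyt} dy\, dt$$
where $\theta(t) = \sum \tan^{-1}(2t\lambda_j)/2$ and
$\rho(t) = \prod (1+4t^2\lambda_j^2)^{1/4}$.
But
$$\int_0^x e^{-iyt} dy = \frac{e^{-ixt} - 1}{-it}$$
We can now collect up the real part of the resulting integral to
derive the formula given by Imhof. I don't produce the details here.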
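The polar rewriting of the characteristic function of $T = \sum \lambda_j Z_j^2$ can be checked numerically: the product form $\prod (1-2it\lambda_j)^{-1/2}$ should equal $e^{i\theta(t)}/\rho(t)$ with $\theta(t) = \sum \tan^{-1}(2t\lambda_j)/2$ and $\rho(t) = \prod (1+4t^2\lambda_j^2)^{1/4}$. This sketch only verifies that identity, using the principal branch of the complex square root; it is not an implementation of Imhof's formula, and the weights `lambdas` are made up.

```python
import cmath
import math

def cf_product_form(t, lambdas):
    """prod_j (1 - 2 i t lambda_j)^(-1/2), principal branch."""
    out = 1 + 0j
    for lam in lambdas:
        out *= (1 - 2j * t * lam) ** (-0.5)
    return out

def cf_polar_form(t, lambdas):
    """exp(i theta(t)) / rho(t) with theta and rho as defined in the lead-in."""
    theta = sum(math.atan(2.0 * t * lam) for lam in lambdas) / 2.0
    rho = math.prod((1.0 + 4.0 * t * t * lam * lam) ** 0.25 for lam in lambdas)
    return cmath.exp(1j * theta) / rho

lambdas = [0.5, 1.0, 2.0]   # hypothetical weights
t = 0.8
print(cf_product_form(t, lambdas), cf_polar_form(t, lambdas))
```

The principal branch is the right one because each factor $1 - 2it\lambda_j$ has real part 1, so its angle stays strictly between $-\pi/2$ and $\pi/2$.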
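2): The central limit theorem (in some versions) can be deduced from
the Fourier inversion formula:
if $X_1,\ldots,X_n$ are iid with mean 0 and variance 1 and
$T = n^{1/2}\bar{X}$ then with $\phi$ denoting the characteristic function of
a single $X$ we have
$$E(e^{itT}) = E(e^{in^{-1/2}t\sum X_j}) = [\phi(n^{-1/2}t)]^n \approx [\phi(0) + n^{-1/2}t\phi'(0) + n^{-1}t^2\phi''(0)/2 + o(n^{-1})]^n$$
But now $\phi(0) = 1$ and
$$\phi'(t) = \frac{d}{dt}E(e^{itX_1}) = iE(X_1 e^{itX_1})$$
So $\phi'(0) = iE(X_1) = 0$.
Similarly
$$\phi''(t) = i^2 E(X_1^2 e^{itX_1})$$
so that
$$\phi''(0) = -E(X_1^2) = -1$$
It now follows that
$$E(e^{itT}) \approx [1 - t^2/(2n) + o(1/n)]^n \to e^{-t^2/2}$$
With care we can then apply the Fourier inversion formula and
get
$$f_T(x) = \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx}[\phi(tn^{-1/2})]^n dt \to \frac{1}{2\pi}\int_{-\infty}^{\infty} e^{-itx} e^{-t^2/2} dt = \frac{1}{\sqrt{2\pi}}\phi_Z(-x)$$
where $\phi_Z$ is the characteristic function of a standard normal variable
$Z$. Doing the integral we find
$$\phi_Z(x) = \phi_Z(-x) = e^{-x^2/2}$$
so that
$$f_T(x) \to \frac{1}{\sqrt{2\pi}} e^{-x^2/2}$$
which is a standard normal density.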
This proof of the central limit theorem is not terribly general
since it requires $T$ to have a bounded continuous density. The central
limit theorem itself is a statement about cdfs not densities and is
$$P(T \le t) \to P(Z \le t).$$
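The key step of the argument, $[\phi(n^{-1/2}t)]^n \to e^{-t^2/2}$, can be watched numerically. The sketch assumes the $X_j$ are random signs ($\pm 1$ with probability 1/2 each), which have mean 0, variance 1 and characteristic function $\cos(t)$. This choice is only for illustration: random signs have no density, so the example shows the convergence of characteristic functions and not the density inversion step.

```python
import math

def cf_standardized_sum(t, n):
    """CF of T = n^{-1/2} * (sum of n iid random signs): cos(t / sqrt(n))^n."""
    return math.cos(t / math.sqrt(n)) ** n

t = 1.0
limit = math.exp(-t * t / 2.0)
for n in (10, 100, 1000, 10000):
    # The gap to the normal limit shrinks roughly like 1/n.
    print(n, cf_standardized_sum(t, n), limit)
```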
3) Saddlepoint approximation from
MGF inversion formula
$$2\pi i f(x) = \int_{-i\infty}^{i\infty} M(z) e^{-zx} dz$$
(limits of integration indicate contour
integral running up imaginary axis.)
Replace contour (using complex variables) with
line $Re(z) = c$. ($Re(z)$ denotes the real part
of $z$, that is, $x$ when $z = x+iy$ with $x$ and $y$ real.) Must choose
$c$ so that $M(c) < \infty$.
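Rewrite inversion formula using cumulant generating function
$K(t) = \log(M(t))$:
$$2\pi i f(x) = \int_{c-i\infty}^{c+i\infty} \exp(K(z) - zx) dz$$
Along the contour in question we have $z = c+iy$ so we can think of the
integral as being
$$i\int_{-\infty}^{\infty} \exp(K(c+iy) - (c+iy)x) dy$$
Now do a Taylor expansion of the exponent:
$$K(c+iy) - (c+iy)x = K(c) - cx + iy(K'(c) - x) - y^2 K''(c)/2 + \cdots$$
Ignore the higher order terms and select a $c$ so that the first derivative
$K'(c) - x$ vanishes. Such a $c$ is a saddlepoint. We get the formula
$$2\pi f(x) \approx \exp(K(c) - cx) \int_{-\infty}^{\infty} \exp(-y^2 K''(c)/2) dy$$
The integral is just a normal density calculation and gives
$\sqrt{2\pi/K''(c)}$. The saddlepoint approximation is
$$f(x) \approx \frac{\exp(K(c) - cx)}{\sqrt{2\pi K''(c)}}$$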
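A worked sketch of the saddlepoint approximation $f(x) \approx \exp(K(c) - cx)/\sqrt{2\pi K''(c)}$ with $K'(c) = x$, for a Gamma density with shape $\alpha$ and scale 1. That case is assumed here because everything is available in closed form: $K(t) = -\alpha\log(1-t)$ for $t < 1$, so $K'(c) = \alpha/(1-c) = x$ gives $c = 1 - \alpha/x$ and $K''(c) = \alpha/(1-c)^2$.

```python
import math

def saddlepoint_gamma_density(x, alpha):
    """Saddlepoint approximation to the Gamma(alpha, scale 1) density at x > 0."""
    c = 1.0 - alpha / x                 # solves K'(c) = alpha / (1 - c) = x
    K = -alpha * math.log(1.0 - c)      # cumulant generating function at c
    K2 = alpha / (1.0 - c) ** 2         # second derivative K''(c)
    return math.exp(K - c * x) / math.sqrt(2.0 * math.pi * K2)

def exact_gamma_density(x, alpha):
    return x ** (alpha - 1.0) * math.exp(-x) / math.gamma(alpha)

alpha = 5.0
for x in (2.0, 5.0, 9.0):
    ratio = saddlepoint_gamma_density(x, alpha) / exact_gamma_density(x, alpha)
    print(x, ratio)  # the same ratio, about 1.017, at every x
```

For this family the ratio does not depend on $x$: the approximation reproduces the Gamma density exactly except that $\Gamma(\alpha)$ is replaced by its Stirling approximation, so renormalizing would make it exact.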
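Essentially the same idea lies at the heart of the proof of
Stirling's approximation to the factorial function:
$$n! = \int_0^\infty \exp(n\log(x) - x) dx$$
The exponent is maximized when $x = n$. For $n$ large
we approximate $f(x) = n\log(x) - x$ by
$$f(x) \approx f(x_0) + (x-x_0)f'(x_0) + (x-x_0)^2 f''(x_0)/2$$
and choose $x_0 = n$ to make $f'(x_0) = 0$. Then
$$n! \approx \int_0^\infty \exp[n\log(n) - n - (x-n)^2/(2n)] dx$$
Substitute $y = (x-n)/\sqrt{n}$ to get the approximation
$$n! \approx n^{1/2} n^n e^{-n} \int_{-\infty}^{\infty} e^{-y^2/2} dy$$
or
$$n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}$$
This tactic is called Laplace's method. Note that I am being very sloppy about the
limits of integration; to do the thing properly you have to prove that
the integral over $x$ not near $n$ is negligible.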
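A quick numerical look at how good Stirling's approximation $n! \approx \sqrt{2\pi}\, n^{n+1/2} e^{-n}$ is. The sketch compares it with `math.factorial` for a few values of $n$ chosen for illustration.

```python
import math

def stirling(n):
    """Stirling's approximation sqrt(2 pi) * n^(n + 1/2) * e^(-n)."""
    return math.sqrt(2.0 * math.pi) * n ** (n + 0.5) * math.exp(-n)

for n in (5, 20, 50):
    ratio = math.factorial(n) / stirling(n)
    print(n, ratio)  # slightly above 1, approaching 1 as n grows
```

The ratio behaves like $1 + 1/(12n)$, the next term in the expansion that Laplace's method drops.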
Richard Lockhart
2001-01-21