Stat 804
Lecture 14 Notes
Our goal in this lecture is to develop asymptotic distribution
theory for the sample autocorrelation function.
We let
\[
\rho_X(k) = \frac{C_X(k)}{C_X(0)}
\]
and
\[
\hat\rho(k) = \frac{\hat C(k)}{\hat C(0)}
\]
be the ACF and estimated ACF
respectively.
We begin by reducing the behaviour of $\hat\rho(k)$ to the behaviour
of $\hat C(k)$,
the sample autocovariance. Our approach is standard
Taylor expansion.
Large sample theory for ratio estimates
Suppose you have pairs $(X_n,Y_n)$ of random variables with
\[
X_n \to \mu_X
\]
and
\[
Y_n \to \mu_Y
\]
in probability.
We study the large sample behaviour of $X_n/Y_n$ under the
assumption that $\mu_Y$
is not 0. We will see that the case $\mu_X = 0$
results in some simplifications.
Begin by writing
\[
\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y(1+\epsilon_n)}
\]
where
\[
\epsilon_n = \frac{Y_n - \mu_Y}{\mu_Y}.
\]
Notice that $\epsilon_n \to 0$
in probability.
We may expand
\[
\frac{1}{1+\epsilon_n} = \sum_{k=0}^\infty (-\epsilon_n)^k
\]
and then write
\[
\frac{X_n}{Y_n} = \frac{X_n}{\mu_Y} \sum_{k=0}^\infty (-\epsilon_n)^k.
\]
We want to compute the mean of this expression term by term and the
variance by using the formula for the variance of a sum, and so on.
However, what we really do is truncate the infinite sum at some finite
number of terms and compute moments of the finite sum.
I want to be clear about the distinction; to do so I give an example.
Imagine that $(X_n,Y_n)$ has a bivariate normal distribution with means
$\mu_X$ and $\mu_Y$,
variances
$\sigma_X^2/n$ and $\sigma_Y^2/n$,
and
correlation $\rho$
between $X_n$ and $Y_n$. The quantity
$X_n/Y_n$ does not have a well defined mean because
$E(|X_n/Y_n|) = \infty$; the density of $Y_n$ is positive at 0.
Our expansion is
still valid, however. Stopping the sum at $k=1$ leads to the approximation
\[
\frac{X_n}{Y_n} \approx \frac{X_n}{\mu_Y}(1-\epsilon_n).
\]
I now want to look at these terms to decide which are big and which are
small. To do so I introduce big $O_P$ notation:
Definition: If $U_n$ is a sequence of random variables and $a_n>0$ a sequence of
constants then we write
\[
U_n = O_P(a_n)
\]
if, for each $\epsilon > 0$
there is an $M$ (depending on $\epsilon$
but not on
$n$) such that
\[
P(|U_n| > M a_n) \le \epsilon \quad \text{for all } n.
\]
The idea is that
$U_n = O_P(a_n)$ means that $U_n$ is
proportional in size to $a_n$ with the ``constant of
proportionality'' being a random variable which is not
likely to be too large. We also often have use for notation
indicating that $U_n$ is actually small compared to $a_n$.
Definition: We say
$U_n = o_P(a_n)$ if $U_n/a_n \to 0$
in probability:
for each $\epsilon > 0$,
\[
P(|U_n| > \epsilon a_n) \to 0.
\]
You can manipulate $O_P$ and $o_P$ notation algebraically with a few rules:
 1. If $b_n$ is a sequence of constants such that
$b_n = ca_n$ with
$c>0$ then $U_n = O_P(a_n)$ if and only if $U_n = O_P(b_n)$.
We write
$cO_P(a_n) = O_P(a_n)$.
 2. If
$U_n = O_P(a_n)$ and
$V_n = O_P(b_n)$ for two sequences
$a_n$ and $b_n$ then
$U_nV_n = O_P(a_nb_n)$.
We express this as
$O_P(a_n)O_P(b_n) = O_P(a_nb_n)$.
 3. In particular
$b_nO_P(a_n) = O_P(b_na_n)$.
 4. $O_P(a_n) + O_P(b_n) = O_P(\max(a_n,b_n))$.
 5. $co_P(a_n) = o_P(a_n)$.
 6. $o_P(a_n)O_P(b_n) = o_P(a_nb_n)$ and $o_P(a_n)o_P(b_n) = o_P(a_nb_n)$.
 7. In particular
$b_no_P(a_n) = o_P(b_na_n)$.
 8. $o_P(a_n) + O_P(a_n) = O_P(a_n)$.
These notions extend Landau's o and O notation to
random quantities.
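To get a concrete feel for the definition, here is a small simulation sketch in Python (an illustration of mine, not part of the development; the distribution and sample sizes are arbitrary choices): for the sample mean of $n$ i.i.d. variables the error $\bar X_n - \mu$ is $O_P(n^{-1/2})$, so the 0.99 quantile of $|n^{1/2}(\bar X_n - \mu)|$ stays near one fixed $M$ no matter how large $n$ is.

    import numpy as np

    # Sketch: the sample mean error is O_P(n^{-1/2}); rescaled by
    # sqrt(n) its 0.99 quantile stays near a single constant M.
    rng = np.random.default_rng(0)
    for n in [100, 1000, 10000]:
        samples = rng.exponential(scale=1.0, size=(2000, n))  # mu = 1
        u = np.abs(np.sqrt(n) * (samples.mean(axis=1) - 1.0))
        print(n, np.quantile(u, 0.99))  # roughly 2.6 for every n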
Example: In our ratio example we have
\[
X_n - \mu_X = O_P(n^{-1/2})
\]
and
\[
\epsilon_n = O_P(n^{-1/2}).
\]
In our geometric expansion
\[
\epsilon_n^k = O_P(n^{-k/2}).
\]
Look first at the expansion stopped at $k=1$. We
have
\begin{align*}
\frac{X_n}{Y_n} & \approx \frac{X_n}{\mu_Y}(1-\epsilon_n)
 = \frac{\mu_X}{\mu_Y} + \frac{X_n-\mu_X}{\mu_Y} - \frac{X_n\epsilon_n}{\mu_Y} \\
 & = \frac{\mu_X}{\mu_Y} + O_P(n^{-1/2}) + O_P(n^{-1/2}).
\end{align*}
(The three terms on the RHS of the first line are being
described in terms of roughly how big each is.)
If we stop at $k=2$ we get
\[
\frac{X_n}{Y_n} \approx \frac{X_n}{\mu_Y}(1-\epsilon_n+\epsilon_n^2)
 = \frac{\mu_X}{\mu_Y} + \frac{X_n-\mu_X}{\mu_Y} - \frac{X_n\epsilon_n}{\mu_Y}
 + \frac{X_n\epsilon_n^2}{\mu_Y}.
\]
Keeping only terms of order
$O_P(n^{-1/2})$ we find
\[
\frac{X_n}{Y_n} = \frac{\mu_X}{\mu_Y} + \frac{X_n-\mu_X}{\mu_Y}
 - \frac{\mu_X(Y_n-\mu_Y)}{\mu_Y^2} + O_P(n^{-1}).
\]
We now take expected values and discover that, up to an error
of order $n^{-1}$,
\[
E\left(\frac{X_n}{Y_n}\right) \approx \frac{\mu_X}{\mu_Y}.
\]
BUT you are warned that what is really meant is simply that
there is a random variable which is approximately $X_n/Y_n - \mu_X/\mu_Y$
(neglecting something which is probably proportional in size to $n^{-1}$)
whose expected value is 0. For the normal example the remainder term
in this expansion, that is, the term
$O_P(n^{-1})$, is probably small but
its expected value is not defined.
To keep terms up to order
$O_P(n^{-1})$ we have to keep terms out to $k=2$. (In general the term
involving $\epsilon_n^k$ is $O_P(n^{-k/2})$.
For $k>2$ this is
$o_P(n^{-1})$ but for $k=2$ the
term is not negligible.)
If we retain terms out to $k=2$ then we get
\[
\frac{X_n}{Y_n} = \frac{\mu_X}{\mu_Y} + \frac{X_n-\mu_X}{\mu_Y}
 - \frac{\mu_X(Y_n-\mu_Y)}{\mu_Y^2}
 - \frac{(X_n-\mu_X)(Y_n-\mu_Y)}{\mu_Y^2}
 + \frac{\mu_X(Y_n-\mu_Y)^2}{\mu_Y^3} + o_P(n^{-1}).
\]
Taking expected values here we get
\[
E\left(\frac{X_n}{Y_n}\right) \approx \frac{\mu_X}{\mu_Y}
 - \frac{\mathrm{Cov}(X_n,Y_n)}{\mu_Y^2}
 + \frac{\mu_X\mathrm{Var}(Y_n)}{\mu_Y^3},
\]
the neglected terms being of smaller order than $n^{-1}$. In the normal case
we get
\[
E\left(\frac{X_n}{Y_n}\right) \approx \frac{\mu_X}{\mu_Y}
 - \frac{\rho\sigma_X\sigma_Y}{n\mu_Y^2}
 + \frac{\mu_X\sigma_Y^2}{n\mu_Y^3}.
\]
In order to compute the approximate variance we ought to compute the second moment
of
\[
\frac{\mu_X}{\mu_Y} + \frac{X_n-\mu_X}{\mu_Y} - \frac{\mu_X(Y_n-\mu_Y)}{\mu_Y^2}
\]
and subtract the square of the first moment.
Imagine you had a random variable of the form
\[
T_n = W_0 + \frac{W_1}{n^{1/2}} + \frac{W_2}{n} + \cdots
\]
where I assume that the $W_k$ do not depend on $n$.
The mean, taken term by term, would be of the form
\[
E(T_n) = E(W_0) + \frac{E(W_1)}{n^{1/2}} + \frac{E(W_2)}{n} + \cdots
\]
and the second moment of the form
\[
E(T_n^2) = E(W_0^2) + \frac{2E(W_0W_1)}{n^{1/2}}
 + \frac{E(W_1^2) + 2E(W_0W_2)}{n} + \cdots.
\]
This leads to a variance of the form
\[
\mathrm{Var}(T_n) = \mathrm{Var}(W_0) + \frac{2\mathrm{Cov}(W_0,W_1)}{n^{1/2}}
 + \frac{\mathrm{Var}(W_1) + 2\mathrm{Cov}(W_0,W_2)}{n} + \cdots.
\]
Our expansion above gave
\[
W_0 = \frac{\mu_X}{\mu_Y}
\]
and
\[
\frac{W_1}{n^{1/2}} = \frac{X_n-\mu_X}{\mu_Y} - \frac{\mu_X(Y_n-\mu_Y)}{\mu_Y^2},
\]
from which we get the approximate variance
\[
\mathrm{Var}\left(\frac{X_n}{Y_n}\right) \approx
 \frac{\mathrm{Var}(X_n)}{\mu_Y^2}
 - \frac{2\mu_X\mathrm{Cov}(X_n,Y_n)}{\mu_Y^3}
 + \frac{\mu_X^2\mathrm{Var}(Y_n)}{\mu_Y^4}.
\]
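Here is a Python simulation sketch checking the last two displays in the bivariate normal example (my illustration; the parameter values are arbitrary). Note the earlier warning: the exact mean of the ratio does not exist, but with $\mu_Y$ well away from 0 the simulated average tracks the expansion.

    import numpy as np

    # Sketch: compare simulated mean and variance of X/Y with the
    # delta-method approximations, for bivariate normal (X, Y).
    rng = np.random.default_rng(1)
    n, muX, muY, sx, sy, rho = 400, 2.0, 5.0, 1.0, 1.5, 0.3
    varX, varY = sx**2 / n, sy**2 / n
    cov = rho * sx * sy / n
    xy = rng.multivariate_normal([muX, muY], [[varX, cov], [cov, varY]],
                                 size=200000)
    ratio = xy[:, 0] / xy[:, 1]
    mean_approx = muX/muY - cov/muY**2 + muX*varY/muY**3
    var_approx = (varX - 2*muX*cov/muY + muX**2*varY/muY**2) / muY**2
    print(ratio.mean(), mean_approx)
    print(ratio.var(), var_approx)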
Now I want to apply these ideas to estimation of $\rho_X(k)$.
We make
$X_n$ be $\hat C(k)$
and $Y_n$ be $\hat C(0)$
(and replace $n$ by
$T$). Our first order approximation to $\hat\rho(k)$
is
\[
\hat\rho(k) \approx \rho_X(k) + \frac{\hat C(k) - C_X(k)}{C_X(0)}
 - \frac{C_X(k)\left(\hat C(0) - C_X(0)\right)}{C_X(0)^2}.
\]
Our second order approximation would be
\[
\hat\rho(k) \approx \rho_X(k) + \frac{\hat C(k) - C_X(k)}{C_X(0)}
 - \frac{C_X(k)\left(\hat C(0) - C_X(0)\right)}{C_X(0)^2}
 - \frac{\left(\hat C(k) - C_X(k)\right)\left(\hat C(0) - C_X(0)\right)}{C_X(0)^2}
 + \frac{C_X(k)\left(\hat C(0) - C_X(0)\right)^2}{C_X(0)^3}.
\]
I now evaluate means and variances in the special case where $\hat C$ has
been calculated using a known mean of 0. That is,
\[
\hat C(k) = \frac{1}{T}\sum_{t=1}^{T-k} X_t X_{t+k}.
\]
Then
\[
E\left(\hat C(k)\right) = \frac{T-k}{T}\,C_X(k)
\]
so $\hat C(k)$ is very nearly unbiased.
To compute the variance we begin with the second moment, which is
\[
E\left(\hat C(k)^2\right) = \frac{1}{T^2}
 \sum_{s=1}^{T-k}\sum_{t=1}^{T-k} E(X_sX_{s+k}X_tX_{t+k}).
\]
The expectations in question involve the fourth order product
moments of $X$ and depend on the distribution of the $X$'s and
not just on $C_X$. However, for the interesting case of white
noise (write $\sigma^2 = C_X(0)$), we can compute the expected value. For $k>0$ you may assume
that $s<t$ or $s=t$ since the $s>t$ cases can be figured out by swapping
$s$ and $t$ in the $s<t$ case. For $s<t$ the variable $X_s$ is independent
of all 3 of $X_{s+k}$, $X_t$ and $X_{t+k}$. Thus the expectation factors
into something containing the factor
$E(X_s) = 0$.
For $s=t$,
we get
\[
E\left(X_s^2X_{s+k}^2\right) = \sigma^4
\]
and so the second
moment is
\[
E\left(\hat C(k)^2\right) = \frac{(T-k)\sigma^4}{T^2}.
\]
This is also the variance since, for $k>0$ and for white noise,
$C_X(k) = 0$.
For $k=0$ and $s<t$ or $s>t$ the expectation is simply
\[
E(X_s^2)E(X_t^2) = \sigma^4
\]
while for $s=t$ we get the fourth moment
\[
\mu_4 \equiv E(X_t^4).
\]
Thus the variance of the sample variance (when the mean is known
to be 0) is
\[
\mathrm{Var}\left(\hat C(0)\right)
 = \frac{T(T-1)\sigma^4 + T\mu_4}{T^2} - \sigma^4
 = \frac{\mu_4 - \sigma^4}{T}.
\]
For the normal distribution the fourth moment
is given simply
by $\mu_4 = 3\sigma^4$.
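A quick numerical check of this formula (my illustration, not part of the derivation): Gaussian white noise has $\mu_4 = 3\sigma^4$, uniform white noise has a smaller fourth moment, and the simulated variances of $\hat C(0)$ should track $(\mu_4 - \sigma^4)/T$ in both cases.

    import numpy as np

    # Sketch: Var(C_hat(0)) = (mu_4 - sigma^4)/T for white noise with
    # known mean 0; both distributions below have sigma^2 = 1.
    rng = np.random.default_rng(2)
    T, reps = 200, 50000
    for name, x, mu4 in [
        ("normal", rng.standard_normal((reps, T)), 3.0),
        ("uniform", rng.uniform(-np.sqrt(3.0), np.sqrt(3.0), (reps, T)), 1.8),
    ]:
        c0 = (x**2).mean(axis=1)               # C_hat(0), known mean 0
        print(name, c0.var(), (mu4 - 1.0) / T)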
Having computed the variance it is usual to look at the large
sample distribution theory. For $k=0$ the usual central limit theorem
applies to
\[
\hat C(0) = \frac{1}{T}\sum_{t=1}^T X_t^2
\]
(in the case of white noise) to prove that
\[
\frac{T^{1/2}\left(\hat C(0) - \sigma^2\right)}{\sqrt{\mu_4 - \sigma^4}}
 \Rightarrow N(0,1).
\]
The presence of $\mu_4$
in the formula shows that the approximation is
quite sensitive to the assumption of normality.
For $k>0$ the theorem needed is called the $m$-dependent central
limit theorem; it shows that
\[
\frac{T^{1/2}\,\hat C(k)}{\sigma^2} \Rightarrow N(0,1).
\]
In each of these cases the assertion is simply that the statistic
in question divided by its standard deviation has an approximate
normal distribution.
The sample autocorrelation at lag $k$ is
\[
\hat\rho(k) = \frac{\hat C(k)}{\hat C(0)}.
\]
For $k>0$ we can apply Slutsky's theorem to conclude that
\[
T^{1/2}\,\hat\rho(k) \Rightarrow N(0,1).
\]
This justifies drawing lines at
$\pm 1.96/\sqrt{T}$
to carry
out a 95% test of the hypothesis that the $X$ series is white
noise based on the $k$th sample autocorrelation.
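The calibration of these bands is easy to check by simulation; the Python sketch below (my illustration, with arbitrary $T$ and $K$) uses the known-mean-0 form of $\hat\rho(k)$ and should report an exceedance rate near 5%.

    import numpy as np

    # Sketch: under white noise each rho_hat(k) falls outside the
    # bands +-1.96/sqrt(T) about 5% of the time.
    rng = np.random.default_rng(3)
    T, K, reps = 500, 20, 2000
    exceed = 0
    for _ in range(reps):
        x = rng.standard_normal(T)
        c0 = np.sum(x * x) / T
        for k in range(1, K + 1):
            rk = np.sum(x[:-k] * x[k:]) / (T * c0)   # rho_hat(k)
            exceed += abs(rk) > 1.96 / np.sqrt(T)
    print(exceed / (reps * K))                       # near 0.05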
It is possible to verify that subtraction of $\bar X$ from the
observations before computing the sample covariances does not
change the large sample approximations, although it does affect
the exact formulas for moments.
When the $X$ series is actually not white noise the situation is
more complicated. Consider as an example the model
\[
X_t = \sum_u a_u \epsilon_{t-u}
\]
with $\epsilon$
being white noise. Taking
\[
\hat C(k) = \frac{1}{T}\sum_{t=1}^{T-k} X_tX_{t+k}
\]
we find that
\[
E\left(\hat C(k)^2\right) = \frac{1}{T^2}\sum_{s,t}
 \sum_{u_1,u_2,v_1,v_2} a_{u_1}a_{u_2}a_{v_1}a_{v_2}
 E\left(\epsilon_{s-u_1}\epsilon_{s+k-u_2}\epsilon_{t-v_1}\epsilon_{t+k-v_2}\right).
\]
The expectation is 0 unless either all 4 indices on the
$\epsilon$'s are the same or the indices come in two pairs of equal
values. The first case requires $u_1 = u_2-k$ and $v_1 = v_2-k$ and then
$s-u_1 = t-v_1$. The second case requires one of three pairs of equalities:
$s-u_1 = t-v_1$ and
$s-u_2 = t-v_2$, or
$s-u_1 = t+k-v_2$ and
$s+k-u_2 = t-v_1$, or
$s-u_1 = s+k-u_2$ and
$t-v_1 = t+k-v_2$, along with the restriction
that the four indices not all be equal. The actual moment is then
$\mu_4 = E(\epsilon_t^4)$ when all four indices are equal and
$\sigma^4$ when there
are two pairs. It is now possible to do the sum using geometric
series identities and compute the variance of
$\hat C(k)$.
It is not particularly enlightening to finish the calculation in
detail.
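A simulation makes the practical point, though. For a stationary Gaussian AR(1) with parameter $\phi$ the variance of $\hat\rho(1)$ is approximately $(1-\phi^2)/T$ (a special case of this kind of calculation, due to Bartlett), not the white noise value $1/T$; the Python sketch below (my illustration, arbitrary parameter values) compares the two.

    import numpy as np

    # Sketch: sampling standard deviation of rho_hat(1) for an AR(1),
    # versus sqrt((1 - phi^2)/T) and the white-noise value 1/sqrt(T).
    rng = np.random.default_rng(4)
    T, reps, phi = 500, 3000, 0.6
    r1 = np.empty(reps)
    for i in range(reps):
        eps = rng.standard_normal(T)
        x = np.empty(T)
        x[0] = eps[0] / np.sqrt(1.0 - phi**2)   # stationary start
        for t in range(1, T):
            x[t] = phi * x[t - 1] + eps[t]
        x = x - x.mean()
        r1[i] = np.sum(x[:-1] * x[1:]) / np.sum(x * x)
    print(r1.std(), np.sqrt((1 - phi**2) / T), 1 / np.sqrt(T))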
There are versions of the central limit theorem called
mixing central limit theorems which can be used for ARMA(p,q) processes
in order to conclude that
\[
\frac{\hat C(k) - C_X(k)}{\sqrt{\mathrm{Var}\left(\hat C(k)\right)}}
\]
has asymptotically a standard normal distribution and that the same
is true when the standard deviation in the denominator is replaced by an
estimate. To get from this to distribution theory for the
sample autocorrelation is easiest when the true autocorrelation is 0.
The general tactic is the $\delta$
method, or Taylor expansion. In this
case for each sample size $T$ you have two estimates, say $N_T$ and $D_T$,
of two parameters. You want distribution theory for the ratio
$R_T = N_T/D_T$. The idea is to write
$R_T = f(N_T,D_T)$ where
$f(x,y) = x/y$ and then make use of the fact that $N_T$ and $D_T$ are
close to the parameters they are estimates of. In our case $N_T$
is the sample autocovariance at lag $k$, which is close to the
true autocovariance $C_X(k)$, while the denominator $D_T$ is the
sample autocovariance at lag 0, a consistent estimator of $C_X(0)$.
Write
\[
R_T = f(N_T,D_T) = f(C_X(k),C_X(0))
 + D_1f(C_X(k),C_X(0))\left(N_T - C_X(k)\right)
 + D_2f(C_X(k),C_X(0))\left(D_T - C_X(0)\right)
 + \text{remainder}.
\]
If we can use a central limit theorem to conclude
that
\[
T^{1/2}\left(N_T - C_X(k),\, D_T - C_X(0)\right)
\]
has an approximately bivariate normal distribution
and if we can neglect the remainder term then
\[
T^{1/2}\left(R_T - \rho_X(k)\right)
\]
has approximately a normal distribution. The notation here is that
$D_j$ denotes differentiation with respect to the $j$th argument
of $f$. For
$f(x,y) = x/y$ we have
$D_1f = 1/y$ and
$D_2f = -x/y^2$.
When $C_X(k) = 0$ the term involving $D_2f$ vanishes and we
simply get the assertion that
$T^{1/2}\,\hat\rho(k)$
has the same asymptotic normal distribution as
$T^{1/2}\,\hat C(k)/C_X(0)$.
Similar ideas can be used for the estimated sample partial ACF.
Portmanteau tests
In order to test the hypothesis that a series is white noise using the
distribution theory just given, you have to produce a single statistic
to base your test on. Rather than pick a single value of $k$, the
suggestion has been made to consider a sum of squares or a weighted
sum of squares of the
$\hat\rho(k)$.
A typical statistic is
\[
Q = T\sum_{k=1}^K \hat\rho^2(k)
\]
which, for white noise, has approximately a $\chi^2_K$
distribution.
(This fact relies on an extension of the previous computations to conclude
that
\[
T^{1/2}\left(\hat\rho(1),\ldots,\hat\rho(K)\right)
\]
has approximately a standard multivariate normal distribution. This, in turn, relies
on computation of the covariance between
$\hat C(j)$ and $\hat C(k)$.)
When the parameters in an ARMA(p,q) model have been estimated by maximum likelihood
the degrees of freedom must be adjusted to $K-p-q$. The resulting
test is the Box-Pierce test; a refined version which takes better account
of finite sample properties is the Box-Pierce-Ljung test. S-Plus plots the
$P$-values from these tests for 1 through 10 degrees of freedom as
part of the output of arima.diag.
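For completeness, here is a Python sketch of both statistics on a simulated white noise series (my illustration; $T$ and $K$ are arbitrary and the $\chi^2$ tail area is computed with scipy). The Ljung refinement replaces $T\hat\rho^2(k)$ by $T(T+2)\hat\rho^2(k)/(T-k)$.

    import numpy as np
    from scipy import stats

    # Sketch: Box-Pierce statistic Q = T * sum_k rho_hat(k)^2 and the
    # Ljung-Box version, compared with chi-squared on K df.
    rng = np.random.default_rng(5)
    T, K = 500, 10
    x = rng.standard_normal(T)
    x = x - x.mean()
    c0 = np.sum(x * x) / T
    rho = np.array([np.sum(x[:-k] * x[k:]) / (T * c0)
                    for k in range(1, K + 1)])
    Q_bp = T * np.sum(rho**2)
    Q_lb = T * (T + 2) * np.sum(rho**2 / (T - np.arange(1, K + 1)))
    print(Q_bp, Q_lb, stats.chi2.sf(Q_bp, df=K))  # big P-value: white noise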
Richard Lockhart
1999-11-01