Fitting the I part is easy: we simply difference $d$ times. The same observation applies to the seasonal multiplicative model. Thus to fit an ARIMA(p,d,q) model to $X$ you compute $Y = (I-B)^d X$ (shortening your data set by $d$ observations) and then fit an ARMA(p,q) model to $Y$. From now on we therefore assume $d = 0$.
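Before specializing to $d = 0$, here is a minimal sketch of the differencing step (in Python with numpy, which I will use for all the sketches below; the simulated series `x` is purely illustrative):

```python
import numpy as np

# Sketch of the reduction from ARIMA(p, d, q) to ARMA(p, q):
# difference the series d times, then fit an ARMA model to the result.
rng = np.random.default_rng(0)
x = np.cumsum(rng.normal(size=200))   # illustrative series: a random walk, so d = 1 is natural

d = 1
y = np.diff(x, n=d)                   # Y = (I - B)^d X; len(y) == len(x) - d
# ...now fit an ARMA(p, q) model to y by the methods described below.
```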
Simplest case: fitting the AR(1) model
Our basic strategy will be as follows: the full likelihood is generally rather complicated, so we will use conditional likelihoods and ad hoc estimates of some parameters to simplify the situation.
If the errors $\epsilon_t$ are normal then so is the series $X$. In general the vector $X = (X_1, \ldots, X_T)^\top$ has a $MVN(\mu \mathbf{1}, \Sigma)$ distribution, where $\Sigma$ is the covariance matrix with entries $\Sigma_{st} = \mathrm{Cov}(X_s, X_t)$ and $\mathbf{1}$ is a vector all of whose entries are $1$. The joint density of $X$ is

$$f_X(x) = \frac{1}{(2\pi)^{T/2} |\Sigma|^{1/2}} \exp\left\{ -\tfrac{1}{2} (x - \mu \mathbf{1})^\top \Sigma^{-1} (x - \mu \mathbf{1}) \right\} .$$
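As a sketch of what evaluating this joint density involves, the following code computes its log, taking the stationary AR(1) covariance $\mathrm{Cov}(X_s, X_t) = \sigma^2 \rho^{|s-t|} / (1 - \rho^2)$ as a concrete example of $\Sigma$ (the function names here are made up for illustration):

```python
import numpy as np

def gaussian_loglik(x, mu, Sigma):
    # Log of the MVN density above, evaluated at the observed series x.
    T = len(x)
    resid = x - mu                                   # x - mu * 1
    sign, logdet = np.linalg.slogdet(Sigma)
    quad = resid @ np.linalg.solve(Sigma, resid)
    return -0.5 * (T * np.log(2 * np.pi) + logdet + quad)

def ar1_cov(T, rho, sigma2):
    # Stationary AR(1): Cov(X_s, X_t) = sigma2 * rho^|s-t| / (1 - rho^2).
    idx = np.arange(T)
    return sigma2 * rho ** np.abs(idx[:, None] - idx[None, :]) / (1 - rho ** 2)
```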
It is possible to carry out full maximum likelihood by maximizing this likelihood numerically. In general, however, this is hard.
Here I indicate some standard tactics. In your homework I will be asking you to carry through this analysis for one particular model.
Consider the model

$$X_t - \mu = \rho (X_{t-1} - \mu) + \epsilon_t .$$

Now compute the derivatives of the log-likelihood $\ell$ with respect to $\mu$ and $\sigma$, set them equal to 0, and solve to get $\hat\mu(\rho)$ and $\hat\sigma(\rho)$ as functions of $\rho$. To find $\hat\rho$ you now plug $\hat\mu(\rho)$ and $\hat\sigma(\rho)$ into $\ell$ (getting the so-called profile likelihood $\ell_{\mathrm{profile}}(\rho)$) and maximize over $\rho$. Having thus found $\hat\rho$, the mles of $\mu$ and $\sigma$ are simply $\hat\mu(\hat\rho)$ and $\hat\sigma(\hat\rho)$.
It is worth observing that fitted residuals can then be calculated:

$$\hat\epsilon_t = (X_t - \hat\mu) - \hat\rho (X_{t-1} - \hat\mu) .$$
In general, we simplify the maximum likelihood problem several ways: we partition the data $X$ into $(Z, Y)$, where $Z$ is a small set of starting values and $Y$ is the rest, and replace the full likelihood $f(Z)\, f(Y \mid Z)$ by the conditional likelihood $f(Y \mid Z)$; we may also plug ad hoc estimates (such as $\bar{X}$ for $\mu$) into the likelihood rather than maximizing over every parameter. In the AR(1) case $Y$ is just $(X_1, \ldots, X_T)$ while $Z$ is $X_0$. We take our conditional log-likelihood to be

$$\ell(\mu, \rho, \sigma) = -T \log \sigma - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \left[ (X_t - \mu) - \rho (X_{t-1} - \mu) \right]^2 .$$
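Here is a sketch of the whole recipe, based on this conditional likelihood (the closed form for $\hat\mu(\rho)$ follows from setting $\partial\ell/\partial\mu = 0$; the grid over $\rho$ is just one simple way to do the one-dimensional maximization, and the function name is made up):

```python
import numpy as np

def ar1_profile_fit(x, rhos=np.linspace(-0.99, 0.99, 199)):
    # x = (X_0, X_1, ..., X_T); condition on X_0, profile out mu and sigma.
    x_now, x_lag = x[1:], x[:-1]              # X_t and X_{t-1} for t = 1, ..., T
    T = len(x_now)
    best = None
    for rho in rhos:
        mu = (x_now.mean() - rho * x_lag.mean()) / (1 - rho)   # muhat(rho)
        resid = (x_now - mu) - rho * (x_lag - mu)
        sigma2 = resid @ resid / T                             # sigmahat^2(rho)
        profile = -T * np.log(sigma2) / 2     # profile loglik, up to constants
        if best is None or profile > best[0]:
            best = (profile, rho, mu, np.sqrt(sigma2), resid)
    _, rho, mu, sigma, resid = best
    return rho, mu, sigma, resid              # resid are the fitted residuals
```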
Notice that we have made a great many suggestions for simplifications and adjustments. This is typical of statistical research: many ideas, only slightly different from each other, are suggested and compared. In practice it seems likely that there is very little difference between the methods. I am asking you in a homework problem to investigate the differences between several of these methods on a single data set.
For the model

$$X_t - \mu = a_1 (X_{t-1} - \mu) + \cdots + a_p (X_{t-p} - \mu) + \epsilon_t ,$$

an alternative to estimating $\mu$ by $\bar{X}$ is to define $\alpha = \mu (1 - a_1 - \cdots - a_p)$ and then recognize that

$$X_t = \alpha + a_1 X_{t-1} + \cdots + a_p X_{t-p} + \epsilon_t$$

has the form of an ordinary multiple regression of $X_t$ on the lagged values $X_{t-1}, \ldots, X_{t-p}$, so that $\alpha, a_1, \ldots, a_p$ can be estimated by least squares. Notice that if we put $\hat\mu = \hat\alpha / (1 - \hat a_1 - \cdots - \hat a_p)$ we recover an estimate of the mean.
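A sketch of this least-squares approach (assuming numpy; `ar_ls_fit` is a made-up name):

```python
import numpy as np

def ar_ls_fit(x, p):
    # Regress X_t on (1, X_{t-1}, ..., X_{t-p}) by ordinary least squares.
    T = len(x)
    y = x[p:]
    design = np.column_stack([np.ones(T - p)] +
                             [x[p - j: T - j] for j in range(1, p + 1)])
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    alpha, a = coef[0], coef[1:]
    mu = alpha / (1 - a.sum())    # recover muhat from alphahat
    return alpha, a, mu
```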
To compute a full mle of the parameters you generally begin by finding preliminary estimates, say by one of the conditional likelihood methods above, and then iterate via Newton-Raphson or some other scheme for numerical maximization of the log-likelihood.
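In code this polishing step might look like the following sketch, using scipy's general-purpose quasi-Newton optimizer in place of hand-coded Newton-Raphson (`loglik` and `theta0` are placeholders for whatever model and preliminary estimates you have):

```python
from scipy.optimize import minimize

def full_mle(loglik, theta0, x):
    # Numerically maximize the log-likelihood, starting from the
    # preliminary (e.g. conditional-likelihood) estimates theta0.
    result = minimize(lambda theta: -loglik(theta, x), theta0, method="BFGS")
    return result.x
```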
Here we consider the model with known mean (generally this will mean we estimate $\mu$ by $\bar{X}$ and subtract the mean $\bar{X}$ from all the observations):

$$X_t = \epsilon_t + b_1 \epsilon_{t-1} + \cdots + b_q \epsilon_{t-q} .$$

In general $X$ has a $MVN(0, \Sigma_b)$ distribution and, letting $b$ denote the vector of $b_i$'s (with $b_0 = 1$), we find

$$\mathrm{Cov}(X_s, X_t) = \gamma(|s-t|) = \begin{cases} \sigma^2 \sum_{j=0}^{q-|s-t|} b_j b_{j+|s-t|} & |s-t| \le q \\ 0 & |s-t| > q . \end{cases}$$

Notice that the likelihood involves $\Sigma_b^{-1}$ and $\det \Sigma_b$, which have no simple closed form in the $b_i$'s, so direct maximization is awkward.
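Here is a sketch of the covariance calculation (assuming numpy and $T > q$; `ma_cov` is a made-up name):

```python
import numpy as np

def ma_cov(T, b, sigma2):
    # Covariance matrix of (X_1, ..., X_T) under the MA(q) model above,
    # with b = (b_1, ..., b_q) and b_0 = 1; assumes T > q.
    q = len(b)
    bb = np.concatenate(([1.0], b))
    gamma = np.zeros(T)                    # gamma[k] = Cov(X_t, X_{t+k})
    for k in range(q + 1):
        gamma[k] = sigma2 * (bb[: q + 1 - k] @ bb[k:])
    idx = np.arange(T)
    return gamma[np.abs(idx[:, None] - idx[None, :])]
```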
Now imagine that the data were actually $\epsilon_{1-q}, \ldots, \epsilon_0, X_1, \ldots, X_T$; given the pre-sample errors we could solve recursively for $\epsilon_t = X_t - b_1 \epsilon_{t-1} - \cdots - b_q \epsilon_{t-q}$, $t = 1, \ldots, T$.

Method A: Put $\epsilon_0 = \epsilon_{-1} = \cdots = \epsilon_{1-q} = 0$, since 0 is the most probable value, and maximize the resulting conditional log-likelihood

$$\ell(b, \sigma) = -T \log \sigma - \frac{1}{2\sigma^2} \sum_{t=1}^{T} \epsilon_t^2(b) ,$$

where each $\epsilon_t(b)$ is computed from the recursion above.
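A sketch of Method A's recursion follows; the function returns the conditional sum of squares, which you would then minimize over $b$ (with $\hat\sigma^2$ the minimized sum of squares divided by $T$):

```python
import numpy as np

def ma_css(x, b):
    # Set eps_0 = ... = eps_{1-q} = 0 and recover the remaining errors
    # recursively from X_t = eps_t + b_1 eps_{t-1} + ... + b_q eps_{t-q}.
    q = len(b)
    eps = np.zeros(q + len(x))         # first q entries: pre-sample zeros
    for t in range(len(x)):
        past = eps[t: q + t][::-1]     # eps_{t-1}, ..., eps_{t-q}
        eps[q + t] = x[t] - b @ past
    e = eps[q:]
    return e @ e                       # conditional sum of squares
```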
Method B: Backcasting is the process of guessing $\epsilon_0, \ldots, \epsilon_{1-q}$ on the basis of the data; we replace these errors in the log likelihood by their conditional expectations $E(\epsilon_j \mid X_1, \ldots, X_T)$.
We will use the EM algorithm to solve this problem.
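As a sketch of the backcasting computation itself (one E step of such a scheme under joint normality of the errors and the data, reusing the `ma_cov` sketch above; since all means are zero, $E(\epsilon_j \mid X) = \mathrm{Cov}(\epsilon_j, X)\, \Sigma_b^{-1} X$):

```python
import numpy as np

def backcast_eps(x, b, sigma2):
    # E(eps_j | X_1, ..., X_T) for the pre-sample errors j = 0, -1, ..., 1-q.
    q, T = len(b), len(x)
    bb = np.concatenate(([1.0], b))    # b_0 = 1
    C = np.zeros((q, T))               # C[i, t-1] = Cov(eps_{-i}, X_t)
    for i in range(q):                 # row i corresponds to j = -i
        for t in range(1, T + 1):
            if t + i <= q:             # Cov(eps_j, X_t) = sigma2 * b_{t-j}
                C[i, t - 1] = sigma2 * bb[t + i]
    Sigma_X = ma_cov(T, b, sigma2)     # from the sketch above
    return C @ np.linalg.solve(Sigma_X, x)
```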