STAT 804: 97-3

Assignment 2 Solutions

  1. Consider the ARIMA(1,0,1) process

        X_t = \phi X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}.

    Show that the autocorrelation function is

        \rho(1) = \frac{(1+\phi\theta)(\phi+\theta)}{1+2\phi\theta+\theta^2}

    and

        \rho(h) = \phi\,\rho(h-1), \qquad h \ge 2.

    Plot the autocorrelation functions for the ARMA(1,1) process above, the AR(1) process with

        X_t = \phi X_{t-1} + \epsilon_t

    and the MA(1) process

        X_t = \epsilon_t + \theta \epsilon_{t-1}

    on the same plot for a common choice of \phi and \theta. Compute and plot the partial autocorrelation functions up to lag 30. Comment on the usefulness of these plots in distinguishing the three models. Explain what goes wrong when \phi is close to -\theta.

    Solution: The most important part of this problem is that when \theta = -\phi the autocorrelation is identically 0. This means that \theta = -\phi gives simply white noise. In general in the ARMA model

        \phi(B) X_t = \theta(B) \epsilon_t

    any common root of the polynomials \phi and \theta gives a common factor on both sides of the model equation which can effectively be cancelled. In other words, if \phi(x) = \theta(x) = 0 for some particular x then we can write \phi(B) = (1 - B/x)\phi^*(B) for a suitable polynomial \phi^* and also \theta(B) = (1 - B/x)\theta^*(B). In the model equation we can cancel the common factor (1 - B/x) and reduce the model to an ARMA(p-1, q-1).

    A second important point is that the autocorrelation of an ARMA(1,1) decays geometrically just like that of an AR(1), but only from lag 2 on: the recursion \rho(h) = \phi\,\rho(h-1) holds for h \ge 2, while \rho(1) depends on both \phi and \theta.
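    Both points are easy to check numerically. The sketch below is an illustration in Python rather than the S-Plus used in the course, and it assumes the parameterization X_t = \phi X_{t-1} + \epsilon_t + \theta \epsilon_{t-1}; the function name arma11_acf is mine.

```python
import numpy as np

def arma11_acf(phi, theta, nlags=30):
    """ACF of the ARMA(1,1) model X_t = phi*X_{t-1} + e_t + theta*e_{t-1}."""
    rho = np.empty(nlags + 1)
    rho[0] = 1.0
    # Closed form for lag 1; the factor (phi + theta) vanishes when
    # theta = -phi, so the whole ACF collapses to that of white noise.
    rho[1] = (1 + phi * theta) * (phi + theta) / (1 + 2 * phi * theta + theta ** 2)
    for h in range(2, nlags + 1):
        rho[h] = phi * rho[h - 1]  # geometric decay, same rate as an AR(1)
    return rho

# Cancellation: theta = -phi reduces the model to white noise.
print(arma11_acf(0.5, -0.5, 5))

# Geometric decay from lag 1 on, but a different starting value than AR(1).
print(arma11_acf(0.8, 0.4, 5))
```

    Comparing the second printout with the AR(1) values 0.8^h shows the two series decay at the same rate but from different starting levels, which is exactly why the ACF alone separates these models poorly.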

  2. The S command

    attach('/home/math/lockhart/teaching/courses/804/datasets')

    will make a dataset influenza available. If you type

    ls(pos=2)

    you will see the data set for this question and the next two.

    The data consist of monthly counts of influenza cases over a nine-and-a-half-year period. Fit an ARIMA model to the data.

    Solution

    I began my analysis of this data set by plotting the data; see Figure 1.

    There is a strong seasonal effect visible in the plot. To make sure that the variation is seasonal I used the S+ function monthplot to get separate series for each of the 12 months of the year; see Figure 2. The result in Figure 2 has two features of importance. First, the pattern of seasonal variation is very clear, with months around January and February having very low numbers of cases and months around June and July having very high numbers of cases. Second, in those months where the mean level is low there is also less variation around the mean. This suggests a transformation which decreases the variability at large counts, such as a square root or logarithm.

    I tried the square root transformation and the logarithmic transformation. The plot of the square roots and the corresponding monthly decomposition are in Figures 3 and 4 respectively.

    The plot of the logarithms is in Figure 5.
    While the square root transformation appears visually adequate, I settled on the logarithmic transformation for the following reason: the population of the country has probably been growing roughly exponentially over the interval in question. If the influenza rate per capita is a stationary series R_t and the population is roughly P_0 e^{ct}, then the count series Y_t = R_t P_0 e^{ct} satisfies \log Y_t = \log R_t + \log P_0 + ct, which is stationary about a linear trend that we can remove by regression. Thus I studied the detrended series \log Y_t - \hat a - \hat b t, with \hat a and \hat b estimated using lsfit, i.e. by ordinary least squares, with t measured in months and t=1 corresponding to January 1965. A time series plot of the residuals is in Figure 6; the monthly decomposition of the residual series is in Figure 7.
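    The exponential-growth argument is easy to simulate. The sketch below (Python standing in for S-Plus; all parameter values are invented for illustration) builds counts from a stationary seasonal rate times an exponentially growing population, then detrends the logs by ordinary least squares, with numpy.polyfit playing the role of lsfit:

```python
import numpy as np

rng = np.random.default_rng(1)
t = np.arange(120)                                  # 10 years of monthly data
season = np.exp(0.3 * np.sin(2 * np.pi * t / 12))   # stationary seasonal rate
pop = np.exp(0.002 * t)                             # exponential population growth
counts = rng.poisson(100 * season * pop)            # observed monthly counts

log_counts = np.log(counts)
b, a = np.polyfit(t, log_counts, 1)                 # OLS slope and intercept (lsfit analogue)
residuals = log_counts - (a + b * t)                # detrended log series

print(b)   # recovers a slope close to the true growth rate 0.002
```

    The fitted slope estimates the population growth rate, and the residual series retains the seasonal structure for the next stage of the analysis.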

    There is still a strong seasonal component. Two ways suggest themselves to remove the effect: seasonal differencing and subtraction of monthly means. I used the latter. The table of monthly means is in Table 1. A plot of the detrended and deseasonalized series obtained by subtracting these monthly means is in Figure 8.

    Call this series X_t. I now started fitting ARMA models to X_t. Plots of the autocorrelation and partial autocorrelation functions are in Figures 9 and 10. They clearly suggest a simple AR(1) model, since the partial autocorrelation is essentially 0 at lags beyond 1 month.

    Table 1: Monthly means of the detrended logarithm

    I fitted the resulting AR(1) model using arima.mle. The estimated autoregression parameter has a standard error of 0.076, while the residual standard error is 0.38. The series has mean 0.

    The resulting model fit must now be checked to see if further modelling is necessary. I used the function arima.diag to get diagnostic graphs and residuals. The basic diagnostic plot is in Figure 11.

    A plot of the partial autocorrelation function of the standardized residuals is in Figure 12. The time series plot of the standardized residuals in Figure 11 shows a few rather large residuals but nothing too overwhelming. In Figure 13 a normal Q-Q plot of the residuals shows some non-normality in the upper tail; it does not seem too severe.

    The autocorrelation function and partial autocorrelation function of the residuals are consistent with white noise as are the values of the portmanteau test P-values given in the bottom frame of Figure 11.

    In summary we have been led to the model equation

        \log Y_t = \hat a + \hat b t + \mu_{m(t)} + W_t,

    where \hat a + \hat b t is the linear trend fitted by least squares, \mu_{m(t)} is the monthly mean given in Table 1 for the month m(t) containing time t, and W_t follows the fitted AR(1)

        W_t = \hat\phi W_{t-1} + \epsilon_t.

    The standard deviation of the white noise series \epsilon is 0.38.

  3. Fit an ARIMA model for the Johnson and Johnson earnings per share data which is in the dataset earnings. There are 20 years of quarterly data.

    Solution: My personal favourite model is obtained by taking logs first and then, if X_t is the series of logarithms, fitting the model

        X_t = a + b t + W_t,

    where W is an ARIMA(1,0,0) series whose fitted model has autoregressive coefficients 0.8 and 0.266, with \epsilon a white noise series with variance 0.0065. The fitted values of a and b are -0.675 and 0.0415; S-Plus does not provide standard errors for these. The standard errors of the AR coefficients are 0.069 (for the 0.8 value) and 0.11 (for the 0.266 value), so both are significant.

    Many students fitted a model with one or more differences taken at lag 4. I found it quite hard to get a really good fit this way. There is an important conceptual difference between fitting the linear trend as I have and taking differences. The differencing method supposes that earnings respond more or less directly to earlier values of earnings whereas I suppose the linear trend to arise as a result of an external driving force pushing the earnings up exponentially. Moreover the differenced models require increasing variance with time; my model is stationary about the linear trend.
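    The claim that differenced models imply variance growing with time, while a trend-stationary model does not, is easy to check by simulation; here is a sketch with made-up parameters:

```python
import numpy as np

rng = np.random.default_rng(3)
nrep, n = 2000, 200
e = rng.standard_normal((nrep, n))

# Differencing model: X_t - X_{t-1} = e_t, i.e. a random walk.
walk = np.cumsum(e, axis=1)
var_walk = walk.var(axis=0)       # grows roughly linearly in t

# Trend-stationary alternative: a stationary AR(1) about a (zero) trend.
phi = 0.8
ar = np.zeros((nrep, n))
for i in range(1, n):
    ar[:, i] = phi * ar[:, i - 1] + e[:, i]
var_ar = ar.var(axis=0)           # settles near 1/(1 - phi^2)
```

    Across 2000 replications the random walk's variance keeps climbing with t, while the AR(1)'s variance levels off, which is the conceptual difference between the two modelling strategies.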

    I accepted a wide variety of answers, grading on the basis of how convincing and well reasoned I thought your process was. Generally I would prefer to see fitted models presented more or less as above, complete with clear formulas, parameter estimates, and standard errors.

  4. Fit an ARIMA model for the data set called fake.

    Solution: This is simply an AR(1) and the diagnostics make this very clear. Part of the point here is that for data which genuinely follow an AR(1) model the model selection techniques work pretty well. Real data is not so clean.

DUE: Friday, 10 October.



Richard Lockhart
Tue Nov 25 10:44:43 PST 1997