STAT 350

Assignment 1 Solutions



1.
Suppose $Z_1,\ldots,Z_{10}$ are independent random variables each having a N(0,6) distribution. Let ${\bar Z} = \sum Z_i/10$, $U=\sum_{i=1}^4 Z_i^2/6$, $V=\sum_{i=5}^{10}Z_i^2/6$, $X=Z_1/\sqrt{V}$ and $Y=(3U)/(2V)$. Give the names for the distributions of each of ${\bar Z}$, $U$, $V$, $X$ and $Y$ and use tables to find $P(\vert{\bar Z}\vert > 1)$, $P(U \le 9.49)$, $P(-1.2 \le X \le 1.2)$, $P(Y > 6.23)$, $P(U \le V)$, $P(Z_1 - 2 Z_2 \ge 10)$.

Solution

${\bar Z}$ is N(0,6/10), $U$ is $\chi_4^2$, $V$ is $\chi_6^2$, $X$ is $t_6$, since $Z_1$ and $V$ are independent and $Z_1/\sqrt{V} =
(Z_1/\sqrt{6})/\sqrt{V/6}$, and $Y=(U/4)/(V/6)$ has an $F_{4,6}$ distribution, using the fact that $U$ and $V$ are independent. Your answer should specify the mean and variance for ${\bar Z}$, the various degrees of freedom and note the required independence for $X$ and $Y$. The required probabilities are 0.197, 0.95, 0.725, 0.025, 0.688, 0.034. I used SPlus to compute these; if you used tables your answers will be less accurate, particularly for $P(U \le V)=P(U/V \le 1) = P(Y < 1.5)$.

You need to remember:

(a)
$Z_i/\sqrt{6}$ has a N(0,1) distribution.

(b)
so $Z_i^2/6$ has a $\chi^2_1$ distribution.

(c)
${\rm Var}(Z_1-2Z_2) = 6 + (-2)^2 \cdot 6 = 30$.

(d)
${\rm Var}(Z_i) = 6$ (several people thought ${\rm Var}(Z_i) = 6^2$).
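
If you want to check your table look-ups, the same probabilities can be computed with SAS's probability functions (a sketch, not part of the original solution, which used SPlus):

data q1probs;
  p_zbar = 2*(1 - probnorm(1/sqrt(0.6)));   /* P(|Zbar| > 1), Zbar ~ N(0, 6/10)  */
  p_u    = probchi(9.49, 4);                /* P(U <= 9.49), U ~ chi-square(4)   */
  p_x    = probt(1.2, 6) - probt(-1.2, 6);  /* P(-1.2 <= X <= 1.2), X ~ t(6)     */
  p_y    = 1 - probf(6.23, 4, 6);           /* P(Y > 6.23), Y ~ F(4,6)           */
  p_uv   = probf(1.5, 4, 6);                /* P(U <= V) = P(Y <= 1.5)           */
  p_diff = 1 - probnorm(10/sqrt(30));       /* P(Z1 - 2 Z2 >= 10), variance 30   */
run;
proc print data=q1probs;
run;

These should reproduce 0.197, 0.95, 0.725, 0.025, 0.688 and 0.034 up to rounding.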

2.
A new process for measuring the concentration of a chemical in water is being investigated. A total of n samples are prepared in which the concentrations are the known numbers $x_i$ for $i=1,\ldots,n$; the new process is used to measure the concentrations for these samples. It is thought likely that the concentrations measured by the new process, which we denote $Y_i$, will be related to the true concentrations via

\begin{displaymath}Y_i = \beta x_i + \epsilon_i
\end{displaymath}

where the $\epsilon_i$ are independent, have mean 0 and all have the same variance $\sigma^2$ which is unknown.

(a)
If this model is fitted by least squares (that is, by minimizing $\sum_i(Y_i-\beta x_i)^2$), show that the least squares estimate of $\beta$ is

\begin{displaymath}\hat\beta = \frac{\sum x_iY_i}{\sum x_i^2} \, .
\end{displaymath}

Solution

Differentiate $\sum_i(Y_i-\beta x_i)^2$ with respect to $\beta$ and get $-2\sum x_i(Y_i-\beta x_i)$, which is 0 if and only if $\beta =
\sum x_i Y_i / \sum x_i^2$; this critical point is $\hat\beta$. The second derivative of the function being minimized is $2\sum x_i^2 > 0$ so this is a minimum.

Note: I saw a dismaying tendency to write things like

\begin{displaymath}x_i \sum (Y_i - x_i \beta)
\end{displaymath}

moving the $x_i$ inside and out of the sum whenever you felt like it. Since $x_i$ depends on $i$ and the sum is over different values of $i$, it is not a common factor in the sum, so you can't take it outside the sum!
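
In SAS (the package used for question 4 below) a regression through the origin like this one can be fitted directly. A minimal sketch, assuming a data set conc containing the true concentrations x and the measured values y:

/* the NOINT option drops the intercept, so the reported slope */
/* is exactly sum(x_i Y_i)/sum(x_i^2) from part (a)            */
proc reg data=conc;
  model y = x / noint;
run;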

(b)
Show that the estimator in part (a) is unbiased.

Solution

Let $K = 1/\sum x_i^2$. Then $E(\hat\beta) = K E(\sum x_i Y_i)$. Use $E(Y_i) = \beta x_i$ to see that

\begin{displaymath}E(\hat\beta) = K \sum x_i^2 \beta
=\beta.\end{displaymath}

(c)
Compute (give a formula for) the standard error of $\hat\beta$.

Solution

You have to compute ${\rm Var}(\hat\beta) = K^2 \sum {\rm Var}(x_i Y_i) = K^2 \sigma^2 \sum x_i^2 = \sigma^2 K$, which is simply $\sigma^2 / \sum x_i^2$. The standard error is then $\sqrt{{\rm Var}(\hat\beta)} = \sigma \sqrt{K}$.

(d)
The error sum of squares for this model is $\sum(Y_i
-\hat\beta x_i)^2$ which may be shown to have $n-1$ degrees of freedom. If the $x_i$ are the numbers 1, 2, 3 and 4, $\hat\beta = 1$ and the error sum of squares is 0.12, find a 95% confidence interval for $\beta$ and explain what further assumptions you must make to do so.

Solution

If we assume that the errors are independent $N(0,\sigma^2)$ random variables then $\hat\beta$ is independent of the usual estimate of $\sigma^2$, namely $s^2 = 0.12/3=0.04$ in this case. The usual $t$ statistic then has a $t_3$ distribution and the confidence interval is $1\pm t_{3,0.025} \sqrt{0.04}/\sqrt{1^2+2^2+3^2+4^2}$ which boils down to $1\pm (3.18)(0.2)/\sqrt{30}$.
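
Carrying out the arithmetic (with $t_{3,0.025} = 3.18$):

\begin{displaymath}1 \pm \frac{(3.18)(0.2)}{\sqrt{30}} = 1 \pm 0.116 \qquad\mbox{or roughly}\qquad (0.88,\ 1.12).
\end{displaymath}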

Note: several people used formulas for simple linear regression as if they had fitted an intercept as well. They then had 2 degrees of freedom in spite of the fact that the question says there are n-1 degrees of freedom for error. Also: many of you neglected to assume that the errors have a normal distribution.

(e)
Show that the estimator

\begin{displaymath}\tilde\beta = \frac{\sum Y_i}{\sum x_i}\end{displaymath}

is also unbiased.

Solution

Let $K_2 = 1/\sum x_i$; then $E(\tilde\beta) = K_2 \sum x_i \beta =
\beta$.

(f)
Compute (give a formula for) the standard error of $\tilde\beta$. Which is bigger, the standard error of $\hat\beta$ or that of $\tilde\beta$?

Solution

We have $Var(\tilde\beta)= K_2^2 \sum Var(Y_i) = n\sigma^2/(\sum x_i)^2$. The difference $Var(\tilde\beta) - Var(\hat\beta)$ is then simply

\begin{displaymath}\sigma^2\frac{n\sum x_i^2 - (\sum x_i)^2}{\sum x_i^2 (\sum x_i)^2}\end{displaymath}

The denominator is positive and the numerator is $n(n-1)$ times the usual sample variance of the $x$'s, so this difference of variances is positive (provided the $x_i$ are not all equal). This proves that the standard error of $\hat\beta$ is smaller than that of $\tilde\beta$. The standard error of $\tilde\beta$ itself is

\begin{displaymath}\sqrt{{\rm Var}(\tilde\beta)} = \sigma\sqrt{\frac{n}{(\sum x_i)^2}}
\end{displaymath}
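
For example, with the $x$ values 1, 2, 3 and 4 from part (d),

\begin{displaymath}{\rm Var}(\tilde\beta) = \frac{4\sigma^2}{(1+2+3+4)^2} = \frac{\sigma^2}{25}
> \frac{\sigma^2}{30} = {\rm Var}(\hat\beta) \, .
\end{displaymath}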

(g)
Show that the mle of $\beta$ in this model is $\hat\beta$, the least squares estimate, if the $\epsilon_i$ have normal distributions.

Solution

In this case $Y_i$ is $N(x_i\beta, \sigma^2)$ and the likelihood is

\begin{displaymath}\frac{1}{\sqrt{2\pi}\sigma}e^{-(Y_1-x_1\beta)^2/(2\sigma^2)} \times \cdots \times
\frac{1}{\sqrt{2\pi}\sigma}e^{-(Y_n - x_n\beta)^2/(2\sigma^2)}
\end{displaymath}

As usual you maximize the logarithm which is

\begin{displaymath}-n\log\sqrt{2\pi} -n\log\sigma -\frac{1}{2\sigma^2} \sum (Y_i - x_i
\beta)^2
\end{displaymath}

Take the $\beta$ derivative and get the same equation to solve for $\beta$ as in the first part of this problem.
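
Explicitly, the derivative is

\begin{displaymath}\frac{\partial}{\partial\beta}\left[-n\log\sqrt{2\pi} -n\log\sigma
-\frac{1}{2\sigma^2} \sum (Y_i - x_i\beta)^2\right]
= \frac{1}{\sigma^2}\sum x_i(Y_i - x_i \beta)
\end{displaymath}

and setting this to 0 gives $\sum x_i(Y_i - x_i\beta) = 0$, the least squares equation of part (a).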

3.
Consider the two-way layout without replicates. We have data $Y_{ij}$ for $i=1,\ldots,I$ and $j=1,\ldots,J$. We generally fit a so-called additive model

\begin{displaymath}Y_{ij} = \mu + \rho_i+\gamma_j + \epsilon_{ij}
\end{displaymath}

In the following questions consider the case I=2 and J=3.

(a)
If we treat $\mu$, $\rho_1$, $\rho_2$, $\gamma_1$, $\gamma_2$ and $\gamma_3$ as the entries in the parameter vector $\beta$ what is the design matrix $X=X_a$ and what is the rank of $X_a$?

Writing the data as $Y^T = [Y_{1,1}, Y_{1,2}, Y_{1,3}, Y_{2,1}, Y_{2,2}, Y_{2,3}]$ the design matrix is

\begin{displaymath}X_a = \left[\begin{array}{cccccc}
1 & 1 & 0 & 1 & 0 & 0 \\
1 & 1 & 0 & 0 & 1 & 0 \\
1 & 1 & 0 & 0 & 0 & 1 \\
1 & 0 & 1 & 1 & 0 & 0 \\
1 & 0 & 1 & 0 & 1 & 0 \\
1 & 0 & 1 & 0 & 0 & 1 \\
\end{array}\right]
\end{displaymath}

Letting $x_i$ denote column $i$ of $X_a$ we have $x_3 = x_1-x_2$ and $x_6 = x_1 - x_4-x_5$ so that the rank of $X_a$ must be no more than 4. But if $a_1 x_1 + a_2 x_2 + a_4 x_4 + a_5 x_5 = 0$ then from row 6 we get $a_1=0$. Then from rows 4 and 5 we get $a_4=0$ and $a_5=0$. Finally use row 3 to get $a_2=0$. This shows that columns 1, 2, 4 and 5 are linearly independent, so that $X_a$ has rank at least 4 and so exactly 4.

(b)
What is the determinant of the matrix $X_a^T X_a$? Is this matrix invertible? How many solutions do the normal equations have?

The matrix $X_a^T X_a$ is 6 by 6 but has rank only 4 so is singular and must have determinant 0. The normal equations are

\begin{displaymath}\left[\begin{array}{cccccc}
6 & 3 & 3 & 2 & 2 & 2 \\
3 & 3 & 0 & 1 & 1 & 1 \\
3 & 0 & 3 & 1 & 1 & 1 \\
2 & 1 & 1 & 2 & 0 & 0 \\
2 & 1 & 1 & 0 & 2 & 0 \\
2 & 1 & 1 & 0 & 0 & 2 \\
\end{array}\right]
\left[\begin{array}{c}
\mu \\ \rho_1 \\ \rho_2 \\ \gamma_1 \\ \gamma_2 \\ \gamma_3
\end{array}\right]
=
\left[\begin{array}{c}
\sum_{ij} Y_{ij} \\
\sum_j Y_{1j} \\
\sum_j Y_{2j} \\
\sum_i Y_{i1} \\
\sum_i Y_{i2} \\
\sum_i Y_{i3} \end{array}\right]
\end{displaymath}

It may be seen that the second and third rows give equations which add up to the equation in the first row as do the fourth, fifth and sixth rows. Eliminate rows 3 and 6, say, and solve. This leaves 4 linearly independent equations in 6 unknowns and so there are infinitely many solutions.

NOTE: You need to check that the equations are consistent. Many students row reduced only the matrix of coefficients, forgetting that if $A$ has less than full rank then $Ax=b$ might have no solutions. My solution above row reduces the augmented matrix, checking that there is at least one solution.
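
Both the rank computation in part (a) and the singularity of $X_a^TX_a$ here can be checked numerically. A sketch in SAS/IML (not part of the original solution; it assumes SAS/IML is available):

proc iml;
  /* design matrix X_a for the additive model with I=2, J=3 */
  Xa = {1 1 0 1 0 0,
        1 1 0 0 1 0,
        1 1 0 0 0 1,
        1 0 1 1 0 0,
        1 0 1 0 1 0,
        1 0 1 0 0 1};
  call svd(u, q, v, Xa);         /* q holds the singular values of Xa  */
  rnk    = sum(q > 1e-8);        /* numerical rank: should print 4     */
  detXtX = det(t(Xa)*Xa);        /* determinant of Xa'Xa: should be 0  */
  print rnk detXtX;
quit;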

(c)
Usually we impose the restrictions $\rho_1+\rho_2=0$ and $\gamma_1+\gamma_2+\gamma_3=0$. Use these restrictions to eliminate $\rho_2$ and $\gamma_3$ from the model equation and, for the parameter vector $\beta^T = (\mu,\rho_1,\gamma_1,\gamma_2)$ find the design matrix Xb.

The restrictions give $\rho_2 = -\rho_1$ and $\gamma_3 = -\gamma_1 -
\gamma_2$. In each model equation which mentions either $\rho_2$ or $\gamma_3$ you replace that parameter by the equivalent formula. So, for example,

\begin{eqnarray*}Y_{2,3}& = & \mu + \rho_2 + \gamma_3 +\epsilon_{2,3} \\
& = & \mu - \rho_1 -\gamma_1-\gamma_2 +\epsilon_{2,3}
\end{eqnarray*}


The 6th row of the design matrix is obtained by reading off the coefficients of $\mu, \rho_1, \gamma_1,\gamma_2$ which are 1, -1, -1 and -1. This makes

\begin{displaymath}X_b = \left[\begin{array}{rrrr}
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 \\
1 & 1 & -1 & -1 \\
1 & -1 & 1 & 0 \\
1 & -1 & 0 & 1 \\
1 & -1 & -1 & -1 \\
\end{array}\right]
\end{displaymath}

Note: some people eliminated $\rho_1$ and not $\rho_2$. This is fine and leads to a similar matrix with the rows in a different order.

(d)
An alternate set of restrictions is called corner point coding where we assume $\rho_1=\gamma_1=0$. With this restriction and the parameter vector $\beta^T = (\mu,\rho_2,\gamma_2,\gamma_3) $ what is the design matrix Xc?

This makes the design matrix $X_c$ just the corresponding columns 1, 3, 5 and 6 of $X_a$.
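
Explicitly, with the columns corresponding to $\mu$, $\rho_2$, $\gamma_2$ and $\gamma_3$,

\begin{displaymath}X_c = \left[\begin{array}{cccc}
1 & 0 & 0 & 0 \\
1 & 0 & 1 & 0 \\
1 & 0 & 0 & 1 \\
1 & 1 & 0 & 0 \\
1 & 1 & 1 & 0 \\
1 & 1 & 0 & 1 \\
\end{array}\right]
\end{displaymath}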

(e)
Show that the three design matrices have the same column space by finding a matrix A such that Xa = Xb A and similarly for Xb and Xc and for Xa and Xc.

To write $X_c = X_a A$ just let $A$ be the $6\times 4$ matrix which picks out columns 1, 3, 5 and 6 of $X_a$, namely,

\begin{displaymath}A_{ca} =
\left[\begin{array}{cccc}
1 & 0 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 1 & 0 & 0 \\
0 & 0 & 0 & 0 \\
0 & 0 & 1 & 0 \\
0 & 0 & 0 & 1
\end{array}\right]
\end{displaymath}

To write $X_a = X_c A$ we just have to put back columns 2 and 4, remembering that column 2 is column 1 $-$ column 3 and column 4 is column 1 $-$ column 5 $-$ column 6. Thus

\begin{displaymath}A_{ac} =
\left[\begin{array}{rrrrrr}
1 & 1 & 0 & 1 & 0 & 0 \\
0 & -1 & 1 & 0 & 0 & 0 \\
0 & 0 & 0 & -1 & 1 & 0 \\
0 & 0 & 0 & -1 & 0 & 1
\end{array}\right]
\end{displaymath}

Similarly, the matrix $A_{bc}$ relating $X_b$ and $X_c$, here written so that $X_c = X_b A_{bc}$, is

\begin{displaymath}A_{bc} =
\left[\begin{array}{rrrr}
1 & 1/2 & 1/3 & 1/3 \\
0 & -1/2 & 0 & 0 \\
0 & 0 & -1/3 & -1/3 \\
0 & 0 & 2/3 & -1/3
\end{array}\right]
\end{displaymath}

Since $A_{bc}$ is invertible we also have $X_b = X_c A_{bc}^{-1}$.

A vector in the column space of, say, $X_a$ is of the form $X_a v$ for a vector of coefficients $v$. But such a vector is $X_c A_{ac}v$ and so of the form $X_c w$ for the vector $w=A_{ac}v$, and hence lies in the column space of $X_c$. The reverse inclusion follows in the same way from $X_c = X_a A_{ca}$, so the two column spaces are equal; the argument for $X_b$ is similar.

(f)
Use the previous part to show that the vectors of fitted values $\hat Y$ will be the same for any solution of the normal equations for any of the three design matrices.

The easy way to do this is to say: the fitted vector $\hat\mu$ is the closest point in the column space of the design matrix to the data vector Y. Since all three have the same column space they all have the same closest point and so the same $\hat\mu$.

Algebra is an alternative tactic. The matrix $A_{bc}$ is invertible, so $X_b = X_c M$ where $M = A_{bc}^{-1}$, and we have

\begin{displaymath}\hat\mu_b = X_b \hat\beta_b = X_b (X_b^T X_b)^{-1} X_b^T Y
\end{displaymath}

Plug in $X_c M$ for $X_b$ and get

\begin{displaymath}\hat\mu_b = X_cM ( M^T X_c^T X_c M)^{-1} M^T X_c^T Y
\end{displaymath}

Use $(BC)^{-1} = C^{-1}B^{-1}$ to see that all occurrences of $M$ cancel out to give

\begin{displaymath}\hat\mu_b = X_c (X_c^T X_c)^{-1}X_c^T Y = \hat\mu_c \, .
\end{displaymath}

The algebraic approach makes it a bit more difficult to deal with the case of $X_a$ because the normal equations have many solutions. Suppose that $\tilde\beta$ is any solution of the normal equations $X_a^T X_a \tilde\beta = X_a^T Y$. Then, substituting $X_a = X_c A_{ac}$,

\begin{displaymath}A_{ac}^T X_c^T X_c A_{ac} \tilde\beta = A_{ac}^TX_c^TY
\end{displaymath}

or

\begin{displaymath}A_{ac}^T\left( X_c^T X_c A_{ac} \tilde\beta - X_c^TY\right) = 0
\end{displaymath}

The matrix $A_{ac}$ has rank 4. If any vector $v$ satisfies $A_{ac}^Tv=0$ then

\begin{displaymath}A_{ac}^Tv =
\left[\begin{array}{c}
v_1 \\
v_1-v_2 \\
v_2 \\
v_1 - v_3 -v_4 \\
v_3 \\
v_4
\end{array}\right]
\end{displaymath}

so that $v_1=v_2=v_3=v_4=0$. We apply this with

\begin{displaymath}v= X_c^T X_c A_{ac} \tilde\beta - X_c^TY \, .
\end{displaymath}

This shows that

\begin{displaymath}X_c^T X_c A_{ac} \tilde\beta = X_c^TY
\end{displaymath}

so that, since $X_c^T X_c$ is invertible, $A_{ac}\tilde\beta = (X_c^T X_c)^{-1} X_c^T Y = \hat\beta_c$.

Thus

\begin{displaymath}\hat\mu_a = X_a \tilde\beta = X_c A_{ac}\tilde\beta = X_c \hat\beta_c = \hat\mu_c
\end{displaymath}
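
These identities are easy to verify numerically. The following SAS/IML sketch (not part of the original solution; it assumes SAS/IML is available) checks that $X_c = X_b A_{bc}$ and that the three projection (hat) matrices, and hence the fitted values, agree:

proc iml;
  /* the three design matrices and A_bc from parts (a)-(e) */
  Xa = {1  1 0 1 0 0, 1  1 0 0 1 0, 1  1 0 0 0 1,
        1  0 1 1 0 0, 1  0 1 0 1 0, 1  0 1 0 0 1};
  Xb = {1  1  1  0, 1  1  0  1, 1  1 -1 -1,
        1 -1  1  0, 1 -1  0  1, 1 -1 -1 -1};
  Xc = {1 0 0 0, 1 0 1 0, 1 0 0 1,
        1 1 0 0, 1 1 1 0, 1 1 0 1};
  Abc = {3 1.5 1 1, 0 -1.5 0 0, 0 0 -1 -1, 0 0 2 -1} / 3;

  chk = max(abs(Xc - Xb*Abc));                /* should be 0            */

  /* hat matrices; ginv handles the rank-deficient Xa                   */
  Ha = Xa*ginv(t(Xa)*Xa)*t(Xa);
  Hb = Xb*inv(t(Xb)*Xb)*t(Xb);
  Hc = Xc*inv(t(Xc)*Xc)*t(Xc);
  dif = max(abs(Ha-Hb)) <> max(abs(Hb-Hc));   /* <> is elementwise max  */
  print chk dif;                              /* both should be about 0 */
quit;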

4.
From the text question 1.19, 1.23, 2.13 a and b and 2.23 a, b and c. In 2.23 c give a P-value and interpret this P-value.

The following SAS code was used for all these parts:

data gpa;
 infile "CH01PR19.DAT";                       /* read GPA and test score data      */
 input gpa test;
proc glm;
  model gpa=test;
  estimate "fit_at_5" intercept 1 test 5;     /* fitted mean GPA at test = 5       */
  estimate "fit4.7" intercept 1 test 4.7;     /* fitted mean GPA at test = 4.7     */
  output out=gpaout r=resid p=fitted;         /* save residuals and fitted values  */
run;
proc print data=gpaout;
  var test gpa fitted resid;
run;
proc means data=gpaout;
 var resid;                                   /* residuals should average to 0     */
run;
proc rank normal=vw data=gpaout out=gpaout2;  /* normal scores for a QQ plot       */
  var resid;
  ranks normscr;
run;
proc gplot data=gpaout2;
 plot gpa*test fitted*test /overlay;          /* data with fitted values overlaid  */
run;
proc gplot data=gpaout2;
 plot fitted*test;
 plot resid*normscr;                          /* residual QQ plot                  */
run;
obtaining the following output:
Dependent Variable: GPA
                                   Sum of          Mean
Source                  DF        Squares        Square  F Value    Pr > F
Model                    1     6.43372807    6.43372807    34.00    0.0001
Error                   18     3.40627193    0.18923733
Corrected Total         19     9.84000000

                  R-Square           C.V.      Root MSE           GPA Mean
                  0.653834       17.40057       0.43501            2.50000

Source                  DF      Type I SS   Mean Square  F Value    Pr > F
TEST                     1     6.43372807    6.43372807    34.00    0.0001
Source                  DF    Type III SS   Mean Square  F Value    Pr > F
TEST                     1     6.43372807    6.43372807    34.00    0.0001

Dependent Variable: GPA
                                 T for H0:     Pr > |T|    Std Error of
Parameter          Estimate     Parameter=0                  Estimate
fit_at_5         2.50000000           25.70      0.0001      0.09727213
fit4.7           2.24802632           21.12      0.0001      0.10643937

                                 T for H0:     Pr > |T|    Std Error of
Parameter          Estimate     Parameter=0                  Estimate
INTERCEPT      -1.699561404           -2.34      0.0311      0.72677682
TEST            0.839912281            5.83      0.0001      0.14404759

                 OBS    TEST    GPA     FITTED      RESID

                   1     5.5    3.1    2.91996     0.18004
                   2     4.8    2.3    2.33202    -0.03202
                   3     4.7    3.0    2.24803     0.75197
                   4     3.9    1.9    1.57610     0.32390
                   5     4.5    2.5    2.08004     0.41996
                   6     6.2    3.7    3.50789     0.19211
                   7     6.0    3.4    3.33991     0.06009
                   8     5.2    2.6    2.66798    -0.06798
                   9     4.7    2.8    2.24803     0.55197
                  10     4.3    1.6    1.91206    -0.31206
                  11     4.9    2.0    2.41601    -0.41601
                  12     5.4    2.9    2.83596     0.06404
                  13     5.0    2.3    2.50000    -0.20000
                  14     6.3    3.2    3.59189    -0.39189
                  15     4.6    1.8    2.16404    -0.36404
                  16     4.3    1.4    1.91206    -0.51206
                  17     5.0    2.0    2.50000    -0.50000
                  18     5.9    3.8    3.25592     0.54408
                  19     4.1    2.2    1.74408     0.45592
                  20     4.7    1.5    2.24803    -0.74803

        Analysis Variable : RESID

         N          Mean       Std Dev       Minimum       Maximum
        ----------------------------------------------------------
        20  -7.77156E-17     0.4234117    -0.7480263     0.7519737
        ----------------------------------------------------------

Now to answer the questions:

1.19 a)
The estimates are $\hat\beta_0 = -1.7$ and $\hat\beta_1=0.84$, rounded off sensibly.

b)
It is not so easy to plot the data and regression line; the code above does so but the fitted line is plotted only at the available x values. However, nothing in the plot suggests any big problems.
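
One way around this (a sketch using a SAS/GRAPH symbol statement; not part of the code above) is to let gplot draw the least squares line itself:

symbol1 v=circle i=rl;    /* i=rl overlays the fitted regression line */
proc gplot data=gpa;
  plot gpa*test;
run;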

c)
The first estimate statement produces the estimate 2.5 with a standard error of 0.097.

d)
This just asks for the slope again, which is 0.84.

1.23 a)
The proc print prints out the residuals and the proc means shows that the variable resid has mean very close to 0.

b)
The MSE is $\hat\sigma^2=0.189$ and the estimate of $\sigma$ is the Root MSE, 0.435. The RMSE has the same units as $\sigma$ which has the same units as Y, namely, grade points.

2.13 a)
This confidence interval is

\begin{displaymath}\hat\beta_0+4.7\hat\beta_1 \pm t_{0.025,18} \hat\sigma \sqrt{x^T (X^TX)^{-1} x}
\end{displaymath}

where $x^T = (1, 4.7)$. The quantity $ \hat\sigma \sqrt{x^T (X^TX)^{-1} x}$ is printed out by the estimate statement. It is the standard error 0.106. The actual estimate is 2.248 and the $t$ multiplier is 2.101 so the interval is

\begin{displaymath}2.248 \pm 2.101 \times 0.106 \qquad\mbox{or} \qquad (2.025, 2.471)
\end{displaymath}

with one more digit than needed just so you can check your method.
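
If you would rather have SAS print the limits itself, the clparm option on the model statement (a small modification of the code above, not what was originally run) prints 95% confidence limits for each estimate statement:

proc glm data=gpa;
  model gpa = test / clparm;                 /* print 95% confidence limits */
  estimate "fit4.7" intercept 1 test 4.7;    /* mean response at test = 4.7 */
run;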

b)
The prediction for an individual is the same but the interval is

\begin{displaymath}\hat\beta_0+4.7\hat\beta_1 \pm t_{0.025,18} \hat\sigma \sqrt{1+x^T (X^TX)^{-1} x}
\end{displaymath}

or

\begin{displaymath}2.248 \pm 2.101\times \sqrt{0.106^2 + 0.189}
\end{displaymath}

or just

(1.308,3.188)

Notice that this is VERY wide.
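
One way to get SAS to produce this prediction interval (a sketch, not part of the original run) is to append an extra observation with test = 4.7 and a missing gpa, then ask the output statement for individual prediction limits:

data extra; test = 4.7; run;
data gpa2;  set gpa extra; run;
proc glm data=gpa2;
  model gpa = test;
  output out=predout p=fitted lcl=low ucl=high;  /* lcl/ucl: 95% prediction limits */
run;
proc print data=predout;
  where gpa = .;                                 /* the appended observation       */
  var test fitted low high;
run;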

2.23 a)
The ANOVA table is straight from the output:
                                   Sum of          Mean
Source                  DF        Squares        Square  F Value    Pr > F
Model                    1     6.43372807    6.43372807    34.00    0.0001
Error                   18     3.40627193    0.18923733
Corrected Total         19     9.84000000

b)
MSR is an estimate of its expected value. See formula 2.57 in the text:

\begin{displaymath}{\rm E}(MSR) = \sigma^2 + \beta_1^2 \sum (x_i - \bar{X})^2
\end{displaymath}

You can use, for instance, proc means to compute the sum of squares $\sum (x_i - \bar{X})^2$, or algebraic identities to deduce it from the proc glm output. The MSE is an unbiased estimate of $\sigma^2$. These two expected values are equal if the hypothesis $\beta_1=0$ is true.
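
For instance, the corrected sum of squares is the css statistic in proc means (a small addition to the code above):

proc means data=gpa css mean;
  var test;    /* css prints the sum of squared deviations of test from its mean */
run;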

c)
The F statistic in question is printed in the ANOVA table. It is 34 and the P-value is 0.0001 so that at any reasonable level the hypothesis that $\beta_1=0$ is rejected. The null hypothesis is $\beta_1=0$, the alternative is $\beta_1 \neq 0$ and the conclusion is that the true slope is NOT 0.



Richard Lockhart
1999-02-11