
STAT 350: 97-1

Final Exam, 8 April 1997

Instructor: Richard Lockhart

Instructions: This is an open book test. You may use notes, text, other books and a calculator. Your presentations of statistical analysis will be marked for clarity of explanation. I expect you to explain what assumptions you are making and to comment if those assumptions seem unreasonable. The exam is out of 60.

  1. When a weight is hung from a wire, the wire stretches (returning to its original length when the weight is removed). A 1 kilogram weight is hung from a piece of wire and the length stretched is measured. This is repeated and the two resulting lengths are $Y_{11}$ and $Y_{12}$. Then a 2 kilogram weight is tried 3 times, resulting in lengths $Y_{21}$, $Y_{22}$ and $Y_{23}$. To save analysis effort the experimenter averages the two measurements made with the 1 kilogram weight, obtaining $\bar{Y}_1$, and the 3 measurements made with the 2 kilogram weight, obtaining $\bar{Y}_2$. [Total of 20 marks]

    1. Assume that the individual lengths satisfy, for i from 1 to 2 and j from 1 to 2 (for i=1) or 1 to 3 (for i=2),

      $$Y_{ij} = \beta i + \epsilon_{ij}$$

      where the errors $\epsilon_{ij}$ are independent normal variables and have mean 0 and variance $\sigma^2$. What is the design matrix for this linear model? [2 marks]

      Solution:

      $$X = \begin{bmatrix} 1 \\ 1 \\ 2 \\ 2 \\ 2 \end{bmatrix}$$

    2. Give an explicit, simple formula for the least squares estimate of $\beta$; I do not want a general formula such as $(X^T X)^{-1} X^T Y$. [4 marks]

      Solution:

      $$X^T X = 1 + 1 + 4 + 4 + 4 = 14$$

      and

      $$X^T Y = Y_{11} + Y_{12} + 2(Y_{21} + Y_{22} + Y_{23})$$

      so that

      $$\hat\beta = \frac{Y_{11} + Y_{12} + 2(Y_{21} + Y_{22} + Y_{23})}{14}$$

    3. Give the mean and variance of the estimator in (b). [2 marks]

      Solution:

      $$E(\hat\beta) = \beta$$

      and

      $$\mathrm{Var}(\hat\beta) = \sigma^2 (X^T X)^{-1} = \frac{\sigma^2}{14}$$

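    4. The average measurements $\bar{Y}_1$ and $\bar{Y}_2$ also satisfy a linear model

      $$\bar{Y}_i = \beta i + \bar\epsilon_i$$

      1. What is $\bar\epsilon_i$ in terms of the $\epsilon_{ij}$? [1 mark] Solution:

        $$\bar\epsilon_1 = \frac{\epsilon_{11} + \epsilon_{12}}{2}, \qquad \bar\epsilon_2 = \frac{\epsilon_{21} + \epsilon_{22} + \epsilon_{23}}{3}$$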

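      2. What is the joint distribution of $(\bar\epsilon_1, \bar\epsilon_2)$? In particular what are the variances and means of each $\bar\epsilon_i$? [3 marks]

        Solution:

        $$\begin{bmatrix} \bar\epsilon_1 \\ \bar\epsilon_2 \end{bmatrix} = \begin{bmatrix} 1/2 & 1/2 & 0 & 0 & 0 \\ 0 & 0 & 1/3 & 1/3 & 1/3 \end{bmatrix} \begin{bmatrix} \epsilon_{11} \\ \epsilon_{12} \\ \epsilon_{21} \\ \epsilon_{22} \\ \epsilon_{23} \end{bmatrix}$$

        so that $(\bar\epsilon_1, \bar\epsilon_2)$ has a multivariate normal distribution with mean 0 and variance covariance matrix

        $$\sigma^2 \begin{bmatrix} 1/2 & 0 \\ 0 & 1/3 \end{bmatrix}$$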

      3. What is the design matrix of this linear model? [1 mark]

        Solution:

        $$X = \begin{bmatrix} 1 \\ 2 \end{bmatrix}$$

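    5. Show that the weighted least squares estimate of $\beta$ is

      $$\hat\beta_w = \frac{2\bar{Y}_1 + 6\bar{Y}_2}{14} = \frac{\bar{Y}_1 + 3\bar{Y}_2}{7}$$

      [4 marks]

      Solution: The variances of the errors are $\sigma^2/2$ and $\sigma^2/3$ so that the weights, which are proportional to the reciprocals of the variances, are $w_1 = 2$ and $w_2 = 3$. Then, with $W = \mathrm{diag}(w_1, w_2)$,

      $$X^T W X = 2(1)^2 + 3(2)^2 = 14$$

      and

      $$X^T W \bar{Y} = 2\bar{Y}_1 + 6\bar{Y}_2$$

      so that

      $$\hat\beta_w = \frac{2\bar{Y}_1 + 6\bar{Y}_2}{14} = \frac{\bar{Y}_1 + 3\bar{Y}_2}{7}$$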

    6. What is the distribution of $\hat\beta_w$? [2 marks]

      Solution: Normal with mean $\beta$ and variance $\sigma^2/14$.

    7. Why would analysis of the original variables $Y_{ij}$ be better than analysis of the $\bar{Y}_i$? [1 mark]

      Solution: We would have 4 degrees of freedom for error rather than 1.
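The agreement between the two analyses in Question 1 can be checked numerically. The sketch below assumes the model is a straight line through the origin with the hung weight (1 or 2 kg) as the only covariate. The five stretch values are invented for illustration, and the function names are hypothetical.

```python
# Made-up stretch measurements, for illustration only.
y1 = [1.1, 0.9]        # two readings with the 1 kg weight
y2 = [2.2, 1.9, 2.1]   # three readings with the 2 kg weight


def wls_through_origin(x, y, w):
    """Weighted least squares slope for y = beta * x + error (no intercept)."""
    sxy = sum(wi * xi * yi for wi, xi, yi in zip(w, x, y))
    sxx = sum(wi * xi * xi for wi, xi in zip(w, x))
    return sxy / sxx


# Ordinary least squares on the five raw observations (all weights equal to 1).
x_raw = [1, 1, 2, 2, 2]
beta_raw = wls_through_origin(x_raw, y1 + y2, [1] * 5)

# Weighted least squares on the two averages, weights = group sizes.
ybar = [sum(y1) / len(y1), sum(y2) / len(y2)]
beta_avg = wls_through_origin([1, 2], ybar, [len(y1), len(y2)])

print(beta_raw, beta_avg)
```

Under these assumptions the two slopes coincide, so averaging costs nothing in the point estimate; what is lost is the information needed to estimate the error variance.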

  2. A variable Y (a measurement of oxygen taken up by a system) is regressed on 4 predictors $x_1, x_2, x_3, x_4$. A total of 20 measurements were made and Y was regressed on various subsets of the predictor variables, leading to the following table of Error Sums of Squares.

    Vars     ESS   Vars         ESS   Vars              ESS   Vars              ESS
    $x_1$    154   $x_1, x_2$   109   $x_2, x_4$        133   $x_1, x_3, x_4$   139
    $x_2$    156   $x_1, x_3$   144   $x_3, x_4$        175   $x_2, x_3, x_4$   132
    $x_3$    203   $x_1, x_4$   146   $x_1, x_2, x_3$   106   All               104
    $x_4$    250   $x_2, x_3$   150   $x_1, x_2, x_4$   107   None              506

    1. Does adding the variables $x_3$ and $x_4$ to the model containing $x_1$ and $x_2$ significantly improve the fit? [6 marks]

      Solution: This compares the model with all four variables to the model with just $x_1$ and $x_2$ and so

      $$F = \frac{(109 - 104)/2}{104/15} = 0.36$$

      This is much less than 1 so the added variables are not significant.

    2. Use backward selection, with a 10% significance level to stay, to select a suitable subset of regression variables. [8 marks]

      Solution: We begin with all variables. Among the 3 variable models the model containing only $x_1$, $x_2$ and $x_3$ has the smallest error sum of squares so if we delete a variable it must be $x_4$. The F statistic is

      $$F = \frac{(106 - 104)/1}{104/15} = 0.29$$

      so we delete $x_4$. Among the two variable models which contain 2 of the variables $x_1$, $x_2$ and $x_3$ the model containing $x_1$ and $x_2$ has the smallest error sum of squares so we try to delete $x_3$ getting

      $$F = \frac{(109 - 106)/1}{106/16} = 0.45$$

      which is still far from significant. We delete $x_3$ and look at 1 variable models which either use $x_1$ or $x_2$. The smallest error SS is for $x_1$ so we try to delete $x_2$ getting

      $$F = \frac{(154 - 109)/1}{109/17} = 7.02$$

      We compare this to the F tables with 1 numerator and 17 denominator degrees of freedom and see that $P < 0.10$ so that $x_1$ and $x_2$ will be retained.
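The elimination steps above can be traced mechanically. This is a minimal sketch, not the course's software: the subset labels in the dictionary are an assumption (they are inferred from the requirement that adding a variable never increases the error sum of squares), the critical values are upper 10% points of F(1, df) read from standard tables, and `backward_select` is a hypothetical helper name.

```python
# Error sums of squares keyed by the predictors in the model.
# ASSUMPTION: subset labels are inferred, with variables numbered 1 to 4.
ESS = {
    (): 506,
    (1,): 154, (2,): 156, (3,): 203, (4,): 250,
    (1, 2): 109, (1, 3): 144, (1, 4): 146,
    (2, 3): 150, (2, 4): 133, (3, 4): 175,
    (1, 2, 3): 106, (1, 2, 4): 107, (1, 3, 4): 139, (2, 3, 4): 132,
    (1, 2, 3, 4): 104,
}
N_OBS = 20

# Upper 10% points of F(1, df), taken from standard F tables.
F_CRIT_10 = {15: 3.07, 16: 3.05, 17: 3.03, 18: 3.01}


def backward_select(start):
    """Drop one variable at a time until the best deletion is significant."""
    current = tuple(start)
    steps = []
    while current:
        df_error = N_OBS - 1 - len(current)  # intercept plus one slope each
        candidates = [tuple(v for v in current if v != drop) for drop in current]
        smaller = min(candidates, key=ESS.get)  # deletion that hurts least
        f_stat = (ESS[smaller] - ESS[current]) / (ESS[current] / df_error)
        steps.append((current, smaller, round(f_stat, 2)))
        if f_stat > F_CRIT_10[df_error]:
            break  # the variable is significant at 10%, so it stays
        current = smaller
    return current, steps


retained, steps = backward_select((1, 2, 3, 4))
for full, reduced, f_stat in steps:
    print(full, "->", reduced, "F =", f_stat)
print("retained:", retained)
```

Each pass only needs the models with one variable fewer than the current one, which is why the full table of 16 error sums of squares is enough to run the procedure without refitting anything.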

    3. If the estimated slope associated with tex2html_wrap_inline406 in the model including tex2html_wrap_inline408 and tex2html_wrap_inline410 only as predictors is positive, what is the value of the t statistic for testing the hypothesis that the true coefficient of tex2html_wrap_inline414 is 0? [1 mark]

      Solution: tex2html_wrap_inline416 .

  3. Five different treatments, A, B, C, D and E, are to be examined for their effect on blood pressure. Fifty patients are randomly split into 5 groups of 10. The initial blood pressure X of each patient is measured, the treatment is applied and then the final blood pressure Y is measured. Let $i = 1, \ldots, 5$ label the treatment and j, running from 1 to 10, label the patient within the treatment group. Three models were fitted:

    Model I

    $$Y_{ij} = \alpha + \beta X_{ij} + \epsilon_{ij}$$

    the error sum of squares is 85355 and the estimates are

    $\hat\alpha$   $\hat\beta$
    37.26          0.65

    Model II

    $$Y_{ij} = \alpha_i + \beta X_{ij} + \epsilon_{ij}$$

    the error sum of squares is 66115 and the estimates are

    $\hat\alpha_1$   $\hat\alpha_2$   $\hat\alpha_3$   $\hat\alpha_4$   $\hat\alpha_5$   $\hat\beta$
    14.2424          67.5325          48.3918          49.6033          68.7786          0.5509

    For this model

    displaymath446

    Model III

    $$Y_{ij} = \alpha_i + \beta_i X_{ij} + \epsilon_{ij}$$

    the error sum of squares is 62433 and the estimates are

    $\hat\alpha_1$   $\hat\alpha_2$   $\hat\alpha_3$   $\hat\alpha_4$   $\hat\alpha_5$
    52.04954         -68.05918        62.48453         46.66416         112.529
    $\hat\beta_1$    $\hat\beta_2$    $\hat\beta_3$    $\hat\beta_4$    $\hat\beta_5$
    0.2309892        1.619385         0.4313726        0.5757114        0.1818949

    1. Of the three models, based on the information available to you, which model provides the best fit to the data? [10 marks]

      Solution: Testing Model III vs Model II we get

      $$F = \frac{(66115 - 62433)/4}{62433/40} = 0.59$$

      which is not significant. Thus Model II is preferred to Model III. Comparing Model II to Model I we have

      displaymath472

      which leads to a P-value around 0.03 so that Model II is preferred to Model I.
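The two comparisons can be reproduced from the error sums of squares alone. The sketch below counts the fitted mean parameters from the estimate tables (2, 6 and 10) and, as an assumption, uses the larger model of each pair to supply the denominator mean square; `extra_ss_f` is a hypothetical helper name.

```python
N_OBS = 50  # 5 treatment groups of 10 patients

# (error sum of squares, number of fitted mean parameters) for each model.
MODELS = {
    "I": (85355, 2),
    "II": (66115, 6),
    "III": (62433, 10),
}


def extra_ss_f(reduced, full):
    """Extra-sum-of-squares F statistic with its degrees of freedom."""
    ess_r, p_r = MODELS[reduced]
    ess_f, p_f = MODELS[full]
    df_num = p_f - p_r      # parameters added by the full model
    df_den = N_OBS - p_f    # error degrees of freedom of the full model
    f_stat = ((ess_r - ess_f) / df_num) / (ess_f / df_den)
    return f_stat, df_num, df_den


f_3v2 = extra_ss_f("II", "III")
f_2v1 = extra_ss_f("I", "II")
print("III vs II: F = %.2f on (%d, %d) df" % f_3v2)
print("II vs I:   F = %.2f on (%d, %d) df" % f_2v1)
```

A different denominator, for example the Model III mean square in both tests, would change the second statistic and its P-value slightly, so the printed value is one reasonable reading rather than the only one.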

    2. There are 10 possible comparisons between pairs of treatments. It is desired to give simultaneous 95% confidence intervals for all possible comparisons based on the second model above. I want you to show clearly that you know how to get these ten confidence intervals. Your answer will include a clear description of the parameters for which intervals are needed, written in terms of the notation used above for the second model, and the resulting confidence interval for the difference between treatment A and treatment B with all the numbers filled in. You need not work it out to the point of a numerical value for the lower and upper limit. [5 marks]

      Solution: I want confidence intervals for the 10 values of $\alpha_i - \alpha_j$ with $i < j$. To get simultaneous 95% confidence intervals you divide 0.05 by 10 and just work out ordinary 99.5% confidence intervals. The t multiplier is around 2.96. You also need a standard error for $\hat\alpha_1 - \hat\alpha_2$, which is the square root of

      displaymath486

      You estimate $\sigma^2$ using 66115/44 and get

      displaymath490
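The bookkeeping for the ten Bonferroni intervals can be laid out as follows. This is only a sketch: it assumes the Model II intercept estimates are listed in treatment order A to E, takes the multiplier 2.96 from the solution, and leaves the standard error as an argument because it depends on the variance-covariance matrix of the estimates, which is not reproduced here. The names `differences` and `interval` are hypothetical.

```python
from itertools import combinations

# Model II intercept estimates. ASSUMPTION: listed in treatment order A..E.
INTERCEPTS = dict(zip("ABCDE", [14.2424, 67.5325, 48.3918, 49.6033, 68.7786]))

FAMILY_ALPHA = 0.05
T_MULTIPLIER = 2.96  # quoted in the solution for 44 error degrees of freedom

pairs = list(combinations("ABCDE", 2))              # all pairs with i < j
per_interval_level = 1 - FAMILY_ALPHA / len(pairs)  # Bonferroni adjustment
sigma2_hat = 66115 / 44                             # error mean square, Model II

# Point estimates of the intercept differences.
differences = {(a, b): INTERCEPTS[a] - INTERCEPTS[b] for a, b in pairs}


def interval(pair, std_error):
    """Confidence limits for one difference, given its standard error."""
    centre = differences[pair]
    half_width = T_MULTIPLIER * std_error
    return centre - half_width, centre + half_width


print(len(pairs), "intervals, each at level", per_interval_level)
print("A - B point estimate:", differences[("A", "B")])
```

Calling `interval(("A", "B"), se)` with the standard error obtained from the variance formula gives the limits for the comparison of treatments A and B.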

    3. Examine the residual plots attached for the three models. Is there anything wrong with our fit? If so suggest what you might try next. Be quite clear. [5 marks]

      Solution: The plots show clear signs of heteroscedasticity; a transformation might be useful. (In fact taking logs is the thing to do.)

    4. I attach a table of regression diagnostics for the fit to model II above. For each diagnostic review the values and comment on whether or not they show any problems and which cases might warrant further examination. [5 marks]

      Solution:

      I just wanted people to compare the various statistics to the guidelines in the text. For the externally studentized residuals I was looking for some mention of the Bonferroni adjustment. Cases 15 and 44 stand out as worth looking at again.

Diagnostics for Model II for Question 3

Obs   $h_{ii}$   Ext'ly Stud'zed Residual   DFFITS   Cook's $D_i$   Obs   $h_{ii}$   Ext'ly Stud'zed Residual   DFFITS   Cook's $D_i$
1 0.120 -0.777 -0.287 0.014 26 0.100 -0.096 -0.032 0.000
2 0.108 0.407 0.142 0.003 27 0.100 2.018 0.674 0.071
3 0.129 0.047 0.018 0.000 28 0.158 0.768 0.333 0.019
4 0.103 0.868 0.295 0.015 29 0.104 -0.475 -0.162 0.004
5 0.101 0.141 0.047 0.000 30 0.106 -0.997 -0.343 0.020
6 0.124 -0.377 -0.142 0.003 31 0.102 -1.133 -0.383 0.024
7 0.102 0.681 0.229 0.009 32 0.144 -0.139 -0.057 0.001
8 0.150 -0.578 -0.243 0.010 33 0.154 -0.201 -0.086 0.001
9 0.148 -0.180 -0.075 0.001 34 0.103 1.186 0.401 0.027
10 0.127 -0.261 -0.099 0.002 35 0.137 -0.009 -0.004 0.000
11 0.121 1.073 0.398 0.026 36 0.134 0.607 0.238 0.010
12 0.100 -1.076 -0.359 0.021 37 0.114 0.184 0.066 0.001
13 0.102 -0.179 -0.060 0.001 38 0.101 0.069 0.023 0.000
14 0.130 0.329 0.127 0.003 39 0.109 0.372 0.130 0.003
15 0.106 3.436 1.186 0.188 40 0.101 -0.934 -0.312 0.016
16 0.180 -0.613 -0.288 0.014 41 0.115 -2.130 -0.766 0.091
17 0.104 -0.306 -0.104 0.002 42 0.146 -0.732 -0.303 0.015
18 0.100 0.516 0.172 0.005 43 0.126 1.295 0.491 0.040
19 0.110 -1.138 -0.401 0.027 44 0.101 3.038 1.016 0.145
20 0.110 -1.742 -0.611 0.059 45 0.107 -1.635 -0.565 0.051
21 0.117 0.211 0.076 0.001 46 0.148 1.019 0.425 0.030
22 0.130 0.385 0.149 0.004 47 0.115 0.417 0.150 0.004
23 0.152 -0.699 -0.296 0.015 48 0.103 0.105 0.036 0.000
24 0.111 -0.320 -0.113 0.002 49 0.143 -0.911 -0.372 0.023
25 0.104 -0.715 -0.243 0.010 50 0.142 -0.333 -0.135 0.003
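As an illustration of comparing diagnostics with guidelines, the sketch below applies two widely used rules of thumb to a few rows copied from the table: leverage above 2p/n and |DFFITS| above 2 sqrt(p/n), with p = 6 parameters and n = 50 cases. These cutoffs are an assumption and may differ from the guidelines in the course text, and they do not replace the Bonferroni comparison for the externally studentized residuals.

```python
from math import sqrt

N_OBS, N_PARAMS = 50, 6  # Model II: five intercepts and one slope

# (obs, leverage, ext. studentized residual, DFFITS, Cook's distance),
# a handful of rows copied from the diagnostics table.
ROWS = [
    (15, 0.106, 3.436, 1.186, 0.188),
    (16, 0.180, -0.613, -0.288, 0.014),
    (27, 0.100, 2.018, 0.674, 0.071),
    (41, 0.115, -2.130, -0.766, 0.091),
    (44, 0.101, 3.038, 1.016, 0.145),
]

# ASSUMPTION: common rule-of-thumb cutoffs, not necessarily the text's.
leverage_cut = 2 * N_PARAMS / N_OBS
dffits_cut = 2 * sqrt(N_PARAMS / N_OBS)

high_leverage = [obs for obs, h, _, _, _ in ROWS if h > leverage_cut]
high_dffits = [obs for obs, _, _, d, _ in ROWS if abs(d) > dffits_cut]

print("leverage cutoff:", leverage_cut, "flagged:", high_leverage)
print("DFFITS cutoff:", round(dffits_cut, 3), "flagged:", high_dffits)
```

Cases flagged by a rule of thumb are candidates for a second look, not automatic deletions.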





Richard Lockhart
Wed Apr 9 14:40:57 PDT 1997