BUEC 333 Spring 1991 D. Maki

FINAL EXAMINATION - B

IMPORTANT - record on the top front of your answer sheet the letter "A", "B" or "C" from the examination title above.

This examination consists of 30 multiple choice questions. Choose the letter corresponding to the one best answer to each question. Your grade will be computed on the basis of the number of correct answers. The exams will be machine graded, and making the answers legible to the machine is your responsibility. Use soft pencil (HB or softer) only. Fill in the appropriate circles corresponding to your name on the back of the answer sheet.

This is a "closed book" examination. A formula sheet and tables are attached. Total time allowed = 3 hours.

Questions 1 and 2 refer to the following problem setting:

A regression equation is calculated using 10 observations as:

   Yhat = 1.1 + 0.06X

Other estimates obtained are: MSE = 1.21, and the estimated variance of X = 1111.1.

1. A 95 per cent confidence interval for Beta, the slope of the regression equation, is:
   a. .06 plus or minus .140
   b. .06 plus or minus .025
   c. .06 plus or minus .011
   d. .06 plus or minus 2.540
   e. impossible to calculate without more information

2. The R-square value for this equation is:
   a. .50
   b. .79
   c. .86
   d. .91
   e. impossible to calculate without more information

3. One of the assumptions of the Classical Linear Regression Model is that the true error terms are not correlated with each other. Violation of this assumption causes:
   a. the parameter estimates to become biased
   b. the parameter estimates to become nonlinear
   c. the usual inference statements to be misleading
   d. a problem termed heteroscedasticity
   e. the parameters to become non-estimable

4. For the simple regression model, if we calculate:

   b-squared*[sum(X - Xbar)-squared]/[sum(Y - Ybar)-squared]

   the result is:
   a. SSR
   b. SSE
   c. R-square
   d. SSE/SSR
   e. none of the above

5. Given the following graphical relationship involving four observations on two variables, Y and X, if the scales on the two axes are the same:

   Y
   | x
   |
   | x x
   |
   | x
   |_____________ X

   The sample regression line Yhat = a + bX will be:
   a. a = 0 and b < 1
   b. a > 0 and b = 1
   c. a = 0 and b = 1
   d. a > 0 and b < 1
   e. none of the above

6. The least squares estimators for simple regression, a and b, are so named because:
   a. the sum (a-squared + b-squared) is smallest among all such sums which arise by fitting straight lines
   b. there are fewer squared terms in the formulas for least squares estimators than in the formulas for other estimators which fit straight lines
   c. the sum of (Xi+Yi)-squared is smallest among all such sums which arise by fitting straight lines
   d. the sum of the squared deviations in the x-direction is minimized
   e. none of the above

7. A measure of the variability around a fitted regression line is given by:
   a. the sum of squares total (SST)
   b. the sum of squares regression (SSR)
   c. the sum of squares error (SSE)
   d. the regression mean square (MSR)
   e. none of the above

8. In a particular simple linear regression study with 15 observations, the following sums of squares were obtained: SST = 60,000 and SSE = 20,000. Hence, MSR = __________ and MSE = __________:
   a. 40,000; 10,000
   b. 40,000; 1,429
   c. 20,000; 10,000
   d. 20,000; 1,429
   e. none of the above

9. It is assumed that the number of videos a person has rented in the last year (X) is a useful predictor of the amount a person paid for their VCR (Y). A sample of 5 observations yields:

     Y     X
    ___   ___
    350    10
    250     8
    300     3
    400    20
    200     2

   The estimated regression function is:
   a. Y = 221 + 9.2X
   b. Y = 218 + 10.6X
   c. Y = 235 + 7.6X
   d. Y = 300 + 8.6X
   e. Y = 200 + 11.6X

10. For which of the following questions is an estimate of a mean value as opposed to the prediction of an individual response required:
   (1) on average, do men aged 50 or men aged 40 carry larger amounts of life insurance?
   (2) what is the typical cost of maintenance for a three-year old delivery van?
   (3) what is the average amount of raw material wasted with production runs of 100 units?
   a. 1 only
   b. 1 and 3 only
   c. 2 and 3 only
   d. 1, 2 and 3
   e. none of the above

11. The estimators, a and b, of the coefficients in simple regression models are:
   a. independent of each other
   b. negatively correlated
   c. positively correlated
   d. negatively correlated only if b is negative
   e. positively correlated only if both are positive

12. In fitting a simple linear regression model to a data set with 202 observations, suppose we obtain: a = 15, b = -.5, MSE = 4, and SST = 1000. The value of the simple correlation between X and Y, r(XY), is:
   a. .45
   b. .50
   c. .55
   d. .60
   e. impossible to calculate without more information

Questions 13, 14 and 15 refer to the following problem setting:

A study was undertaken at a university to examine the relationship between first-year performance in the school's MBA program measured by GPA (Y), the student's undergraduate GPA (X1) and the student's score on the GMAT entrance exam (X2). The results of estimating a multiple regression model using data on 12 students were:

   Yhat = -2.339 + .468X1 + .0077X2
   Sb1 = .229    Sb2 = .0018
   SST = 3.090   SSR = 2.963   SSE = .127

13. The test statistic for testing the hypothesis that b1 and b2 are simultaneously zero is:
   a. .959
   b. 5.27
   c. 23.3
   d. 105.0
   e. 127.2

14. Using a point estimate, what is the predicted first-year GPA for a student whose undergraduate GPA was 3.25 and who received a score of 520 on the GMAT?
   a. 3.186
   b. 3.250
   c. 3.860
   d. 7.864
   e. none of the above

15. What statement is appropriate regarding the contribution of X2 to this model, using significance level .01?
   a. there is no evidence X2 significantly reduces SSE, given that X1 is already in the model
   b. there is no evidence X2 significantly reduces SSE, irrespective of whether X1 is in the model
   c. X2 makes no significant contribution due to high collinearity with X1
   d. insufficient information is given to reach a conclusion regarding the contribution of X2
   e. none of the above

16. Given the time series:

   Year   Data
   1987    100
   1988    105
   1989    107

   Using a Holt-Winters nonseasonal model with A = .5 and B = .9, the forecast value for 1990 is:
   a. 107.46
   b. 108.50
   c. 112.00
   d. 113.35
   e. none of the above

17. Given the following data on sales of china dinner plates by a company in 1980 and 1990:

   Type          Price          Quantity
              1980    1990    1980    1990
   Regular    5.00    6.00      10      14
   Bone       7.00    9.00       6      10

   A Laspeyres price index for 1990 using a 1980 base is:
   a. 123.9
   b. 124.3
   c. 152.2
   d. 152.6
   e. 152.8

18. Given the following data on Y, X1 and X2:

     Y     X1    X2
    ____  ____  ____
      5     1     3
     -3     1    -1
      3    -1    -3
     -5    -1     1

   The estimated coefficient b1 for the regression equation Y = a + b1*X1 + b2*X2 + e is:
   a. 1/4
   b. 1/2
   c. 1
   d. 1 1/4
   e. 1 1/2

Questions 19 and 20 refer to the following problem setting:

An equation was estimated where profit level (Y, in $ thousands) was a function of sales revenue (X, in $ thousands) using a sample of 10 stores. The data are:

   Store    Y     X        Store    Y     X
   _____   ___   ___       _____   ___   ___
     1      20   305         6      27   269
     2      15   130         7      35   421
     3      17   189         8       7   195
     4       9   175         9      22   282
     5      16   101        10      23   203

The estimated equation was Y = 3.88048 + .06705X. It is also given that the mean of X is 227, the mean of Y is 19.1, MSE = 32.668 and the sum of (X - Xbar)-squared = 79542.

19. An eleventh store selected from the same population had sales revenue = $250 thousand last month. The 95 per cent prediction interval for this store's profit level (in $ thousands) is:
   a. 6.78 < Y < 34.51
   b. 8.86 < Y < 32.43
   c. 14.63 < Y < 26.66
   d. 6.83 < Y < 34.45
   e. 7.30 < Y < 33.98

20. Another store selected from this population had sales revenues of $900 thousand last month. Which of the following statements regarding a 95 per cent prediction interval for this store's profit level last month is most justified?
   a. the prediction interval would be narrower than that for a store with sales revenue of $250 thousand last month
   b. the prediction interval would need to be interpreted with caution unless it is known that the function is linear over the entire possible range of X
   c. such a prediction interval could not be constructed from the data given
   d. these prediction limits would be centered around the value $900 thousand
   e. the prediction interval would be narrower than that for the mean of all stores with revenues of $900 thousand

21. The seasonal index for the third quarter in a time series of sales is 95.0. Sales in the third quarter of last year were 127,500 units. Thus, seasonally adjusted sales for the third quarter of last year were 134,211 units. Assuming a multiplicative model, the value 134,211 here represents what component(s) of the time series?
   a. trend and cyclical only
   b. cyclical and irregular only
   c. trend, cyclical and irregular only
   d. seasonal only
   e. trend, cyclical, seasonal and irregular

22. Testing for heteroscedasticity by regressing the squares of the observed residuals on the prediction values (Yhats) will detect:
   a. all forms of heteroscedasticity
   b. only cases where the variance of e increases with Y
   c. cases where the variance of e increases or decreases with Y
   d. cases where the variance of e first increases with Y, and then decreases again
   e. cases where the variance of e first decreases with Y, and then increases again

23. In estimating a nonlinear relationship between two variables, Y and X, using a quadratic polynomial, i.e.

   Y = a + b1*X + b2*X-squared + e

   a. no possible combination of coefficient values can produce a function which increases at an increasing rate
   b. no possible combination of coefficient values can produce a function which increases at a decreasing rate
   c. no possible combination of coefficient values can produce a function which decreases at an increasing rate
   d. no possible combination of coefficient values can produce a function which decreases at a decreasing rate
   e. none of the above

24. Assume one estimates the following two regression equations, where e1 and e2 are the calculated residuals.

   Y = a1 + b1*X1 + e1
   Y = a2 + b2*X2 + e2

   Then the simple correlation between e1 and e2 is equal to:
   a. the partial correlation between Y and X1, given X2
   b. the partial correlation between Y and X2, given X1
   c. the partial correlation between X1 and X2, given Y
   d. the R-square from the multiple regression of Y on X1 and X2
   e. none of the above

25. In the simple linear regression model, if the 95% confidence interval for Beta includes zero, one can conclude that, at the .05 level:
   a. there is no linear statistical relationship between X and Y
   b. there is no statistical relationship, linear or nonlinear, between X and Y
   c. there is no linear effect of X on Y; there may be an effect of Y on X
   d. the regression passes through the origin
   e. none of the above

26. Multicollinearity refers to a condition where:
   a. several independent variables can be used to predict the value of Y
   b. the observations of the independent variables are highly correlated
   c. more than one Y variable needs to be predicted from the same set of independent variables
   d. more than one dummy variable is needed to represent a qualitative factor with several levels
   e. none of the above

27. Given the following hypothetical data on an index of industrial production:

   Year    IIP (1971=100)    IIP (1981=100)
   1978        135.8
   1979        138.6
   1980        140.9
   1981        146.3             100.0
   1982                          107.2

   The value of the IIP (1971=100) for 1982 is:
   a. 73.3
   b. 100.0
   c. 136.5
   d. 156.8
   e. none of the above

28. Consider the following two regression equations:

   Y-hat = a1 + b1*X
   X-hat = a2 + b2*Y

   Then, if r(XY) is the correlation between X and Y:
   a. b1 times b2 equals r(XY)
   b. b1 = 1/b2
   c. if b1 is positive, b2 will be negative, and vice versa
   d. b1 = b2
   e. none of the above

29. Assume a time series trend equation is estimated as:

   Yt = 120 + 2Xt
   X unit = 1 quarter; X = 0 in 1985, 1st quarter

   and a seasonal index takes on values of 95.2, 100.1, 107.5 and 97.2 for quarters 1 through 4, respectively. Then the predicted value of Yt for the first quarter of 1991, incorporating both trend and seasonal variation, is:
   a. 158.0
   b. 159.9
   c. 172.8
   d. 174.4
   e. 178.6

30. A researcher hypothesized that household saving (Y) as a function of household income (X) might vary among the three regions of Canada (East, Central and West). Using a sample of data on individual households, this researcher plans to estimate the equation:

   Y = a + b1*X + b2*D1 + b3*D2 + b4*D3 + e

   where D1 = 1 for households in the East, and zero otherwise; D2 = 1 for households in the Central region, and zero otherwise; and D3 = 1 for households in the West, and zero otherwise. The main problem with this estimation is:
   a. if income varies by region, the dummies might be correlated with X
   b. some people may recently have moved to a different region
   c. it is difficult to decide if Manitoba is Central or West
   d. it will be impossible to calculate coefficient values
   e. there is no serious problem in estimating this equation
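
As a study aid for the setting in question 30, the sketch below builds the design matrix for an equation with an intercept, income, and one dummy per region, then compares the number of columns with the matrix rank. It assumes NumPy is available. The six households, their incomes and region labels, and the helper name design_matrix are made up for illustration and do not come from the examination.

```python
import numpy as np


def design_matrix(income, region, drop_last_dummy=False):
    """Stack an intercept, income, and 0/1 region dummies as columns.

    Hypothetical helper for illustration only.
    """
    regions = ["East", "Central", "West"]
    if drop_last_dummy:
        regions = regions[:-1]  # omit the West dummy
    region = np.asarray(region)
    dummies = [(region == r).astype(float) for r in regions]
    columns = [np.ones(len(income)), np.asarray(income, dtype=float)] + dummies
    return np.column_stack(columns)


# Made-up sample: household income (in $ thousands) and region.
income = [30.0, 45.0, 52.0, 38.0, 61.0, 27.0]
region = ["East", "East", "Central", "Central", "West", "West"]

X_full = design_matrix(income, region)                           # a, X, D1, D2, D3
X_reduced = design_matrix(income, region, drop_last_dummy=True)  # a, X, D1, D2

rank_full = np.linalg.matrix_rank(X_full)
rank_reduced = np.linalg.matrix_rank(X_reduced)

print(X_full.shape[1], rank_full)        # number of columns, then rank
print(X_reduced.shape[1], rank_reduced)  # number of columns, then rank
```

In this made-up sample D1 + D2 + D3 equals the intercept column for every household, so the five-column matrix has rank 4, while the four-column version that omits one dummy has full column rank.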