Assignment Answers
The Answers to Assignment #1
Here's the output:
asking price
Mean 56112.34568
Standard Error 1813.565466
Median 52900
Mode 49900
Standard Deviation 16322.08919
Sample Variance 266410595.7
Kurtosis 0.429181538
Skewness 0.781429828
Range 74000
Minimum 25900
Maximum 99900
Sum 4545100
Count 81
Confidence Level(95.0%) 3609.113807
selling price
Mean 51939.44444
Standard Error 1751.837902
Median 48000
Mode 45000
Standard Deviation 15766.54112
Sample Variance 248583818.8
Kurtosis 0.381142789
Skewness 0.797841069
Range 70000
Minimum 25000
Maximum 95000
Sum 4207095
Count 81
Confidence Level(95.0%) 3486.271919
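For reference, Excel's "Confidence Level(95.0%)" entry is the half-width of the 95% confidence interval for the mean: the t critical value times the standard error. Here is a minimal sketch of that arithmetic in Python (assuming SciPy is available), using the asking price numbers from the output above:

```python
from scipy.stats import t

se, n = 1813.565466, 81                # Standard Error and Count from the output
half_width = t.ppf(0.975, n - 1) * se  # t critical value times the standard error
print(half_width)                      # ~3609.11, Excel's "Confidence Level(95.0%)"
```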
METHOD 1
Doing the F-test manually, using the sample variances from the Descriptive Statistics output:
F = 266410595.7 / 248583818.8 = 1.072
METHOD 2
F-Test Two-Sample for Variances
asking price selling price
Mean 56112.34568 51939.44444
Variance 266410595.7 248583818.8
Observations 81 81
df 80 80
F 1.07171
P(F<=f) one-tail 0.37876
F Critical one-tail 1.44773
The null hypothesis states that the two variances are in fact identical. Given that the observed F-stat has a p-value of 37.876%, we cannot reject the null hypothesis at a 5% or even a 10% level of significance. Therefore, based upon the statistical evidence, we accept the null hypothesis that the variance of the selling prices is the same as the variance of the asking prices.
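If you want to check these numbers outside of Excel, here is a minimal sketch of the same two-sample variance F-test in Python (assuming SciPy is available), using the variances and degrees of freedom from the output above:

```python
from scipy.stats import f

var_ask, var_sell = 266410595.7, 248583818.8  # sample variances from the output
df1 = df2 = 80                                # n - 1 = 80 for each sample
F = var_ask / var_sell
p_one_tail = f.sf(F, df1, df2)                # matches Excel's "P(F<=f) one-tail"
F_crit = f.ppf(0.95, df1, df2)                # 5% one-tail critical value
print(F, p_one_tail, F_crit)                  # ~1.0717, ~0.3788, ~1.4477
```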
The Answers to Assignment #2
Before you actually run any regressions, it is always a good idea to see exactly what the data looks like. Following section 14.1 of the textbook, you should have been able to produce the following chart.
The graph and regressed trend-line do not give you very much information, however. In order to get more detailed information, you will have to run a full regression. The output is as follows:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.984629521
R Square 0.969495294
Adjusted R Square 0.969109159
Standard Error 2868.736216
Observations 81
ANOVA
            df  SS           MS          F            Significance F
Regression  1   20662705504  2.0663E+10  2510.764351  1.23254E-61
Residual    79  650142150.5  8229647.47
Total       80  21312847654
               Coefficients  Standard Error  t Stat      P-value      Lower 95%    Upper 95%
Intercept      3169.232921   1103.622675     2.87166347  0.005239803  972.5250795  5365.940762
selling price  1.019323817   0.020342728     50.1075279  1.23254E-61  0.978832595  1.059815039
RESIDUAL OUTPUT
Observation  Predicted asking price  Residuals  Standard Residuals
1 45980.83323 -1080.83323 -0.3767628
2 42413.19987 -1413.199871 -0.4926211
3 53625.76186 -725.7618568 -0.2529901
4 64838.32384 -2338.323843 -0.8151059
5 65347.98575 -347.985751 -0.1213028
6 70444.60484 -544.6048354 -0.1898414
7 70444.60484 2455.395165 0.85591528
8 73502.57629 -602.5762861 -0.2100494
9 88588.56878 -2688.568776 -0.9371962
10 93379.39072 120.6092846 0.04204265
11 94908.37644 4991.623559 1.74000786
12 34258.60934 -2358.609336 -0.8221771
13 30690.97598 -790.9759769 -0.2757228
14 39864.89033 -1964.890329 -0.6849324
15 40884.21415 -984.2141457 -0.3430828
16 41801.60558 -1901.605581 -0.6628722
17 44451.8475 -1551.847505 -0.5409516
18 44706.67846 10193.32154 3.55324463
19 49038.80468 861.1953192 0.30020025
20 49038.80468 861.1953192 0.30020025
21 52606.43804 -5706.43804 -1.9891819
22 70342.67245 1557.327546 0.54286188
23 41801.60558 -1901.605581 -0.6628722
24 41903.53796 -3.537962625 -0.0012333
25 45980.83323 -3080.83323 -1.073934
26 46286.63038 -2386.630375 -0.8319449
27 47509.81896 -1609.818956 -0.5611596
28 48019.48086 -2019.480864 -0.7039619
29 48529.14277 1370.857228 0.47786102
30 48936.8723 63.12770086 0.0220054
31 49038.80468 3861.195319 1.34595691
32 52096.77613 5403.223869 1.88348578
33 53116.09995 -3216.099948 -1.121086
34 55664.40949 3235.590509 1.12788011
35 65347.98575 -347.985751 -0.1213028
36 65347.98575 -1447.985751 -0.5047469
37 66061.51242 -3161.512423 -1.1020576
38 71463.92865 5436.071348 1.89493594
39 86753.78591 146.2140944 0.05096812
40 58722.38094 -2222.380941 -0.7746899
41 70954.26674 545.7332561 0.19023473
42 74521.9001 -4621.900103 -1.6111276
43 86753.78591 -3853.785906 -1.3433741
44 96947.02407 52.97592551 0.01846664
45 41903.53796 -2003.537963 -0.6984044
46 48019.48086 -3519.480864 -1.2268402
47 44961.50941 -1061.509413 -0.3700268
48 47000.15705 899.8429529 0.31367225
49 52096.77613 2403.223869 0.83772912
50 59028.17809 -1028.178086 -0.358408
51 59232.04285 667.9571503 0.23284021
52 61270.69048 2229.309517 0.77710509
53 72483.25247 2416.747531 0.84244327
54 100004.9955 -104.9955251 -0.0365999
55 41903.53796 7996.462037 2.78745114
56 28652.32834 -2752.328343 -0.9594219
57 28902.06268 797.9373218 0.27814942
58 60251.36667 4648.633333 1.62044642
59 39355.22842 -855.2284204 -0.2981203
60 33748.94743 6151.052572 2.14416806
61 52096.77613 -2296.776131 -0.800623
62 52096.77613 -196.7761315 -0.0685933
63 56174.0714 -1274.071399 -0.4441229
64 58212.71903 -2312.719033 -0.8061804
65 60251.36667 -351.3666666 -0.1224813
66 64328.66193 671.3380659 0.23401875
67 64328.66193 571.3380659 0.1991602
68 71463.92865 -1563.928652 -0.5451629
69 51077.45231 3822.547685 1.3324849
70 74521.9001 4378.099897 1.52614237
71 49038.80468 861.1953192 0.30020025
72 55154.74758 745.2524179 0.25978423
73 36297.25697 2202.74303 0.7678444
74 37826.2427 -2326.242695 -0.8108946
75 38845.56651 3154.433488 1.09958994
76 39864.89033 35.10967115 0.01223872
77 49038.80468 861.1953192 0.30020025
78 51077.45231 -3177.452315 -1.107614
79 59232.04285 -2332.04285 -0.8129164
80 68304.02482 -3404.02482 -1.1865939
81 54135.42377 -1235.423765 -0.4306509
The following two graphs are also part of the regression output:
The equation you were looking for is:
asking price = 3169.23 + 1.0193(selling price)
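If you are curious how the same regression looks outside of Excel, here is a minimal sketch using Python's statsmodels. The data below are randomly generated stand-ins for the 81 actual price pairs, so the estimates will only roughly match the output above:

```python
import numpy as np
import statsmodels.api as sm

# Stand-in data; in the assignment you would use the 81 actual price pairs.
rng = np.random.default_rng(0)
selling = rng.uniform(25000, 95000, size=81)
asking = 3169.23 + 1.0193 * selling + rng.normal(0, 2869, size=81)

X = sm.add_constant(selling)  # adds the intercept column
fit = sm.OLS(asking, X).fit()
print(fit.summary())          # R Square, ANOVA, coefficients, t Stats, p-values
print(fit.resid[:5])          # residuals, as in the RESIDUAL OUTPUT above
```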
The Answers to Assignment #3
Your output should have looked something like this:
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.985265572
R Square 0.970748247
Adjusted R Square 0.969608568
Standard Error 2748.602963
Observations 81
ANOVA
            df  SS           MS           F            Significance F
Regression  3   19304984495  6434994832   851.7736127  6.16519E-59
Residual    77  581721005.1  7554818.247
Total       80  19886705500
The Regression Output
              Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept     -809.4898084  1213.059803     -0.667312367  0.506567616  -3225.003385  1606.023768
asking price  0.939904447   0.024139085     38.93703723   2.04821E-52  0.89183733    0.987971564
days on sale  -17.60678093  9.811878374     -1.794435301  0.076668613  -37.14475041  1.931188559
lot size      0.217499996   0.282492101     0.76993302    0.443695326  -0.345014319  0.780014312
RESIDUAL OUTPUT
Observation  Predicted selling price  Residuals  Standard Residuals
1 40798.27607 1201.723933 0.437212631
2 37909.92805 590.0719478 0.214680678
3 48943.16132 556.8386812 0.202589711
4 57919.55029 2580.449705 0.938822282
5 60452.73135 547.2686537 0.199107933
6 65019.93267 980.0673319 0.356569263
7 67866.77992 -1866.779915 -0.67917409
8 68403.063 596.937001 0.21717833
9 80725.80725 3074.192753 1.118456465
10 86859.2835 1640.716499 0.596927428
11 93050.74737 -3050.747367 -1.109926536
12 28523.06025 1976.939745 0.71925257
...and so on (observations 13 through 81 omitted).
Interpretation
All students should at least have been able to generate the following equation from this output:
selling price = -809.49 + 0.9399(asking price) - 17.6068(days on sale) + 0.2175(lot size)
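As a rough cross-check, the same multiple regression could be run with statsmodels. This is only a sketch: the predictor columns below are randomly generated stand-ins (the ranges for days on sale and lot size are hypothetical), with selling prices simulated from the estimated equation:

```python
import numpy as np
import statsmodels.api as sm

# Stand-in data; in the assignment you would load the 81 actual observations.
rng = np.random.default_rng(1)
asking = rng.uniform(25900, 99900, size=81)
days_on_sale = rng.uniform(1, 120, size=81)   # hypothetical range
lot_size = rng.uniform(3000, 12000, size=81)  # hypothetical range
selling = (-809.49 + 0.9399 * asking - 17.6068 * days_on_sale
           + 0.2175 * lot_size + rng.normal(0, 2749, size=81))

X = sm.add_constant(np.column_stack([asking, days_on_sale, lot_size]))
fit = sm.OLS(selling, X).fit()
print(fit.summary())  # coefficients, t Stats and p-values, as in the output above
```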
However, it doesn't have to end there, and when you do Assignments 4 and 5, you will want to do further analysis. The t-tests on the Intercept and lot size do not look very good: at a 10% level of significance, we would accept the null hypothesis that the coefficient is equal to zero for both.
There are three things to note here:
Important Note: When it comes to doing the project, you will realize that it isn't quite so easy to drop a variable, as we have done here with lot size. The reason for this is fairly simple. When you conduct more thorough investigations of your data, as you are expected to do in your project, you often find that you can no longer trust your t-tests and F-tests. Why is this? T and F tests are no longer valid when the errors are not independently and identically distributed according to a normal distribution with a mean of zero { this is often shortened to IID~N(0) }. When you have heteroscedasticity or autocorrelation, the errors are no longer independently and identically distributed according to a normal distribution with a mean of zero. T tests also cease to be believable when there is serious multicollinearity in the data. Almost all projects have at least one of these problems (hetero, auto, multi). Thus, as you can now see, in the real world it isn't quite so easy to drop a variable.
The Answers to Assignment #4
Here is the output that I generated for this assignment. Note that there are many ways of going about this.
Regression Statistics
Multiple R 0.7423494
R Square 0.5510826
Adjusted R Square 0.4228205
Standard Error 0.1756333
Observations 10
ANOVA
            df  SS          MS        F          Significance F
Regression  2   0.26507072  0.132535  4.2965341  0.060615349
Residual    7   0.21592928  0.030847
Total       9   0.481
The Regression Results
           Coefficients  Standard Error  t Stat     P-value    Lower 95%     Upper 95%
Intercept  -2.969425     3.436814041     -0.864005  0.4162061  -11.09619346  5.157342571
x1         -0.00447      0.001549131     -2.88535   0.0234716  -0.008132894  -0.000806674
x2         0.2187938     0.083909076     2.607511   0.03504    0.020380542   0.417207129
RESIDUAL OUTPUT
Observation  Predicted y  Residuals  Standard Residuals
1 5.2404297 -0.040429737 -0.230194
2 5.3733586 -0.073358605 -0.417681
3 5.6410945 -0.241094485 -1.372715
4 5.4752429 0.124757057 0.710327
5 5.3969089 0.103091065 0.586968
6 5.4105622 0.289437755 1.647967
7 5.4206848 0.079315157 0.451595
8 5.3981103 0.001889654 0.010759
9 5.3710661 -0.171066064 -0.973996
10 4.9725418 -0.072541798 -0.41303
From running the basic regression, you should have been able to derive the following:
The estimated equation was:
Y = -2.969425 - 0.44698E-02(X1) + 0.2187938(X2)
Note: "E-02" means that the decimal point needs to be moved two places to the left; hence -0.44698E-02 becomes -0.0044698.
The R-squared is 0.5511 and the Adjusted R-squared is 0.4228, which aren't bad fits. Note: You should have included literal interpretations of each of the coefficients, so that you might say:
A one unit increase in X1 will result in a 0.0044698 decrease in Y. You should have also included a brief discussion of the relevant t-tests.
The t-stat for the coefficient on X1 was -2.885, which corresponds to a two-tailed p-value of 2 x 0.0117 = 0.0235. This means that at a 5% level of significance, we would reject the null hypothesis that the coefficient on X1 is equal to zero.
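The two-tailed p-value can be recovered from the t-stat directly. A quick sketch (assuming SciPy), using the X1 t-stat and the n - K - 1 = 10 - 2 - 1 = 7 residual degrees of freedom:

```python
from scipy.stats import t

p_two_tail = 2 * t.sf(abs(-2.88535), df=7)  # both tails of the t distribution
print(p_two_tail)                           # ~0.0235, matching the table above
```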
Some Notes:
Assignment #4 - Some Advanced Examples
This section gives an idea of some more advanced testing that you could have included in this assignment. You did not have to do all of these for this assignment. I have included them in this answer set as examples to help you with Assignment #5 and with the final project.
Section 18.2 of the book shows you how to do a Durbin Watson test for Autocorrelation.
Durbin Watson Stat: 1.333901624
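The same statistic can be computed from the residuals directly. Here is a minimal sketch (assuming statsmodels is available), using the ten residuals from the residual output above:

```python
import numpy as np
from statsmodels.stats.stattools import durbin_watson

# The ten residuals from the RESIDUAL OUTPUT above.
e = np.array([-0.040429737, -0.073358605, -0.241094485, 0.124757057,
              0.103091065, 0.289437755, 0.079315157, 0.001889654,
              -0.171066064, -0.072541798])
print(durbin_watson(e))  # ~1.3339, matching the stat above
```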
You can also do a visual check for autocorrelation by plotting the errors over time, or else by plotting the errors in time (t) against the errors in time (t-1); in other words, by plotting the errors against the lagged errors. The following sections do this. Note that because these are rough checks, we are not interested in the equation for the fitted line on these plots. Instead, we look to the Durbin Watson test for an exact result.
This data was used to make the first Check for Auto graph:
Observation  Observed y  Lagged Residuals  Residuals
2 5.3 -0.040429737 -0.073359
3 5.4 -0.073358605 -0.241094
4 5.6 -0.241094485 0.124757
5 5.5 0.124757057 0.103091
6 5.7 0.103091065 0.289438
7 5.5 0.289437755 0.079315
8 5.4 0.079315157 0.00189
9 5.2 0.001889654 -0.171066
10 4.9 -0.171066064 -0.072542
I cut and pasted to produce the data layout above. I then used the data to create the following chart:
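If you prefer code to cutting and pasting, here is a sketch (assuming matplotlib) that builds the same kind of chart by plotting each residual against its lag:

```python
import numpy as np
import matplotlib.pyplot as plt

# The ten residuals from the regression's residual output.
e = np.array([-0.040429737, -0.073358605, -0.241094485, 0.124757057,
              0.103091065, 0.289437755, 0.079315157, 0.001889654,
              -0.171066064, -0.072541798])

plt.scatter(e[:-1], e[1:])  # residual in period t against residual in period t-1
plt.xlabel("residual in period t-1")
plt.ylabel("residual in period t")
plt.title("Check for autocorrelation")
plt.show()
```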
It seems that there is a rough correlation.
This data was used for the second Check for Auto:
Observation Residuals
1 -0.04043
2 -0.073359
3 -0.241094
4 0.1247571
5 0.1030911
6 0.2894378
7 0.0793152
8 0.0018897
9 -0.171066
10 -0.072542
Again, I cut and pasted to produce the data layout above. I then used the data to create the following chart:
Again, there seems to be a rough correlation. The diagram above is perhaps a little more intuitively appealing than the first. The red line has been inserted just for illustrative purposes; I used a fifth degree polynomial. As you can see, there is a pattern to the errors. Three negatives in a row, followed by five positives, followed by two negatives indicates that if there is a positive error in this period, we are likely to see a positive error in the next period, and if there is a negative error in this period, we are likely to see a negative error in the next period. One of the assumptions behind OLS states that the errors are identically and independently distributed, meaning that there is no relationship between the error in period (t) and the error in period (t-1). The graph above suggests that this no-autocorrelation assumption is violated, because there does appear to be a relationship between the error in period (t) and the error in period (t-1).
Checking For Heteroscedasticity
Observation  Observed y  Squared Residuals
1 5.2 0.001634564
2 5.3 0.005381485
3 5.4 0.058126551
4 5.6 0.015564323
5 5.5 0.010627768
6 5.7 0.083774214
7 5.5 0.006290894
8 5.4 3.57079E-06
9 5.2 0.029263598
10 4.9 0.005262313
The data above was used to generate the following chart:
Some Notes on Interpreting the Fancy Extras
Autocorrelation
From the evidence presented above, it is not clear whether there is autocorrelation in the errors.
Because of the small sample size, the Durbin Watson statistic is unreliable. You will not have this problem when you do your project. As it stands, the DW stat of 1.333901624 is between the lower and upper bounds, so a clear decision is not possible.
The graphs indicate that there might be some autocorrelation. For further investigation, a runs test might be a good idea.
However, graphs are not always reliable.
Heteroscedasticity
The check for heteroscedasticity is a good example of this.
An LM test indicates that there is no heteroscedasticity. The LM stat is N * R-squared from regressing the squared errors on the variable(s) thought to cause the heteroscedasticity, and it is distributed Chi-squared with 1 degree of freedom.
Ask your TA for more information on the LM stat.
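For illustration, here is a sketch of that LM test in Python (assuming statsmodels and SciPy), regressing the squared residuals on the observed y, using the numbers from the heteroscedasticity table above:

```python
import numpy as np
import statsmodels.api as sm
from scipy.stats import chi2

# Observed y and squared residuals from the heteroscedasticity table above.
y = np.array([5.2, 5.3, 5.4, 5.6, 5.5, 5.7, 5.5, 5.4, 5.2, 4.9])
e2 = np.array([0.001634564, 0.005381485, 0.058126551, 0.015564323,
               0.010627768, 0.083774214, 0.006290894, 3.57079e-06,
               0.029263598, 0.005262313])

aux = sm.OLS(e2, sm.add_constant(y)).fit()  # auxiliary regression
lm_stat = len(y) * aux.rsquared             # LM = N * R-squared
p_value = chi2.sf(lm_stat, df=1)            # Chi-squared with 1 degree of freedom
print(lm_stat, p_value)
```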
What if there had been clear evidence of auto or hetero?
If there are problems with autocorrelation or heteroscedasticity, the T and F-tests become unreliable. It is important that you remember this fact when you do your project.
When you do your project, you will have to do many T and F-tests. You will definitely want to report the results of these tests, but if you find evidence of heteroscedasticity or autocorrelation, you may want to view these results with caution.
Why is this?
T and F tests are no longer valid when the errors are not independently and identically distributed according to a normal distribution with a mean of zero { this is often shortened to IID~N(0) }. When you have heteroscedasticity or autocorrelation, the errors are no longer independently and identically distributed according to a normal distribution with a mean of zero.
"But that still doesn't explain it... why are the tests no longer valid?"
In order to better understand this, first remember that at the limit, the T-test is just like the Z-test. Remember, also, that the Z-test is based upon the normal distribution. The T and F-tests come up with statistics that are checked against tables that are based upon combinations and permutations of the normal distribution.
Autocorrelation and heteroscedasticity mean that the errors can no longer be thought of as being normally distributed, or at least not normally distributed with a constant mean and variance.
The T and F-tests are based upon the assumption that the errors are normally distributed.
If the errors are not actually normally distributed, the T and F-tests do not work.
"If I can't believe the T and F-tests, what should I do?"
This is an important question.
When you created the model, economic theory told you to include certain variables. After you run your model, however, the statistical evidence may indicate that some of the variables are irrelevant. For instance, a T-test might indicate that a particular variable is insignificant. Or, an F-test may indicate that a couple of variables are jointly insignificant.
Based on the t and F-tests alone, you would decide to get rid of the variables.
However, getting rid of a variable isn't that easy.
Before you get rid of the variables, you first have to check to see if the T and F-tests are valid.
If there is autocorrelation or heteroscedasticity, then the T and F-tests are not valid.
If the T and F-tests are not valid, you cannot get rid of the seemingly "insignificant" variables, because you do not know whether the insignificant result is due to the variable being irrelevant to the model or due to problems with the T and F tests.
The Answers to Assignment #5
SUMMARY OUTPUT
Regression Statistics
Multiple R 0.514412452
R Square 0.26462017
Adjusted R Square 0.089529735
Standard Error 0.541591981
Observations 27
ANOVA
            df  SS           MS           F           Significance F
Regression  5   2.216536953  0.443307391  1.51133424  0.228794792
Residual    21  6.159759     0.293321873
Total       26  8.376296296
                                    Coefficients  Standard Error  t Stat        P-value      Lower 95%     Upper 95%
Intercept                           1.996841973   1.272797329     1.568860908   0.131626283  -0.650085432  4.643769379
X1 - # of hours studying            0.009900721   0.016536726     0.598711065   0.555769379  -0.024489289  0.044290731
X2 - # of hours studying for tests  0.076292741   0.056537054     1.349429003   0.19156885   -0.04128252   0.193868002
X3 - # of hours spent in bars       -0.136520264  0.069221501     -1.972223384  0.061897232  -0.280474281  0.007433754
X4 - Highlight Text? 1-Yes          0.063590148   0.260623817     0.243992081   0.809604879  -0.478406845  0.60558714
X5 - avg # of credit hrs per term   0.137937105   0.075212915     1.833955048   0.080872162  -0.018476741  0.294350951
Interpretation of Output for Assignment #5
From running the basic regression, you should have been able to derive the following:
The estimated equation was:
Y = 1.99684 + 0.00990(X1) + 0.07629(X2) - 0.13652(X3) + 0.06359(X4) + 0.13794(X5)
The R-squared is 0.2646 and the Adjusted R-squared is 0.0895, which aren't great fits. You should have included literal interpretations of each of the coefficients, so that you might say:
A one unit increase in X1 will result in a 0.0099 increase in Y, etc. A special interpretation is needed for the dummy variable X4: students who highlighted or made notes as they read their texts could expect to see a 0.0636 increase in their GPA. Thus, if they spent no time studying and didn't go to bars, their expected GPA would be 1.99684 + 0.06359. You should have also included a brief discussion of the relevant t-tests.
Based on the T-tests alone, X1, X2 and X5 are individually insignificant. If you are confused about how the P-values work, then just use the calculated t-stats as your guide.
An F-test was then calculated to see if the variables were jointly insignificant.
F = [(Restricted SSE - Unrestricted SSE) / K1] / [Unrestricted SSE / (n - K - 1)]
F = [(6.960725388 - 6.159759343) / 3] / [6.159759343 / (27 - 5 - 1)]
F = 0.910
Here K1 = 3 is the number of restrictions (X1, X2 and X5 are dropped), and n - K - 1 = 27 - 5 - 1 = 21 is the residual degrees of freedom of the unrestricted model, matching the ANOVA table above.
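A sketch of the same joint test in Python (assuming SciPy), using the restricted and unrestricted SSEs reported above:

```python
from scipy.stats import f

sse_r, sse_u = 6.960725388, 6.159759343  # restricted / unrestricted SSE
n, k, q = 27, 5, 3                       # observations, regressors, restrictions
F = ((sse_r - sse_u) / q) / (sse_u / (n - k - 1))
p_value = f.sf(F, q, n - k - 1)
print(F, p_value)                        # F ~0.91; p well above 0.05
```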
The information for this F-test was obtained by running a second, restricted version of the regression. The F-stat indicated that X1, X2 and X5 are jointly insignificant. Tests for autocorrelation and heteroscedasticity were also run. The Durbin Watson stat indicated that first order autocorrelation was not present. The LM stat from a regression of the squared residuals on the dependent variable indicated that heteroscedasticity in Y was not present. Given that there was no evidence of autocorrelation or heteroscedasticity, both the T and F tests appear to be reliable, and thus X1, X2 and X5 can be dropped from the model. Your GPA is not affected by the number of hours you study per week, the number of hours you study for each test, or the number of courses you take. Your GPA is, however, negatively influenced by the number of hours you spend in bars and positively influenced by highlighting as you read. So... highlight these notes, stop studying, and only go to the bar if you plan on quickly buying your TA a beer and then leaving.