ECON 836

Critiques 10%

Replications 30%

Midterm 20%

Final 40%

You are graded directly on your replications, and indirectly through the exams, which will include code- and paper-based questions drawn from the replications you have undertaken. As the weights suggest, the purpose of this course is to get you doing econometrics, through the replications.

Assignments

This course is based on critiques and replications. A critique is a short referee report in which you describe the strengths and shortcomings of a paper and suggest ways to improve it. Your critiques should focus on empirical strengths, shortcomings and improvements. Critiques should not exceed 2 pages, single-spaced, 12-point Times New Roman. Bullet points are fine.

A replication is an exercise in which you attempt to repeat the empirical exercise undertaken by the author of the paper. Typically, you will replicate only a small number of lines in a table of results. A good start is to work through the Stata tutorials linked below. Before asking for Stata help, google "help stata whatever" and type "help whatever" in Stata; the Stata help files have examples at the bottom. Good places to start are "help use", "help recode", "help keep" and "help regress".

There are 3 supplementary textbooks for this course:

1. Greene, William, Econometric Analysis, Prentice Hall, 5th/6th Edition, 2008.

2. Kennedy, Peter, A Guide to Econometrics, 5th or 6th Edition (Paperback).

3. Angrist, Joshua and Jorn-Steffen Pischke, Mostly Harmless Econometrics: An Empiricist's Companion (Paperback).

I will try to let you know where to look for relevant supporting material in these textbooks. In addition, I will post links to other material you may find helpful.

Lecture Notes to Introduce Ordinary Least Squares.

Useful links for learning Stata are at UCLA economics Stata Tutorials.

A nice introduction to quantile regression is in Koenker and Hallock.

Assignment 1a is due in-class Wednesday 13 Jan

Read and critique Pendakur and Pendakur, 2011, Aboriginal Incomes.

Write a 2-page critical review of the paper, focusing on empirical strengths, shortcomings and improvements. Do the authors try to do something interesting? Do they succeed? What would you have done?

Assignment 1b is due in-class Wednesday 20 Jan

This assignment is a first crack at uncovering a difference in conditional means using real data. It is meant to let you get your hands dirty with ugly, badly coded data full of missing values and weird codes. Your main objective is to get the right sample, the right covariates and the right coding, and then run regress y x. Start working on this assignment today (whatever today is).

For this assignment, use the 2006 Census Public Use individual-level microdata and its documentation. Replicate, as best you can, the rightmost columns of Table 2 of the paper. You do not have all of the variables and observations in the confidential main database used by the authors, and your data are coded a bit differently, so you will not get exactly the same numbers; use your judgment to replicate that Table as closely as the data at hand allow. Please include your Stata code, the relevant output (not all the output, just what is important) and your table of results. Hints: 1) try to get the right sample; 2) try to get the right covariates; 3) try to code each variable correctly.

Here is some sample code.

Lecture Notes on Panels.

Lecture Notes on Non-Spherical Errors.

Assignment 2a due in-class Wednesday 27 Jan

Read and critique Allen, Pendakur and Suen, 2005, "No-Fault Divorce and the Compression of Marriage Ages", Economic Inquiry. Write a 2-page critical review of the paper, focusing on empirical strengths, shortcomings and improvements. In your report, consider carefully: do the authors really measure marital search intensity? If not, what do they measure? Is it an interesting magnitude to measure? Are you convinced by their empirical claims? How could they do better?

Assignment 2b due in-class Wednesday 3 Feb

Using the data on brides' age at first marriage for US states, 1970 to 1995 (computed from microdata), replicate the results for brides in Table 6 and Figure 3, using just the first no-fault definition. In addition, consider the possibility that the marriage age distribution responds not instantaneously but slowly over time. Estimate a model with this structure, and report results corresponding to Table 6 for this model.

These data are given at the state-year level. "nf1" gives the basic no-fault indicator. "Count" gives the number of observations used to compute each datum in each state-year, and so should be used to weight all regressions. This replication uses the regress command in Stata with these weights and heteroskedasticity-robust standard errors, requested with the ",r" option (we'll learn about both later). Consider whether or not the authors should have clustered their standard errors, and respond accordingly. The other variables give statistics about the distribution of age at first marriage, computed at the state-year level; "p_**" gives the **th percentile of age at first marriage.
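If it helps to see what the weights and the ",r" option do mechanically, here is a rough Python/NumPy sketch (not Stata, and not the authors' code) of weighted least squares with HC1 heteroskedasticity-robust standard errors. It assumes Count enters as analytic weights, as in regress y x [aweight=count], r, and that the regressors include a constant; the function name wls_robust and the simulated data are made up for illustration, and I have not checked the numbers against Stata output.

```python
import numpy as np


def wls_robust(y, X, w):
    """Weighted least squares with HC1 heteroskedasticity-robust standard errors.

    y : (n,) outcome, e.g. a state-year percentile of age at first marriage
    X : (n, k) regressors, including a column of ones for the constant
    w : (n,) nonnegative weights, e.g. the Count of brides behind each cell
    """
    y = np.asarray(y, dtype=float)
    X = np.asarray(X, dtype=float)
    w = np.asarray(w, dtype=float)
    n, k = X.shape
    XtW = X.T * w                        # X'W without forming the n-by-n matrix W
    bread = np.linalg.inv(XtW @ X)       # (X'WX)^{-1}
    beta = bread @ (XtW @ y)             # WLS point estimates
    resid = y - X @ beta
    scores = X * (w * resid)[:, None]    # row i is w_i * e_i * x_i'
    meat = scores.T @ scores             # sum_i w_i^2 e_i^2 x_i x_i'
    vcov = bread @ meat @ bread * n / (n - k)  # sandwich with HC1 scaling n/(n-k)
    return beta, np.sqrt(np.diag(vcov))


# Illustrative use on simulated cell-level data (hypothetical, not the assignment data)
rng = np.random.default_rng(0)
n = 200
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
w = rng.integers(1, 50, size=n).astype(float)         # hypothetical cell sizes
y = 1.0 + 2.0 * x + rng.normal(size=n) / np.sqrt(w)   # small cells give noisier means
beta, se = wls_robust(y, X, w)
print(beta, se)
```

Because the weights enter the bread once on each side and the meat squared, multiplying every weight by the same constant leaves both the coefficients and the standard errors unchanged; only the relative sizes of the counts matter. Clustering by state would, roughly, replace the observation-level meat with a sum over states of each state's summed scores.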

More on Venn Diagrams for Regression, Peter Kennedy, 2002. This paper presents the Ballentine Diagrams discussed in class, relating to Multicollinearity and Endogeneity.

Assignment 3a due in-class Monday 15 Feb

Lecture Notes on Seemingly Unrelated Regression.

Midterm

The midterm is Monday 22 Feb, 2 hours, in-class, closed book.

You might focus attention on the following Chapters of Greene, Kennedy and Angrist and Pischke:

Sources for OLS: Greene, 5th Ed, Chapters 1-3; Kennedy, 5th Ed, Chapters 1-3; Angrist-Pischke Ch 1-3.

Sources for GLS, Heteroskedasticity, Panel Methods: Greene, Chapters 10-13; Kennedy, 5th Ed, Chapters 8, 14, 17, Appendix B; Angrist-Pischke Ch5 (for panel material).

Sources for Endogeneity (and also SUR): Greene, Chapters 14-15; Kennedy, 5th Ed, Chapters 9, 10; Angrist-Pischke Ch4.

Study Questions for the Midterm. You may also be interested in the midterms for 2010 and 2012 (I seem to have misplaced 2011). Grading keys are available for 2010 and 2012. The midterm will consist of 3-4 questions from the study questions and 3-4 other questions. I recommend that you work on the study questions in groups. You should NOT divide up the study questions and then report back; rather, work on them together, so that you understand them better.

Lecture Notes on Endogeneity.

Survey Paper on Identification: The Identification Zoo.

Assignment 4a due in-class Wednesday 9 March

Read and critique the methodology proposed in Pendakur 2010, EASI Made Easier. Discuss at least the following: What do the error terms mean in this model? Is the structure imposed on the error terms plausible? Why is there endogeneity in this model? Is the approach to dealing with endogeneity plausible?

Replication, due at the beginning of class Thursday 28 March

Assignment 4b due in-class Wednesday 23 March

Use this zipped datafile, which contains the complete Surveys of Household Spending from 1997 to 2009. The complete documentation is here.

In the data, the labeling conventions are as follows. s* are spending categories 1-13 in nominal dollars; p* are the natural logs of prices for goods 1-13, which vary only across province and year, and are all equal to 0 in Ontario in 2002. Good 3 is shelter, so p3 is the log of the rental price and s3 is household expenditure on rental shelter. Consequently, many households have s3=0, because they pay no rent. pricerural and pricebigcity are the natural logs of the nonurban and urban price indices, respectively, for owned accommodation. pricebigcity = 0 in Ontario in 2002, and pricerural is less than zero there, because accommodation is cheaper outside big cities.

z* are 22 demographic controls, with self-explanatory labels. yearbuip is the period in which the dwelling was built, coded in 6 decade categories, with 6 being the most recent. typdwelp is the categorical dwelling type (single-detached, condo, etc.). hhinctot is total household income. hhszd31p is the number of people in the household at December 31. numbedrp is the number of bedrooms. numbthrp is the number of bathrooms. rpagegrp is the age of the household head minus 40. rpmarp is the marital status of the respondent. Note that total nonshelter consumption is the sum of 9 of the 10 included spending categories (s*), and does not include s3 (rental shelter expenditures). redurent is an indicator of reduced rent and of rental vs nonrental tenure. weight is the sample weight.

Your estimation and testing exercise is to:

1. Estimate a 5-good demand system (with 4 equations) using the EASI demand system, some demographics z, and price and budget variation, with the gmm command in Stata. Write all of the code yourself; do not use code from the web (though you may use this to help). Hand in the code with your assignment.

2. How important is unobserved preference heterogeneity compared to observed preference heterogeneity? Use graphs or tables to show this to the reader.

3. Does Slutsky symmetry hold? Formally test the hypothesis of symmetry.

Lecture Notes on Demand Estimation.

Lecture Notes on OLS approaches to time-series econometrics.

Lecture Notes on Confidence Intervals and Testing.

Sources for Testing: Greene, 5th Ed, Chapters 5, 6, 4; Kennedy, 5th Ed, Chapter 4.

Sources for SUR: Greene, Chapter 14; Kennedy, 5th Ed, Chapter 10.

Sources for Selection Correction (Heckman Two-Step): Greene, Chapter 19.5 "Sample Selection".

A bit of Stata code to show how sampling distributions work.
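The linked Stata code is not reproduced here. As a rough Python analogue of the same idea, assuming a simple linear model with normal errors and a regressor held fixed across replications, the sketch below redraws the errors many times, re-estimates the OLS slope each time, and compares the spread of those estimates with the textbook standard deviation sigma/sqrt(sum of (x - xbar)^2). The function name and default settings are made up for illustration.

```python
import numpy as np


def slope_sampling_distribution(n=100, reps=5000, beta=2.0, sigma=1.0, seed=0):
    """Simulate the sampling distribution of the OLS slope in y = 1 + beta*x + e.

    The regressor x is drawn once and held fixed; only the errors
    e ~ N(0, sigma^2) are redrawn in each replication.
    """
    rng = np.random.default_rng(seed)
    x = rng.normal(size=n)
    xc = x - x.mean()
    slopes = np.empty(reps)
    for r in range(reps):
        y = 1.0 + beta * x + sigma * rng.normal(size=n)
        slopes[r] = xc @ (y - y.mean()) / (xc @ xc)   # OLS slope estimate
    theory_sd = sigma / np.sqrt(xc @ xc)              # textbook sd of the slope given x
    return slopes, theory_sd


slopes, theory_sd = slope_sampling_distribution()
print("mean of slopes:", slopes.mean())
print("simulated sd:", slopes.std(), "theoretical sd:", theory_sd)
```

A histogram of slopes should look roughly normal and centred on the true slope of 2, with a spread close to theory_sd; rerunning with a larger n shows the distribution tightening.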

Maximum Likelihood Notes are now found in Lecture Notes to Introduce Ordinary Least Squares (end of the doc, where ML is introduced) and Lecture Notes on Endogeneity (end of the doc, with application to endogeneity and selection correction), both above.

Assignment 5a, due in-class 11 April

Read Norris and Pendakur 2015, Imputing Rent in Consumption Measures. What problem are the authors trying to solve? Is their solution satisfying? How does endogeneity fit in? Are their instruments satisfying? Why do they use a probit rather than OLS? Is their limited dependent variable model satisfying?

Assignment 5b, due in-class, at the final exam, 21 April, 10am

Use the same data as Assignment 4b. This time, estimate some models where tenure status (rental vs owned) is the dependent variable. Choose some regressors that you think are important to the preferences and constraints involved in the rent/own decision. Estimate: 1) by OLS; 2) by probit; 3) by logit. How do these estimates differ in their magnitudes and in their interpretation?
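To see why the raw magnitudes differ across these estimators, here is a hedged Python/NumPy sketch on simulated data (not the SHS data): it fits a linear probability model by OLS and a logit by Newton-Raphson, then compares the OLS slope with the logit coefficient and with the logit's average marginal effect. The function names and the data-generating process are illustrative assumptions; a probit would follow the same pattern with the normal CDF in place of the logistic function.

```python
import numpy as np


def logistic(z):
    return 1.0 / (1.0 + np.exp(-z))


def fit_logit(y, X, tol=1e-10, max_iter=50):
    """Logit maximum likelihood by Newton-Raphson; X should include a constant."""
    beta = np.zeros(X.shape[1])
    for _ in range(max_iter):
        p = logistic(X @ beta)
        grad = X.T @ (y - p)                        # score of the log-likelihood
        hess = (X * (p * (1 - p))[:, None]).T @ X   # negative Hessian
        step = np.linalg.solve(hess, grad)
        beta = beta + step
        if np.max(np.abs(step)) < tol:
            break
    return beta


def logit_ame(beta, X, j):
    """Average marginal effect of continuous regressor j on P(y = 1)."""
    p = logistic(X @ beta)
    return np.mean(p * (1 - p)) * beta[j]


# Simulated binary outcome (hypothetical data-generating process)
rng = np.random.default_rng(1)
n = 5000
x = rng.normal(size=n)
X = np.column_stack([np.ones(n), x])
y = (rng.uniform(size=n) < logistic(-0.5 + 1.0 * x)).astype(float)

b_ols = np.linalg.lstsq(X, y, rcond=None)[0]   # linear probability model
b_logit = fit_logit(y, X)
print("LPM slope:", b_ols[1])
print("logit coefficient:", b_logit[1])
print("logit average marginal effect:", logit_ame(b_logit, X, 1))
```

The logit coefficient is on the log-odds scale and so is several times larger than the LPM slope, while the average marginal effect, which is on the probability scale, lands close to it in this example.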

The Final Exam, 21 April 10am-1pm

The final exam is cumulative; you should use the midterm study questions above and the final exam study questions posted here. There will be 8 questions on the final exam (same length as the midterm). One will be from the midterm, 1 or 2 will be from the midterm study questions, 2 will be from the final study questions, and 2 will be on the Stata coding used in your assignments. The rest will be similar in spirit to the preceding.

Here is the final exam from 3 years ago (last time I taught this), and here is a set of suggested responses.