NOTES ON PROJECTS

The project this semester involves estimating a multiple regression model, where the CANSIM number for the dependent variable will be assigned to you in tutorial. The period for which the data are claimed to be available is noted. You must determine which independent variables (generally, at least three are needed) to use to "explain" the dependent variable, find the data on these independent variables using CANSIM, download all the data, estimate the resulting equation, do some diagnostic testing, and write up the results. Although some of the topics assigned can be viewed as esoteric, similar problems are often given to persons working in the business or economic consulting fields.

A link to the CANSIM web page is available on my web page; after accessing it, click on "search CANSIM". For the dependent variable (where you know the CANSIM number), click on "retrieve a single cansim series by label" and, once the series is retrieved (use spreadsheet format), save it to your diskette using the "save" command in the "file" menu. To find series for which you do not have the CANSIM number, click on "search cansim catalog files". If the matrix search command does not yield what you want, try the "main index" or "full text index" options. You may also want to try the "access the alphabetical list of matrices" button. Don't give up easily - there are a lot of series in CANSIM, but finding what you want is NOT (usually) easy. You should try using CANSIM early in the semester - don't wait until week 11 to find out how it works.

Note that before the data are entered into EXCEL, all series must have the same start and end dates. This can be achieved in two ways: (1) when getting the series from CANSIM, ensure that all the variables you want are available for a common period, and use this period to set the beginning and ending dates on the retrieve command, or (2) edit the data after they have been retrieved, e.g. in WORD.
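As an illustration of the common-period requirement, here is a minimal Python sketch that trims several series to the longest period they all cover. It is not part of the CANSIM or EXCEL workflow described above; the function name `trim_to_common_period`, the series names, and the values are all hypothetical and made up for the example.

```python
def trim_to_common_period(series):
    """Cut every series to the longest period that all of them cover.

    `series` maps a series name to a dict of {period: value}, where a period
    is anything that sorts chronologically, e.g. a year or a (year, month) tuple.
    """
    start = max(min(obs) for obs in series.values())  # latest first observation
    end = min(max(obs) for obs in series.values())    # earliest last observation
    if start > end:
        raise ValueError("the series have no period in common")
    trimmed = {
        name: {p: v for p, v in sorted(obs.items()) if start <= p <= end}
        for name, obs in series.items()
    }
    return start, end, trimmed


# Hypothetical annual data (values invented for illustration only).
example = {
    "dependent": {1980: 10.0, 1981: 11.0, 1982: 12.5, 1983: 13.0},
    "income": {1981: 200.0, 1982: 210.0, 1983: 215.0, 1984: 230.0},
}
start, end, trimmed = trim_to_common_period(example)
print(start, end)  # 1981 1983
```

The sketch only aligns the start and end dates; it does not detect gaps inside the common period, which would still need to be checked by hand.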
The summary report (5 pages maximum) should note:

(i) a brief theoretical or ad hoc rationale for why the independent variables selected should influence the dependent variable, including expected signs;
(ii) the exact specification of the independent variables, including any adjustments made to the data, any problems with missing observations, and any transformations made;
(iii) any changes to the sample period or periodicity of the data (e.g. monthly to quarterly) due to problems in obtaining the desired data;
(iv) an assessment of regression diagnostics, including looking at the residuals to identify outliers and testing for autocorrelation (section 14.8); and
(v) an interpretation of the results.

Computer output, with the relevant results highlighted, must be turned in with the report for grading. I strongly suggest that you check with your TA once you have determined which independent variables you think you should try to find, to make sure you have not omitted something important.

I suggest you use the maximum period for which data are available for all variables. You will often find that the independent variables you have chosen are not available for the full period for which the dependent variable is available. It is permissible to shorten the observation period as long as at least 20 observations are used for annual data, 30 observations for quarterly data, or 50 observations for monthly data. Do not shorten the data period more than is necessary: a loss of power (in the statistical sense) will result, and shortening the period does not reduce the amount of work you have to do in any event. If it appears this sample size "rule" must be violated, please check with your TA before proceeding.

I suggest using the data with the periodicity given for the dependent variable, changing the periodicity of the independent variables to match if necessary (e.g. you can "make" monthly data out of quarterly data by repeating each observation three times).
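The quarterly-to-monthly conversion and the minimum sample sizes mentioned above could be sketched in Python as follows. This is only an illustration: the names `MIN_OBS`, `quarterly_to_monthly`, and `enough_observations` are hypothetical, and the data values are invented.

```python
# Minimum number of observations by periodicity, as stated in these notes.
MIN_OBS = {"annual": 20, "quarterly": 30, "monthly": 50}


def quarterly_to_monthly(values):
    """Repeat each quarterly observation three times to get a monthly series."""
    return [v for v in values for _ in range(3)]


def enough_observations(n_obs, periodicity):
    """Return True if the sample meets the minimum size for its periodicity."""
    return n_obs >= MIN_OBS[periodicity]


quarterly = [100.0, 102.0, 101.5]  # invented values, three quarters
monthly = quarterly_to_monthly(quarterly)
print(monthly)
# [100.0, 100.0, 100.0, 102.0, 102.0, 102.0, 101.5, 101.5, 101.5]
print(enough_observations(len(monthly), "monthly"))  # False: 9 is fewer than 50
```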
On the other hand, if you have a lot of monthly observations on your dependent variable, and some independent variable is available only in quarterly form, it may be desirable to change all variables to a quarterly basis. This can be done on the retrieve command in CANSIM. If in doubt, check with your TA.

Finally, it should be noted that I did not actually download most of the data or check that as much data was available as the header indicated. You may find that the data contain fewer observations than indicated, or that more recent observations have become available since I looked (CANSIM is updated weekly). Further, I did note in passing that some series contain missing data points. I suggest you fill these in by interpolation, noting in your write-up that you did so.
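The notes do not say which kind of interpolation to use. As one possibility, here is a minimal Python sketch of linear interpolation between the nearest known neighbours; the function name `interpolate_missing`, the use of `None` to mark a missing point, and the data values are all assumptions made for the example.

```python
def interpolate_missing(values):
    """Fill interior gaps (None) by linear interpolation between known neighbours.

    Gaps at the very start or end are left as None, since there is nothing
    to interpolate between.
    """
    filled = list(values)
    known = [i for i, v in enumerate(filled) if v is not None]
    # Walk over each pair of consecutive known points and fill what lies between.
    for left, right in zip(known, known[1:]):
        step = (filled[right] - filled[left]) / (right - left)
        for i in range(left + 1, right):
            filled[i] = filled[left] + step * (i - left)
    return filled


series = [10.0, None, 14.0, None, None, 20.0]  # invented values with gaps
print(interpolate_missing(series))  # [10.0, 12.0, 14.0, 16.0, 18.0, 20.0]
```

Whatever method you choose, the write-up should state which observations were filled in and how.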