Fall 2019  STAT 310 D100
Introduction to Data Science for the Social Sciences (2)
Class Number: 10314
Delivery Method: In Person
Overview

Course Times + Location:
Mo 10:30 AM – 12:20 PM
SECB 1010, Burnaby 
Exam Times + Location:
Dec 9, 2019
12:00 PM – 3:00 PM
AQ 5005, Burnaby

Instructor:
Aaron Danielson
adaniels@sfu.ca

Prerequisites:
60 units in subjects outside of the Faculties of Science and Applied Science and one of STAT 201, STAT 203, STAT 205, STAT 270, BUEC 232, or POL 201. Corequisite: STAT 311.
Description
CALENDAR DESCRIPTION:
An introduction to modern tools and methods for data acquisition, management, visualization, and machine learning, capable of scaling to Big Data. No prior computer programming experience required. Examples will draw from the social sciences. This course may not be used to satisfy the upper division requirements of the Statistics honours, major, or minor programs. Students who have taken STAT 240, STAT 440, or any 200-level or higher CMPT course first may not then take this course for further credit. Quantitative.
COURSE DETAILS:
STAT 310 is a concept-oriented course. Materials will be taught using the minimum amount of mathematical formalism necessary.
Outline:
1. Review and extension of methods from STAT 203
Labs: sampling from distributions, creating plots, computing correlations and Kendall’s tau, bootstrapping for uncertainty and conducting permutation tests.
2. Introduction to Data Science
B. Basic Supervised Learning for continuous and categorical data: Ordinary least squares, Logistic regression, Poisson regression and Multinomial regression.
C. Machine Learning Methods for Supervised Learning: Nearest Neighbors, Decision Trees, Random Forests and Gaussian Processes.
D. Unsupervised Learning: Measures of similarity and clustering.
Labs: Answering questions with data, practice with regression methods, nearest neighbors, computing a random forest and clustering.
3. Model Evaluation and Validation
B. Cross-validation and Overfitting.
C. Measures of Prediction Error.
Labs: Evaluating statistical models, K-fold cross-validation and ROC curves.
4. Application: Networks
B. Measures on Networks.
C. Community Detection.
D. Random Graph Models.
Labs: Analyzing a network data set, creating network plots, designing your own random graph model.
5. Application: Text Analysis
B. Sentiment Analysis.
C. Tools from Natural Language Processing.
Labs: Acquiring text data, performing sentiment analysis, word2vec.
6. Application: Spatial Data
B. Some Techniques for Spatial Data Analysis.
Labs: Acquiring and managing spatial data, software for spatial data.
About the Instructor:
Dr. Danielson holds degrees in Philosophy (BA Northwestern, AM University of Chicago), Policy Studies (MPP University of Chicago), Economics (MA New York University) and Statistics (Ph.D. UCLA). Most of his research focuses on the development of statistical methodology for network data structures in the social and biological sciences.
Grading
 Assignments 10%
 Lab Write Ups 15%
 Midterm 25%
 Presentation 25%
 Final Exam 25%
NOTES:
Students are required to complete assignments, which are graded for completion. The lab component is treated much like labs in the physical and biological sciences: students are expected to follow instructions and submit a write-up of the work done in the lab. One multiple-choice midterm assesses the methodological topics taught in the first half of the course. Before the final exam, students complete and present a project to the class on an applied topic of their choice. A mixed-response final exam covers all material introduced in the course.
All above grading is subject to change.
Materials
REQUIRED READING:
Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text
Authors: Taylor Arnold and Lauren Defanti Tilton. Publisher: Springer
ISBN: 9783319207018
Book is available online for free through the SFU Library
RECOMMENDED READING:
Text Mining with R: A Tidy Approach
Authors: Julia Silge and David Robinson. Publisher: O'Reilly (Beijing, China)
ISBN: 1491981628
Book is available online for free through the SFU Library
Text Analysis with R for Students of Literature
Author: Matthew L. Jockers. Publisher: Springer
ISBN: 9783319031637
Book is available online for free through the SFU Library
Department Undergraduate Notes:
Students with Disabilities:
Students requiring accommodations as a result of disability must contact the Centre for Accessible Learning at 778-782-3112 or csdo@sfu.ca
Tutor Requests:
Students looking for a Tutor should visit http://www.stat.sfu.ca/teaching/needatutor.html. We accept no responsibility for the consequences of any actions taken related to tutors.
Registrar Notes:
SFU’s Academic Integrity web site http://www.sfu.ca/students/academicintegrity.html is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies and the consequences of cheating. Check out the site for more information and videos that help explain the issues in plain English.
Each student is responsible for his or her conduct as it affects the University community. Academic dishonesty, in whatever form, is ultimately destructive of the values of the University. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the University. http://www.sfu.ca/policies/gazette/student/s1001.html