Fall 2021 - STAT 310 D100

Introduction to Data Science for the Social Sciences (2)

Class Number: 5118

Delivery Method: In Person

Overview

  • Course Times + Location:

    Sep 8 – Dec 7, 2021: Mon, 10:30 a.m.–12:20 p.m.
    Burnaby

  • Exam Times + Location:

    Dec 16, 2021
    Thu, 7:00–10:00 p.m.
    Burnaby

  • Prerequisites:

    60 units in subjects outside of the Faculties of Science and Applied Science and one of STAT 201, STAT 203, STAT 205, STAT 270, BUS 232, or POL 201, with a minimum grade of C-. Corequisite: STAT 311.

Description

CALENDAR DESCRIPTION:

An introduction to modern tools and methods for data acquisition, management, visualization, and machine learning, capable of scaling to Big Data. No prior computer programming experience required. Examples will draw from the social sciences. This course may not be used to satisfy the upper division requirements of the Statistics honours, major, or minor programs. Students who have taken STAT 240, STAT 440, or any 200-level or higher CMPT course first may not then take this course for further credit. Quantitative.

COURSE DETAILS:

STAT 310 is a concept-oriented course. Material will be taught with the minimum mathematical formalism necessary.

Outline:
1. Review and extension of methods from STAT 203
A. Probability.
B. Some Common Distributions.
C. Measures of Central Tendency and Dispersion.
D. Statistics without Formulas:  Permutation Tests and Bootstrap Methods.

Labs:  Sampling from distributions, creating plots, computing correlations and Kendall’s tau, bootstrapping for uncertainty, and conducting permutation tests.
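
The "statistics without formulas" idea in item D can be sketched in a few lines of code. The example below is a minimal illustration in Python. The listed readings use R, so lab code may look different. The function names `permutation_test` and `bootstrap_ci`, their default settings, and the data are hypothetical choices for this sketch and are not course materials.

```python
import random
import statistics


def permutation_test(x, y, n_perm=10_000, seed=0):
    """Two-sided permutation test for a difference in group means (sketch)."""
    rng = random.Random(seed)
    observed = abs(statistics.mean(x) - statistics.mean(y))
    pooled = list(x) + list(y)
    n_x = len(x)
    extreme = 0
    for _ in range(n_perm):
        # Shuffling the pooled values simulates "group labels don't matter".
        rng.shuffle(pooled)
        diff = abs(statistics.mean(pooled[:n_x]) - statistics.mean(pooled[n_x:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimated p-value strictly above zero.
    return (extreme + 1) / (n_perm + 1)


def bootstrap_ci(data, stat=statistics.mean, n_boot=10_000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for a statistic (sketch)."""
    rng = random.Random(seed)
    n = len(data)
    # Resample with replacement, recompute the statistic, then sort the results.
    boots = sorted(stat(rng.choices(data, k=n)) for _ in range(n_boot))
    lower = boots[int(alpha / 2 * n_boot)]
    upper = boots[int((1 - alpha / 2) * n_boot) - 1]
    return lower, upper


if __name__ == "__main__":
    # Hypothetical scores for two groups; not course data.
    group_a = [12, 15, 14, 10, 13, 16]
    group_b = [9, 11, 8, 10, 12, 7]
    print("permutation p-value:", permutation_test(group_a, group_b))
    print("95% bootstrap CI for mean of group_a:", bootstrap_ci(group_a))
```

Neither procedure needs a closed-form sampling distribution. The permutation test reshuffles the observed data to simulate the null hypothesis of no group difference. The bootstrap resamples a single group to estimate how much its mean would vary.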

2. Data cleaning and wrangling

A. Differences Between Data Cleaning and Wrangling.
B. Data Cleaning Techniques.
C. Data Wrangling Techniques.

Labs: Detecting, locating, and correcting errors in data; extracting specific observations and variables; generating new variables; and summarizing data.

3. Introduction to Data Science

A. Common Tasks: Prediction, Explanation, and Exploration.
B. Basic Supervised Learning for Continuous and Categorical Data: ordinary least squares, logistic regression, Poisson regression, and multinomial regression.
C. Machine Learning Methods for Supervised Learning: nearest neighbors, decision trees, random forests, and Gaussian processes.
D. Unsupervised Learning: Measures of Similarity and Clustering.

Labs: Answering questions with data, practicing regression methods, applying nearest neighbors, computing a random forest, and clustering.

4. Model Evaluation and Validation
A. Fitting the Data:  Residuals and Likelihood Ratios.
B. Cross-Validation and Overfitting.
C. Measures of Prediction Error.

Labs:  Evaluating statistical models, k-fold cross-validation, and ROC curves.
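
Item B and the k-fold lab can be illustrated with a short sketch. The model is fit on k−1 folds and scored on the held-out fold, and this is repeated so that every observation is predicted exactly once. The example below is a minimal Python illustration; the listed readings use R. It assumes a one-predictor ordinary least squares model and uses mean squared error as the measure of prediction error. The helper names `fit_ols` and `k_fold_mse` and the data are hypothetical, and ROC curves are not covered.

```python
import random


def fit_ols(xs, ys):
    """Least-squares intercept and slope for a single predictor."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx  # assumes the training xs are not all identical
    return mean_y - slope * mean_x, slope


def k_fold_mse(xs, ys, k=5, seed=0):
    """Estimate out-of-sample mean squared error with k-fold cross-validation."""
    idx = list(range(len(xs)))
    random.Random(seed).shuffle(idx)
    # Deal the shuffled indices into k folds of (nearly) equal size.
    folds = [idx[i::k] for i in range(k)]
    squared_errors = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit only on the training folds ...
        b0, b1 = fit_ols([xs[i] for i in train], [ys[i] for i in train])
        # ... and score only on the held-out fold.
        squared_errors.extend((ys[i] - (b0 + b1 * xs[i])) ** 2 for i in fold)
    return sum(squared_errors) / len(squared_errors)


if __name__ == "__main__":
    # Hypothetical data: a linear trend plus Gaussian noise.
    rng = random.Random(1)
    xs = [float(i) for i in range(50)]
    ys = [2.0 * x + 1.0 + rng.gauss(0, 3) for x in xs]
    print("5-fold CV estimate of MSE:", k_fold_mse(xs, ys, k=5))
```

Every prediction is scored on data the model did not see during fitting. For this reason, the cross-validated error is a more honest guide to overfitting than the in-sample residuals.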

5. Application:  Networks
A. Basic concepts of graphs.
B. Measures on Networks.
C. Community Detection.
D. Random Graph Models.

Labs:  Analyzing a network data set, creating network plots, and designing your own random graph model.

6. Application:  Text Analysis
A. Words as Data.
B. Sentiment Analysis.
C. Tools from Natural Language Processing.

Labs:  Acquiring text data, performing sentiment analysis, and working with word2vec.

7. Application:  Spatial Data (Optional)
A. Coordinates as Data.
B. Some Techniques for Spatial Data Analysis.

Labs:  Acquiring and managing spatial data, software for spatial data.

About the Instructor:

Dr. Lin holds degrees in computer science (BESc, China) and statistics (MSc and PhD, University of Toronto). Before joining SFU, she worked as a lecturer at the University of Toronto, conducted postdoctoral research at the University of Southern California, co-founded an AI startup, and worked as an assistant professor at TRU. Dr. Lin has broad research interests in likelihood inference, computational statistics, data visualization, statistical machine learning, and neural network modelling for big data, with a focus on acoustic data. In her spare time, she enjoys reading, traveling, swimming, and scuba diving.

Grading

  • Assignments 15%
  • Lab Write Ups 15%
  • Midterm 25%
  • Course Project 15%
  • Final Exam 30%

NOTES:

Students are required to complete assignments, which are graded for completion. The lab component is treated much like labs in the physical and biological sciences: students are expected to follow instructions and submit a write-up of the work done in the lab. A single multiple-choice midterm assesses the methodological topics taught in the first half of the course. Before the final exam, students complete a project on an applied topic of their choice and submit a report for grading. A mixed-response final exam covers all material introduced in the course.

All of the above grading is subject to change.

Materials

REQUIRED READING:

Humanities Data in R: Exploring Networks, Geospatial Data, Images, and Text
Authors: Taylor Arnold and Lauren Defanti Tilton. Publisher: Springer
ISBN: 9783319207018

The book is available online for free through the SFU Library.

RECOMMENDED READING:

Text Mining with R: A Tidy Approach
Authors: Julia Silge and David Robinson. Publisher: O'Reilly (Beijing, China)
ISBN: 1491981628

The book is available online for free through the SFU Library.

Text Analysis with R for Students of Literature
Author: Matthew L. Jockers. Publisher: Springer
ISBN: 978-3-319-03163-7

The book is available online for free through the SFU Library.

Department Undergraduate Notes:

Students with Disabilities:
Students requiring accommodations as a result of a disability must contact the Centre for Accessible Learning (778-782-3112 or csdo@sfu.ca).

Tutor Requests:
Students looking for a tutor should visit https://www.sfu.ca/stat-actsci/all-students/other-resources/tutoring.html. We accept no responsibility for the consequences of any actions taken related to tutors.

Registrar Notes:

ACADEMIC INTEGRITY: YOUR WORK, YOUR SUCCESS

SFU’s Academic Integrity web site http://www.sfu.ca/students/academicintegrity.html is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies and the consequences of cheating.  Check out the site for more information and videos that help explain the issues in plain English.

Each student is responsible for his or her conduct as it affects the University community.  Academic dishonesty, in whatever form, is ultimately destructive of the values of the University. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the University. http://www.sfu.ca/policies/gazette/student/s10-01.html

TEACHING AT SFU IN FALL 2021

Teaching at SFU in fall 2021 will involve primarily in-person instruction, with approximately 70 to 80 per cent of classes in person/on campus, with safety plans in place.  Whether your course will be in-person or through remote methods will be clearly identified in the schedule of classes.  You will also know at enrollment whether remote course components will be “live” (synchronous) or at your own pace (asynchronous).

Enrolling in a course acknowledges that you are able to attend in whatever format is required.  You should not enroll in a course that is in-person if you are not able to return to campus, and should be aware that remote study may entail different modes of learning, interaction with your instructor, and ways of getting feedback on your work than may be the case for in-person classes.

Students with hidden or visible disabilities who may need class or exam accommodations, including in the context of remote learning, are advised to register with the SFU Centre for Accessible Learning (caladmin@sfu.ca or 778-782-3112) as early as possible in order to prepare for the fall 2021 term.