Fall 2022 - STAT 310 D100

Introduction to Data Science for the Social Sciences (2)

Class Number: 4707

Delivery Method: In Person

Overview

  • Course Times + Location:

    Sep 7 – Dec 6, 2022: Mon, 10:30 a.m.–12:20 p.m.
    Burnaby

  • Prerequisites:

    60 units in subjects outside of the Faculties of Science and Applied Sciences and one of STAT 201, STAT 203, STAT 205, STAT 270, BUS 232, ECON 233, or POL 201, with a minimum grade of C-. Corequisite: STAT 311.

Description

CALENDAR DESCRIPTION:

An introduction to modern tools and methods for data acquisition, management, visualization, and machine learning, capable of scaling to Big Data. No prior computer programming experience required. Examples will draw from the social sciences. This course may not be used to satisfy the upper division requirements of the statistics honours, major, or minor programs. Students who have taken STAT 240, STAT 440, or any 200-level or higher CMPT course first may not then take this course for further credit. Quantitative.

COURSE DETAILS:

STAT 310 is a concept-oriented course. Material is taught with the minimum amount of mathematical formalism necessary.

Outline:
1. Review and Extension of Methods from STAT 203
       A. Probability.
       B. Some Common Distributions.
       C. Measures of Central Tendency and Dispersion.
       D. Statistics without Formulas: Permutation Tests and Bootstrap Methods.

       Labs: Sampling from distributions, creating plots, computing correlations and Kendall’s tau, bootstrapping for uncertainty, and conducting permutation tests.

2. Introduction to Data Science
       A. Common Tasks: Prediction, Explanation, and Exploration.
       B. Basic Supervised Learning for Continuous and Categorical Data: Ordinary Least Squares, Logistic Regression, Poisson Regression, and Multinomial Regression.
       C. Machine Learning Methods for Supervised Learning: Nearest Neighbors, Decision Trees, Random Forests, and Gaussian Processes.
       D. Unsupervised Learning: Measures of Similarity and Clustering.

       Labs: Answering questions with data, practice with regression methods, nearest neighbors, computing a random forest, and clustering.

3. Model Evaluation and Validation
       A. Fitting the Data: Residuals and Likelihood Ratios.
       B. Cross-Validation and Overfitting.
       C. Measures of Prediction Error.

       Labs: Evaluating statistical models, K-fold cross-validation, and ROC curves.

4. Application: Networks
       A. Basic Concepts of Graphs.
       B. Measures on Networks.
       C. Community Detection.
       D. Random Graph Models.

       Labs: Analyzing a network data set, creating network plots, and designing your own random graph model.

5. Application: Text Analysis
       A. Words as Data.
       B. Sentiment Analysis.
       C. Tools from Natural Language Processing.

       Labs: Acquiring text data, performing sentiment analysis, and word2vec.

6. Application: Spatial Data (Optional)
       A. Coordinates as Data.
       B. Some Techniques for Spatial Data Analysis.

       Labs: Acquiring and managing spatial data, and software for spatial data.

About the Instructor:

Dr. Lin holds degrees in computer science (BESc, China) and statistics (MSc, Ph.D., University of Toronto). Before joining SFU, she worked as a lecturer at the University of Toronto, conducted postdoctoral research at the University of Southern California, co-founded an AI startup, and worked as an assistant professor at TRU. Dr. Lin has broad research interests in likelihood inference, computational statistics, data visualization, statistical machine learning, and neural network modelling for big data, with a focus on acoustic data. In her spare time, she enjoys reading, traveling, swimming, and scuba diving.

Grading

  • Assignments 40%
  • Midterm 25%
  • Final Exam 35%

NOTES:

Students are required to complete assignments, which are graded for completion. The lab component is treated much like labs in the physical and biological sciences: students are expected to follow instructions and submit a writeup of the work done in the lab. One multiple-choice midterm assesses the methodological topics taught in the first half of the course. Before the final exam, students complete a project on an applied topic of their choice and submit a report for grading. A mixed-response final exam tests all material introduced in the course.


All above grading is subject to change.

Materials

REQUIRED READING:

(ISL) An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. The e-book is available free at: https://web.stanford.edu/~hastie/ISLRv2_website.pdf

RECOMMENDED READING:

(IDS) Introduction to Data Science: Data Analysis and Prediction Algorithms with R, by Rafael A. Irizarry. It is free online: https://rafalab.github.io/dsbook/

(HML) Hands-On Machine Learning with R, by Bradley Boehmke and Brandon Greenwell. It is free online: https://bradleyboehmke.github.io/HOML/

(PTA) Practical Text Analytics: Maximizing the Value of Text Data, by Murugan Anandarajan, Chelsey Hill, and Thomas Nolan. Available through SFU.

REQUIRED READING NOTES:

Your personalized Course Material list, including digital and physical textbooks, is available through the SFU Bookstore website by entering your Computing ID at: shop.sfu.ca/course-materials/my-personalized-course-materials.

Department Undergraduate Notes:

Students with Disabilities:
Students requiring accommodations as a result of a disability must contact the Centre for Accessible Learning (778-782-3112 or caladmin@sfu.ca).


Tutor Requests:
Students looking for a tutor should visit https://www.sfu.ca/stat-actsci/all-students/other-resources/tutoring.html. We accept no responsibility for the consequences of any actions taken related to tutors.

Registrar Notes:

ACADEMIC INTEGRITY: YOUR WORK, YOUR SUCCESS

SFU’s Academic Integrity website http://www.sfu.ca/students/academicintegrity.html is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies and the consequences of cheating. Check out the site for more information and videos that help explain the issues in plain English.

Each student is responsible for his or her conduct as it affects the university community. Academic dishonesty, in whatever form, is ultimately destructive of the values of the university. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the university. http://www.sfu.ca/policies/gazette/student/s10-01.html