Fall 2024 - STAT 310 D100

Introduction to Data Science for the Social Sciences (4)

Delivery Method: In Person


  • Course Times + Location:

    Sep 4 – Dec 3, 2024: Mon, 10:30 a.m.–12:20 p.m.

    Oct 15, 2024: Tue, 10:30 a.m.–12:20 p.m.

  • Prerequisites:

    60 units in subjects outside of the Faculties of Science and Applied Sciences and one of STAT 201, STAT 203, STAT 205, STAT 270, BUS 232, ECON 233, or POL 201, with a minimum grade of C-.



An introduction to modern tools and methods for data acquisition, management, visualization, and machine learning, capable of scaling to Big Data. No prior computer programming experience required. Examples will draw from the social sciences. This course may not be used to satisfy the upper division requirements of the statistics honours, major, or minor programs. Students who have taken STAT 240, STAT 440, or any 200-level or higher CMPT course first may not then take this course for further credit. Quantitative.


STAT 310 is a concept-oriented course. Material will be taught using the minimum amount of mathematical formalism necessary.

1. Review and extension of methods from STAT 203
       A. Probability.
       B. Some Common Distributions.
       C. Measures of Dispersion and Central Tendency.
       D. Statistics without Formulas:  Permutation Tests and Bootstrap Methods.

      Labs:  sampling from distributions, creating plots, computing correlations and Kendall’s tau, bootstrapping for uncertainty and conducting permutation tests.

2. Introduction to Data Science
       A. Common Tasks: Prediction, explanation, and exploration.
       B. Basic supervised learning for continuous and categorical data: ordinary least squares, logistic regression, Poisson regression, and multinomial regression.
       C. Machine learning methods for supervised learning: nearest neighbors, decision trees, random forests, and Gaussian processes.
       D. Unsupervised learning: measures of similarity and clustering.

       Labs: Answering questions with data, practice with regression methods, nearest neighbors, computing a random forest and clustering.

3. Model Evaluation and Validation
       A. Fitting the Data: Residuals and Likelihood Ratios.
       B. Cross-validation and Overfitting.
       C. Measures of Prediction Error.

       Labs: Evaluating statistical models, k-fold cross-validation, and ROC curves.

4. Application:  Networks
       A. Basic concepts of graphs.
       B. Measures on Networks.
       C. Community Detection.
       D. Random Graph Models.

       Labs: Analyzing a network data set, creating network plots, designing your own random graph model.

5. Application:  Text Analysis
       A. Words as Data.
       B. Sentiment Analysis.
       C. Tools from Natural Language Processing.

       Labs: Acquiring text data, performing sentiment analysis, using word2vec.

6. Application:  Spatial Data (Optional)
       A. Coordinates as Data.
       B. Some Techniques for Spatial Data Analysis.

       Labs: Acquiring and managing spatial data, software for spatial data.


  • Assignments 40%
  • Midterm 25%
  • Final Exam 35%


Students are required to complete assignments, which are graded for completion. The lab component is treated much like labs in the physical and biological sciences: students are expected to follow instructions and submit a writeup of the work done in the lab. One multiple-choice midterm assesses the methodological topics taught in the first half of the course. Before the final exam, students complete a project on an applied topic of their choice and submit a report for grading. A mixed-response final exam tests all material introduced in the course.

The grading scheme above is subject to change.



(ISL) An Introduction to Statistical Learning with Applications in R, by Gareth James, Daniela Witten, Trevor Hastie, and Robert Tibshirani. The free e-book is available at https://web.stanford.edu/~hastie/ISLRv2_website.pdf


(IDS) Introduction to Data Science: Data Analysis and Prediction Algorithms with R, by Rafael A. Irizarry. It is free online: https://rafalab.github.io/dsbook/

(HML) Hands-On Machine Learning with R, by Bradley Boehmke & Brandon Greenwell. It is free online: https://bradleyboehmke.github.io/HOML/

(PTA) Practical Text Analytics: Maximizing the Value of Text Data, by Murugan Anandarajan, Chelsey Hill, and Thomas Nolan. Available through SFU.


Department Undergraduate Notes:

Students with Disabilities:
Students requiring accommodations as a result of disability must contact the Centre for Accessible Learning at 778-782-3112 or caladmin@sfu.ca.

Tutor Requests:
Students looking for a tutor should visit https://www.sfu.ca/stat-actsci/all-students/other-resources/tutoring.html. We accept no responsibility for the consequences of any actions taken related to tutors.

SFU’s Academic Integrity website http://www.sfu.ca/students/academicintegrity.html is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies and the consequences of cheating. Check out the site for more information and videos that help explain the issues in plain English.

Each student is responsible for his or her conduct as it affects the university community. Academic dishonesty, in whatever form, is ultimately destructive of the values of the university. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the university. http://www.sfu.ca/policies/gazette/student/s10-01.html


Students with a faith background who may need accommodations during the term are encouraged to assess their needs as soon as possible and review the Multifaith religious accommodations website. The page outlines ways they can begin working toward an accommodation and ensure solutions can be reached in a timely fashion.