Design and Piloting Data Wrangling Projects for a New Data Science Course (STAT 240)

Grant program: Teaching and Learning Development Grant (TLDG)

Grant recipient: David Campbell, Department of Statistics and Actuarial Science

Project team: Ebrahim Adeeb, Paul Aubin, Tebuwana Ratnasekera, and Jiying Wen, research assistants

Timeframe: September 2016 to August 2017

Funding: $5,000

Course addressed: STAT 240 – Introduction to Data Science

Final report: View David Campbell's final report (PDF) and Appendix (PDF)

Description: This is a single-phase Design and Pilot project with the potential for a follow-up in preparation for future offerings of the course.

Data scientists use subject-area knowledge to devise their own scientific hypotheses, obtain data (usually through electronic means), and draw inferences. In all other statistics classes, students are given clean data and asked specific questions; this course instead focuses on giving students the scientific skills to ask and answer their own questions. STAT 240 is the first time that students will focus on data acquisition, cleaning, and manipulation. The challenges of obtaining and cleaning data are highly specific to the exact question being asked and the available data resources, but the skills and strategies for tackling these problems are transferable. The goal of this project is to ensure the course is taught in a realistic, project-based manner, giving students a taste of the potential power of the skills they acquire in their undergraduate degrees.

Knowledge sharing: The final project resources will be packaged in a format that can be transferred to others teaching in a similar manner. We intend to build on this course and require it as a prerequisite for other courses. Brown-bag lunch discussions and departmental curriculum meetings will be useful for sharing insights and resources. Some aspects of the content design and development, such as the data sets and project-based methods, may transfer beyond this course, especially to the teaching of STAT 440 – Learning from Big Data.

Keywords: messy data, experiential learning, big data, analytics, instructor notes, classroom observation, pilot testing, course development, supporting students, piloting content, database, data science