Fall 2020 - CMPT 732 G100

Programming for Big Data 1 (6)

Class Number: 6667

Delivery Method: In Person


  • Course Times + Location:

    Mo 1:30 PM – 3:20 PM



This course is one of two lab courses that are part of the Professional Master’s Program in Big Data in the School of Computing Science. It aims to provide students with the hands-on experience needed for a successful career in Big Data in the information technology industry. Many of the assignments will be completed on massive, publicly available data sets, giving students appropriate experience with cloud computing and with the algorithms and software tools needed to master programming for Big Data. Over 13 weeks of lab work, with 12 hours of lab time per week, students will obtain a solid background in programming for Big Data.


Many companies today collect massive amounts of data that cannot be managed without proper programming techniques. This lab course focuses on the practical aspects of dealing with such data. It will provide insight into MapReduce, Spark, NoSQL databases, cloud computing, and data analytics for large data sets.

Instructor's Objectives:

The objective of this class is to ensure that students will be able to:

  • Use a distributed file system such as (or similar to) HDFS (Hadoop Distributed File System).
  • Write software that can interact with a distributed file system using programming tools that are part of Apache Hadoop.
  • Write simple distributed software using common tools.
  • Formulate and implement queries on large data sets.
  • Write software that can interact with at least one non-relational database.

Online offering notes: You will need a computer with a webcam and reliable Internet access. The computer should be powerful enough to run a virtual machine: at least 8 GB of memory, 20 GB of disk space, and a reasonably capable processor.


  • Big data storage and analysis
  • Hadoop ecosystem
  • MapReduce
  • NoSQL database (HBase, Cassandra)
  • Cloud computing
  • Data analytics
  • Spark
  • Data Ingestion (Kafka)


  • To be discussed in the first week of class. Will include regular lab assignments and a final project.

Graduate Studies Notes:

Important dates and deadlines for graduate students are found here: http://www.sfu.ca/dean-gradstudies/current/important_dates/guidelines.html. The deadline to drop a course with a 100% refund is the end of week 2. The deadline to drop with no notation on your transcript is the end of week 3.

Registrar Notes:


SFU’s Academic Integrity website, http://www.sfu.ca/students/academicintegrity.html, is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies, and the consequences of cheating. Check out the site for more information and videos that help explain the issues in plain English.

Each student is responsible for his or her conduct as it affects the University community.  Academic dishonesty, in whatever form, is ultimately destructive of the values of the University. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the University. http://www.sfu.ca/policies/gazette/student/s10-01.html


Teaching at SFU in fall 2020 will be conducted primarily through remote methods. There will be in-person course components in a few exceptional cases where this is fundamental to the educational goals of the course. Such course components will be clearly identified at registration, as will course components that will be “live” (synchronous) vs. at your own pace (asynchronous). Enrollment acknowledges that remote study may entail different modes of learning, interaction with your instructor, and ways of getting feedback on your work than may be the case for in-person classes. To ensure you can access all course materials, we recommend you have access to a computer with a microphone and camera, and the internet. In some cases your instructor may use Zoom or other means requiring a camera and microphone to invigilate exams. If proctoring software will be used, this will be confirmed in the first week of class.

Students with hidden or visible disabilities who believe they may need class or exam accommodations, including in the current context of remote learning, are encouraged to register with the SFU Centre for Accessible Learning (caladmin@sfu.ca or 778-782-3112).