Spring 2020 - CMPT 756 G100

Systems For Big Data (3)

Class Number: 6771

Delivery Method: In Person


  • Course Times + Location:

    Jan 6 – Apr 9, 2020: Mon, 10:30 a.m.–12:20 p.m.

    Jan 6 – Apr 9, 2020: Wed, 10:30–11:20 a.m.

  • Prerequisites:

    Operating Systems (CMPT 300) and Data Base Systems (CMPT 354), or equivalents.



From health care to social media, the world generates a tremendous amount of data every day, often too much to be processed on a single computer or sometimes even in a single data centre. In this graduate seminar we will learn about the technologies and systems behind Big Data. In particular, we will discuss the challenges of processing and storing massive amounts of data. We will explore how these challenges are being solved in real-world systems, as well as the limitations inherent in these designs. The evolution of these technologies will be explored by reading both current and historically significant research papers. Students with credit for CMPT 886 when offered as a Special Topics course in Big Data may not take this course for further credit.


Data engineering turns a proof-of-concept data model into an efficient, maintainable service well-matched to running in large-scale data centres. Such services are resilient against failure of either their own instances or a service dependency, deployed in a controlled way, suited for co-tenancy with services with different latency goals, instrumented to detect performance problems, and compatible with the data centre scheduler. Data engineering ensures that the services run all night while the operations staff sleep all night.


  • Design space analysis: Analyze the tradeoffs inherent in a cloud service design.
  • Service level metrics, indicators, objectives, and agreements: Describe their distinct uses.
  • Resilience and recovery: Estimate, test, and improve the resilience of a design to failure.
  • Latency versus consistency: Analyze tradeoffs between connected and partitioned performance.
  • Data centre operating systems: Differentiate approaches to managing the resources of a data centre.
  • Instrumentation: Classify the approaches to instrumenting a system and add instrumentation.
  • Consensus: Summarize the need for consensus, particularly for service metadata, and the algorithms that achieve it.



All assignments will be projects; there will be no tests or final exam. Projects will be a mix of individual and group efforts. The final project will be done in groups.


Students must attain an overall passing grade on the weighted average of exams in the course in order to obtain a clear pass (C- or better).

Graduate Studies Notes:

Important dates and deadlines for graduate students are found here: http://www.sfu.ca/dean-gradstudies/current/important_dates/guidelines.html. The deadline to drop a course with a 100% refund is the end of week 2. The deadline to drop with no notation on your transcript is the end of week 3.

Registrar Notes:

SFU’s Academic Integrity web site http://www.sfu.ca/students/academicintegrity.html is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies and the consequences of cheating.  Check out the site for more information and videos that help explain the issues in plain English.

Each student is responsible for his or her conduct as it affects the University community.  Academic dishonesty, in whatever form, is ultimately destructive of the values of the University. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the University. http://www.sfu.ca/policies/gazette/student/s10-01.html