Fall 2024 - CMPT 732 G100
Big Data Lab I (6)
Class Number: 6395
Delivery Method: In Person
Overview
-
Course Times + Location:
Sep 4 – Dec 3, 2024: Mon, 10:30 a.m.–12:20 p.m.
Burnaby
-
Instructor:
Gregory Baker
ggbaker@sfu.ca
1 778 782-5755
-
Instructor:
Oliver Schulte
oschulte@sfu.ca
1 778 782-3390
-
Prerequisites:
This course is only available to students enrolled in the master of science in big data program.
Description
CALENDAR DESCRIPTION:
The first of two lab courses that are part of the master of science in big data. This lab course aims to provide students with experience needed for a successful career in big data in the information technology industry. Students will learn core concepts of artificial intelligence and data engineering to work with large, or otherwise complex, data sources. Specifically, this includes statistics and data visualization, data pipeline engineering, and modelling. Many of the assignments will be completed on publicly available, massive data sets giving students hands-on experience with cloud computing, streaming data, and scalable computation - algorithms and software tools needed to master programming for big data.
COURSE DETAILS:
Many companies today collect massive amounts of data that cannot be managed without proper programming techniques. This lab course focuses on the practical aspects of dealing with such data. It will provide insight into MapReduce, Spark, NoSQL databases, cloud computing, and data analytics for large data sets.
The objective of this class is to ensure that students will be able to:
- Use a distributed file system such as (or similar to) HDFS (Hadoop Distributed File System).
- Write software that can interact with a distributed file system using programming tools that are part of Apache Hadoop.
- Write simple distributed software using common tools.
- Formulate and implement queries on large data sets.
- Write software that can interact with at least one non-relational database.
You should have access to a computer powerful enough to run a virtual machine: at least 8 GB memory, 20 GB disk, and a reasonably decent processor. Computers are also available in our computer lab.
Topics
- Big data storage and analysis
- Hadoop ecosystem
- MapReduce
- NoSQL databases (HBase, Cassandra)
- Cloud computing
- Data analytics
- Spark
- Data ingestion (Kafka)
Grading
NOTES:
To be discussed in the first week of class. Grading will include regular lab assignments and a final project.
Materials
REQUIRED READING NOTES:
Your personalized Course Material list, including digital and physical textbooks, is available through the SFU Bookstore website by simply entering your Computing ID at: shop.sfu.ca/course-materials/my-personalized-course-materials.
Graduate Studies Notes:
Important dates and deadlines for graduate students are found here: http://www.sfu.ca/dean-gradstudies/current/important_dates/guidelines.html. The deadline to drop a course with a 100% refund is the end of week 2. The deadline to drop with no notation on your transcript is the end of week 3.
Registrar Notes:
ACADEMIC INTEGRITY: YOUR WORK, YOUR SUCCESS
SFU’s Academic Integrity website http://www.sfu.ca/students/academicintegrity.html is filled with information on what is meant by academic dishonesty, where you can find resources to help with your studies and the consequences of cheating. Check out the site for more information and videos that help explain the issues in plain English.
Each student is responsible for his or her conduct as it affects the university community. Academic dishonesty, in whatever form, is ultimately destructive of the values of the university. Furthermore, it is unfair and discouraging to the majority of students who pursue their studies honestly. Scholarly integrity is required of all members of the university. http://www.sfu.ca/policies/gazette/student/s10-01.html
RELIGIOUS ACCOMMODATION
Students with a faith background who may need accommodations during the term are encouraged to assess their needs as soon as possible and review the Multifaith religious accommodations website. The page outlines ways they can begin working toward an accommodation and ensure solutions can be reached in a timely fashion.