President's Dream Colloquium on Engaging Big Data

Speaker: Surajit Chaudhuri

Big Data since 1854

Dr. Surajit Chaudhuri, Scientist and Managing Director of XCG (Microsoft Research) 
Stanford University, PhD
Tuesday, January 19, 2016, 3:30–5 pm
IRMACS Theatre, ASB 10900, Burnaby campus


Talk highlights

Opening and Introduction
0:08 - Opening remarks by Peter Chow-White
2:39 - Acknowledgements and thanks by Fred Popowich to the sponsors, Microsoft and Simba Technologies
4:08 - Introducing the speaker Surajit Chaudhuri

Dr. Surajit Chaudhuri's Talk
7:04 - Introduction
8:27 - The story of John Snow: Big Data in 1854
10:43 - Journey from data to insight: Four steps of Data Analysis
12:46 - Overview of the outline of the talk
13:46 - The problems in data analysis

16:36 - Problems in Data Privacy
23:09 - Data collection suffers from biases
26:06 - Ensuring fairness and its challenges

28:36 - Trends in big data: Changing the Landscape of Data Analysis

29:43 - New application areas
31:09 - New sources of data
33:24 - Fast responsiveness
35:13 - Digital data: volume, velocity and structure

36:33 - Examples of big data applications
46:12 - Platforms, infrastructure and tools for big data: an overview
51:25 - Open Challenges in Big Data

51:40 - How to reduce the time to insights?
56:18 - Leveraging text mining
58:54 - Democratization: challenges and an example

59:56 - End of Dr. Chaudhuri's lecture
1:02:18 - Questions and Answers Session
1:27:02 - Closing remarks by Fred Popowich


Surajit Chaudhuri is a Distinguished Scientist at Microsoft Research and leads the Data Management, Exploration and Mining group. As a Deputy Managing Director of the MSR Redmond Lab, he also has oversight of the Distributed Systems, Networking, Security, Programming Languages and Software Engineering groups.

He serves on the Senior Leadership Team of the Executive Vice President of Microsoft's Cloud and Enterprise division. His current areas of interest are enterprise data analytics, data discovery, self-manageability and cloud database services. Working with his colleagues in Microsoft Research, he helped incorporate the Index Tuning Wizard (and subsequently the Database Engine Tuning Advisor) and data cleaning technology into Microsoft SQL Server.

Surajit is an ACM Fellow and a recipient of the ACM SIGMOD Edgar F. Codd Innovations Award, the ACM SIGMOD Contributions Award, a VLDB 10-Year Best Paper Award, and an IEEE Data Engineering Influential Paper Award. He received his Ph.D. from Stanford University in 1992.


In 1854, John Snow identified the cause of a cholera outbreak using analysis of "Big Data."

We will reflect on what has and what has not changed since then. For example, technological innovations have given us an amazing ability to gather and analyze data. This in turn has led to broad interest in driving decision making by leveraging insights gleaned from data through machine learning and statistical techniques. We will look at novel examples of leveraging big data from different domains.

Despite its promise, broad-based use of Big Data techniques faces several difficult challenges. For example, appropriate selection and fusion of data remain difficult. We will discuss some of these open challenges, which need to be solved to accelerate the process of identifying insights from Big Data.

Research Questions Examined:

  • What are the key steps in going from data to insights?
  • What makes data analysis hard?
  • What are the challenges in finding relevant data for a problem?
  • What are the untapped opportunities in leveraging Big Data?
  • What are the implications of Big Data for data processing platforms and tools?