Jian Pei, Computing Science Professor
Think big data, and you might imagine exploding volumes of digital information as vast, faceless statistics. But for SFU computing science professor Jian Pei, people are at the heart of big data.
“We do not treat data or the software program as the core of our work: everything is connected to people,” says Pei. “Our research is based on helping people and developing solutions to improve their lives – for example, tools and methods for fraud detection.”
According to Microsoft Academic Search, Pei is one of the top 10 data mining authors in the world. He has been developing methods for working with huge volumes of digital information since 2008, years before the term big data was even coined.
“At first, we worked on query suggestions,” he says. “We looked at the questions people ask search engines, along with the context in which the questions were asked; then we tried to develop the answer that best fits the context.”
The importance of context extends to his recent work in big data analysis. Fraud detection is a key application of data mining techniques, critical in many industries like finance, insurance and retail. For example, a fraudulent credit card transaction creates an outlier record with a non-typical value, which flags it as unusual activity.
Pei explains: “For example, a credit card company may call you if you fill your car at a petrol station at 2 a.m. – this behavior is often considered an outlier without contextual information. But if we develop a program that looks at your transaction history, we can see that it’s Saturday and maybe you were at a restaurant or bar at 1:00 a.m. We can look at your trajectory and see the petrol station is on your way home.” Other contextual factors like location, age group and occupation can help accurately flag outlier activity.
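To make the idea concrete, here is a minimal Python sketch of the petrol station scenario. It is an illustration only, not Pei's actual method: the `Transaction` type, the category labels, the 5 a.m. cutoff and the two-hour window are all assumptions chosen for the example.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class Transaction:
    when: datetime
    category: str  # hypothetical labels, e.g. "fuel", "restaurant", "bar"
    amount: float


def flag_without_context(t: Transaction) -> bool:
    """Naive rule (assumed): any fuel purchase before 5 a.m. is an outlier."""
    return t.category == "fuel" and t.when.hour < 5


def flag_with_context(
    t: Transaction,
    history: list[Transaction],
    window: timedelta = timedelta(hours=2),  # assumed look-back window
) -> bool:
    """Apply the naive rule, but clear the flag when recent history explains it."""
    if not flag_without_context(t):
        return False
    for prev in history:
        is_recent = timedelta(0) <= t.when - prev.when <= window
        if is_recent and prev.category in {"restaurant", "bar"}:
            # A late night out makes a fuel stop on the way home plausible.
            return False
    return True


dinner = Transaction(datetime(2024, 6, 1, 1, 0), "restaurant", 60.0)
fuel = Transaction(datetime(2024, 6, 1, 2, 0), "fuel", 40.0)

print(flag_without_context(fuel))       # flagged when viewed in isolation
print(flag_with_context(fuel, [dinner]))  # cleared once history is considered
```

A real system would weigh many more signals, such as the location, age group and occupation mentioned above, rather than a single hand-written rule.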
Pei is cognizant of the confidentiality and transparency challenges raised by the big data revolution. In 2008, his team was one of the first to suggest anonymization – encrypting or removing personally identifiable information from data sets – in social networks.
“It’s important that people are aware that they’re being captured by big data,” he says. “We looked at how data can be changed in a minimal way so it can’t be traced to a specific person.” The team’s research stimulated further discussion about ethics and big data, and influenced the subsequent publication of hundreds of papers on the same topic.
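As a rough illustration of changing data “in a minimal way,” the Python sketch below removes direct identifiers from a small table and coarsens ages into ten-year bands, then reports the size of the smallest group of records that look alike. This is a generic, simplified example on tabular data, not the team's social network technique; the field names (`name`, `email`, `age`, `city`) and the helper functions are hypothetical.

```python
from collections import Counter


def anonymize(records, identifiers=("name", "email")):
    """Drop direct identifiers and generalize exact age to a ten-year band."""
    result = []
    for record in records:
        row = {k: v for k, v in record.items() if k not in identifiers}
        decade = (row["age"] // 10) * 10
        row["age"] = f"{decade}-{decade + 9}"
        result.append(row)
    return result


def smallest_group(records, quasi_identifiers=("age", "city")):
    """Size of the smallest set of records sharing the same quasi-identifiers."""
    counts = Counter(tuple(r[q] for q in quasi_identifiers) for r in records)
    return min(counts.values())


people = [
    {"name": "A", "email": "a@example.com", "age": 31, "city": "Burnaby"},
    {"name": "B", "email": "b@example.com", "age": 38, "city": "Burnaby"},
]

released = anonymize(people)
print(released)
print(smallest_group(released))
```

The larger the smallest group, the harder it is to trace any one record back to a specific person; a group of size one means someone is still uniquely identifiable.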
Pei teaches CMPT 456: Information Retrieval and Web Search, a popular undergraduate class filled with students keen to explore hot topics like web analytics, social network search engines and sponsored searches.
As a teacher, he advocates learning from experience. In their first class project, CMPT 456 students build a search engine from scratch using open source components.
“We start the course with what could traditionally be considered a final project,” says Pei. “Right from the beginning, students experience the topic first-hand and understand the challenges involved. They have the opportunity to build first, and then analyze what they’ve created.”
Pei hopes to instill in all his students the spirit of innovation. “When the time comes to look at how people solve these problems successfully, the students not only have a better understanding, but they have new ideas – novel ideas,” he says.
“I want my students to think outside the box and not be constrained by rules.”
Professor Jian Pei leads the newly launched Pacific Blue Cross big-data lab at SFU.