Linguistics' Dr. Maite Taboada harnessing Big Data to combat fake news

April 13, 2018

From: KEY, SFU's Big Data Initiative

From the Discourse Processing Lab on top of Burnaby Mountain, Maite Taboada, professor of Linguistics at Simon Fraser University (SFU), is harnessing the power of big data to make social media and online discussion platforms better and more reliable places for communication. Her research sits at the intersection of linguistics, computational linguistics and data science.

Contrary to the common notion that big data belongs exclusively to the sciences, Taboada is applying it in a less expected field: linguistics. Researchers across all disciplines are starting to recognize how big data approaches can advance their work. KEY, SFU’s Big Data Initiative, empowers these researchers to unlock the potential of big data by offering powerful infrastructure, hands-on training, and expertise to deliver new research breakthroughs and innovations.

Traditionally, Natural Language Processing (NLP) has excelled at classifying text, that is, assigning it to predefined categories, and at semantic analysis tasks such as text summarization. Taboada is taking an innovative big data approach to NLP, creating breakthroughs in identifying fake news and toxic comments, solutions that are needed now more than ever. Using machine learning and deep learning neural networks, she can build programs that not only understand and classify words but also exploit contextual information, helping machines grasp the nuances of language.

Taboada and SFU postdoctoral fellow Fatemeh Torabi Asr believe that there is a language of fake news: a language for wrapping false information around facts. They found that fake news is shared more often than real news, which makes their research vital to stemming the spread of misinformation. Beyond social media, similar text-classification approaches are used in spam detection, product review analysis, coding of medical patient records and a variety of other problems involving data from online platforms.

“Most of the big data revolution in social media analysis has examined words in isolation, a ‘bag-of-words’ approach,” Taboada explains. “We believe it is possible to investigate big data—and social media data in general—by exploiting contextual information. This is important when detecting whether a comment is sarcastic—and therefore toxic—or harmless.”
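To make the "bag-of-words" limitation concrete, here is a toy sketch in Python. It is only an illustration, not Taboada's method: the two example sentences and the helper names `bag_of_words` and `bigrams` are hypothetical, and counting adjacent word pairs (bigrams) is a deliberately minimal stand-in for "contextual information", far simpler than the machine learning and deep learning models described above.

```python
from collections import Counter


def bag_of_words(text: str) -> Counter:
    """Lowercase, split on whitespace, and count tokens; word order is discarded."""
    return Counter(text.lower().split())


def bigrams(text: str) -> Counter:
    """Count adjacent word pairs, a minimal way to keep some local context."""
    tokens = text.lower().split()
    return Counter(zip(tokens, tokens[1:]))


# Hypothetical comments with the same words in a different order.
a = "the comment was not funny it was rude"
b = "the comment was not rude it was funny"

print(bag_of_words(a) == bag_of_words(b))  # True: the bags are identical
print(bigrams(a) == bigrams(b))            # False: word pairs differ
```

Because both sentences contain exactly the same words, a pure bag-of-words representation cannot tell them apart, even though they say opposite things. Keeping even a little word-order context separates them.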