Faculty Award

Dr. Maite Taboada receives funding for two important projects

July 07, 2022

Congratulations to Dr. Maite Taboada for receiving funding through the Natural Sciences and Engineering Research Council of Canada (NSERC) Discovery research programs. An investment of more than $506 million in research was announced through NSERC in this year's competition.

Dr. Taboada's project, Natural language processing for detecting toxic, abusive, and hateful language online, will present a timely look into the detection of online toxicity. The goal of the project is to continue work that Dr. Taboada's Discourse Processing Lab has been pursuing to develop tools to automatically identify and classify non-constructive and toxic comments.

Dr. Taboada is also part of a team that recently received a grant from the Quebec funding agency L'Observatoire international sur les impacts sociétaux de l'IA et du numérique (OBVIA) for projects on societal impacts. The project, Mind the Gap: représentation des femmes dans les médias québécois durant la pandémie de COVID-19 (Representation of women in Quebec media during the COVID-19 pandemic), is a collaborative project with researchers at Université Laval, Queen's University, the University of Ottawa, and Simon Fraser University that extends the Discourse Processing Lab's Gender Gap Tracker to look specifically at French-language media.

NSERC Discovery Grant Project Summary: Natural language processing for detecting toxic, abusive, and hateful language online

Digital technologies offer incredible power, from artificial intelligence and virtual assistants to social media and recommendation systems. Deploying such technologies in a manner beneficial to both individuals and society is a pressing challenge. In mainstream and social media, content providers welcome feedback; such feedback, however, may be 'toxic': malicious, abusive, or offensive.

Toxic comments and posts online are those that intend to cause harm. They may take the form of personal attacks, abuse, harassment, or threats, and may include profane, obscene, or derogatory language, with hate speech being the most extreme. In the last few years, I have closely studied online news comments and developed natural language processing (NLP) methods to analyze them. My long-term program of research develops robust methods for text classification in tasks such as sentiment analysis, misinformation detection, and content moderation. In the next few years, my SFU laboratory, the Discourse Processing Lab, will continue to study toxic language online and develop methods and algorithms to detect toxicity automatically. Our work identifying constructive comments, those that contribute positively to an online discussion, has provided excellent insight into how to automatically classify non-constructive and toxic comments.

Current approaches to detecting online toxicity are based either on general text characteristics (word length, text length, capitalization, and punctuation) or on lists of words likely to cause offense. Machine learning approaches (supervised, semi-supervised, or based on neural networks) rely on large annotated datasets, but many studies have shown that such approaches often fail because negativity in language may be wrapped in positive words, through metaphors and other figures of speech. Research, including our own, has found that accurately identifying and filtering toxic content requires a multidisciplinary perspective, drawing on a deep understanding of linguistics and on current methods in NLP and machine learning.
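The two baseline approaches described above, surface text characteristics and lists of offensive words, can be sketched in a few lines. This is a purely illustrative sketch, not the Discourse Processing Lab's system; the offense lexicon and the feature set are hypothetical stand-ins:

```python
import string

# Hypothetical offense lexicon; real systems use large curated lists.
OFFENSIVE_WORDS = {"idiot", "stupid", "trash"}

def surface_features(comment: str) -> dict:
    """Extract the kinds of general text characteristics the text mentions:
    text length, word length, capitalization, and punctuation."""
    words = comment.split()
    return {
        "text_length": len(comment),
        "avg_word_length": sum(len(w) for w in words) / len(words) if words else 0.0,
        "caps_ratio": sum(c.isupper() for c in comment) / max(len(comment), 1),
        "exclamations": comment.count("!"),
    }

def lexicon_flag(comment: str) -> bool:
    """Flag a comment if any token matches the offense list."""
    tokens = [w.strip(string.punctuation).lower() for w in comment.split()]
    return any(t in OFFENSIVE_WORDS for t in tokens)
```

As the paragraph notes, such filters fail on negativity wrapped in positive words: a sarcastic comment like "You're a real genius" contains nothing on the list and trips none of the surface features.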

To address existing gaps in the automatic detection of toxic comments, in the next five years I plan to: (Objective 1) study how metaphors and other figures of speech well known since antiquity (euphemisms, litotes, hyperbole, sarcasm) convey toxic language; (Objective 2) develop a system to detect figures of speech automatically; and (Objective 3) integrate that system into a new content moderation platform.

The results of this work will mobilize research among scholars interested in evaluative language and the role of media in public discourse, including linguists, computational linguists, and communication and media researchers. At a time when media organizations, social media platforms, and the public are concerned about online abuse, misinformation, and the role of digital technology in politics and society, this project is timely and will make an important contribution to public discourse.

OBVIA Project Summary (English follows French) - Mind the Gap: représentation des femmes dans les médias québécois durant la pandémie de COVID-19

Ce projet vise à analyser les représentations médiatiques des femmes dans les médias québécois durant la pandémie de COVID-19 et ce, dans une perspective d’étude de genre incluant ses dimensions intersectionnelles. Cette recherche s’ancre dans une méthodologie empruntée aux méthodes mixtes, combinant les apports technologiques de l’Intelligence artificielle aux possibilités de compréhension et d’interprétation offertes par une analyse thématique de contenu. Dans un premier temps, notre recherche s’appuie sur les outils intelligents du Gender Gap Tracker (GGT), développés par la Pre. Maite Taboada (SFU) qui quantifie en temps réel les représentations des hommes et des femmes dans les médias canadiens principalement anglophones. Nous utiliserons ces outils dans un contexte francophone afin de quantifier le ratio de représentation hommes-femmes du 1er janvier 2020 au 1er janvier 2022 dans dix grands médias québécois. Dans un second temps, nous développerons une fonctionnalité supplémentaire au GGT afin de procéder à une classification thématique des articles de presse analysés dans notre corpus. L’objectif ici est d’identifier les thèmes récurrents en lien avec les logiques de représentations médiatiques des femmes et des hommes. Pour cela, nous voulons développer une méthode d’apprentissage automatique non-supervisée inspirée du «Topic Modeling».

Dans un troisième et dernier temps, nous procéderons à une analyse thématique de contenu afin d’expliquer la signification de ces représentations médiatiques dans une perspective intersectionnelle.

Nos résultats constitueront une base empirique solide à partir de laquelle il nous sera possible d’engager une discussion avec les différentes entreprises de presse québécoises et de collaborer ensemble à la construction d’un espace médiatique plus inclusif et auto-critique sur ses propres pratiques. Ce faisant, nous souhaitons participer activement à la prise en compte des enjeux de genre, de l’inclusion et de la diversité dans la production de l’information.


The project analyses media representations of women in the Quebec media during the COVID-19 pandemic from an intersectional perspective. This research is grounded in mixed-methods approaches, combining the technological contributions of artificial intelligence with the possibilities of understanding and interpretation offered by a thematic content analysis. In the first step, our research builds on the intelligent tools of the Gender Gap Tracker (GGT), developed by Professor Maite Taboada (SFU), which quantifies in real time the representation of men and women in mainly English-language Canadian media. We will use these tools in a French-language context to quantify the ratio of male-female representation from January 1, 2020 to January 1, 2022 in ten major Quebec media sources. In the second step, we will develop additional functionality for the GGT to carry out a thematic classification of the press articles analysed in our corpus. The objective here is to identify recurring themes related to the logic of media representations of women and men. For this, we want to develop an unsupervised machine learning method inspired by topic modeling.

In the third and final step, we will carry out a thematic content analysis to explain the meaning of these media representations from an intersectional perspective.

Our results will provide a solid empirical basis from which it will be possible for us to engage in a discussion with the various Quebec press companies and to collaborate in the construction of a more inclusive and self-critical media space. In doing so, we want to actively participate in taking issues of gender, inclusion, and diversity into account in the production of information.

2022 NSERC Awards

Find out more about this year's sciences and engineering research investment:
