April 05, 2022

Linguistics Research Spotlight: Dr. John Alderete

Discovering Cross-Linguistic Trends in Speech Errors

Professor John Alderete, Director of the Language Production Lab, engages in theoretical analysis of language, linguistic fieldwork, psycholinguistic experimentation, and computational modeling of language to examine how complex linguistic systems are learned and used in speech. His current work investigates how the phonological structures of particular languages shape speech production processes. To support this research, Dr. Alderete was awarded a Social Sciences and Humanities Research Council (SSHRC) Insight Grant in 2020 for his project, “Speech Production and Speech Errors: New Directions Using Corpus and Experimental Methods in English and Cantonese.” We recently spoke with him to learn more.

“People use speech errors, or slips of tongue, as a way of probing and understanding the underlying production processes behind speaking,” Dr. Alderete explained. “The mental act of speaking is pretty rich and involves more than one simple mapping. At different levels of analysis, speech errors reveal information about those different levels, such as word finding (finding the appropriate word for the concept you want to speak) and speech planning (retrieving the consonants and vowels for that word that you’ve retrieved). Speech errors give us very natural data for understanding how those processes can go wrong, and when they do go wrong, what that tells us about the nature of our capacity for speaking.”

As a tremendously complex behaviour, speaking involves assigning concepts to words (lexical selection), retrieving the sounds inside words (phonological encoding), and expressing words with explicit motor actions (articulation). Despite wide acceptance of these distinct production processes, there is debate on what kinds of information these processes have access to and how they might interact through this information in speaking. Dr. Alderete argues that the study of lexical properties, such as a word’s frequency or confusability with other words, has significant promise for advancing the debate over interactivity because lexical properties reveal how word-level information is accessed by these processes, and how interactions across the three production processes occur.

Past research on the sound structure of speech errors has largely drawn on a narrow set of majority languages and small datasets, and has produced contradictory results. “The gaps and insufficiencies in the research have been well documented, and I’ve known about them for a long time, but it’s only in the past few years that I’ve been able to address the problem,” said Alderete. A 2011 survey found that roughly 85% of psycholinguistic research, across all areas, was conducted on just 10 majority languages (Anand et al., 2011). Alderete found similar results: 84% of all speech error studies came from these same major Indo-European languages.

Together with collaborator Dr. Melissa Baese-Berk (University of Oregon) and students in their respective labs, Dr. Alderete is addressing the lack of linguistic diversity in speech error research by investigating large datasets of speech errors from two typologically distinct languages, English and Cantonese. “Cantonese is both an understudied language and has great importance in the Lower Mainland,” Alderete explained. “In choosing Cantonese we were appealing to local interest and talent.”

Dr. Alderete’s recently published article “Cross-Linguistic Trends in Speech Errors: An Analysis of Sub-Lexical Errors in Cantonese” describes in detail the Cantonese portion of the study, which examined 2,245 sub-lexical speech errors. Since Cantonese syllable structure, consonant and vowel inventories, and suprasegmentals are very different from those of major Indo-European languages, reviewing the data against the psycholinguistic effects commonly used as a lens on word-form encoding represents a fresh test of their validity.

“Perhaps the biggest contribution of the present study is the confirmation of most of these psycholinguistic effects,” Alderete wrote. “Many of these effects have comparable magnitudes to the same effects in Indo-European languages: the single phoneme effect, the repeated phoneme effect, the word-onset effect, and the syllable context constraint all have parallels within 5 percentage points in past studies of Indo-European languages.” Alderete concluded: “Cantonese seems to exhibit many of the same properties of speech errors documented in large corpora of English, Dutch, and German. These results can give us more confidence in these properties and the models designed to account for them.”

The research builds on the SFU Speech Error Database (SFUSED), the largest collection of spontaneous speech errors in any language worldwide. To learn about the methodology behind SFUSED, refer to “Investigating Perceptual Biases, Data Reliability, and Data Discovery in a Methodology for Collecting Speech Errors from Audio Recordings” by John Alderete and Monica Davies (2019). Alderete hopes that this large-data, multi-language corpus of materials will help increase consensus among scholars about how speech occurs in humans.

Portions of the SFUSED data related to specific articles have been released upon publication. Upon completion of the five-year SSHRC project, the SFUSED data for English and Cantonese will be released, together with free software that allows researchers to review the raw data collections. The full SFUSED will be released in the coming years as well. This data-rich, linguistically sophisticated resource will be relevant not only for linguists engaged in speech production research but also for applied areas such as language pedagogy, clinical linguistics, language and technology, and child language development.