Simon Fraser University Speech Error Database

A large and richly coded database of spontaneous speech errors collected from audio recordings.

Current research questions

How does methodology affect speech error patterns in large data collections? 

Alderete & Davies (2019, Language and Speech): collection methods have a large impact on patterns in natural speech. 

How is speech production shaped by grammar?

Alderete & Tupper (2018, WIREs Cognitive Science): grammar is important in many language production models, but speech errors actually produce more phonologically illicit forms than previously thought.

How is tone encoded?

Alderete, Chan, and Yeung (2019, Cognition): in Cantonese, tone appears to be encoded like segments, and tone is selected by activation dynamics in phonological encoding.

What does the phonetic structure of speech errors tell us about language production processes?

Alderete, Baese-Berk, Leung and Goldrick (2020, Cognition): spontaneously produced sound errors are skewed towards unselected targets, supporting cascading activation in speech processing.

What insights can be gained from investigating speech errors cross-linguistically and in under-studied languages?

Alderete (2020, LabPhon17): the sub-lexical errors in a large collection of Cantonese speech errors confirm many known psycholinguistic trends, but also document new ones related to syllable encoding and moraic units.

Data releases

Spreadsheet with 432 tone slips from SFUSED Cantonese 1.0, data release for Alderete, John, Queenie Chan, and Henny Yeung. 2019. Tone slips in Cantonese: Evidence for early phonological encoding. Cognition 191,

OSF project page with all of the data and scripts for data analysis for Alderete, Baese-Berk, Leung, and Goldrick 2020; contains VOT data on speech errors and matched correct word productions.

OSF project page with 2,245 sublexical errors from SFUSED Cantonese; that is, all of the data for Alderete 2022.

OSF project page with 1,208 lexical substitution and word blend errors from SFUSED English; that is, all of the data from Alderete, Baese-Berk, Brasoveanu, and Law 2023.