Title: Speech production and speech errors: New directions using corpus and experimental methods in English and Cantonese
Funding: Social Sciences and Humanities Research Council (#435-2020-0193, 2020-2025)
Description: Speaking is a tremendously complex behaviour. It involves assigning concepts to words (lexical selection), retrieving the sounds inside words (phonological encoding), and expressing words with explicit motor actions (articulation). Despite wide acceptance of these distinct production processes, there is continued debate on what kinds of information these processes have access to and how they might interact through this information in speaking. The study of lexical properties, such as a word’s frequency or confusability with other words, has significant promise for advancing the debate over interactivity because lexical properties reveal how word-level information is accessed by these processes, and how interactions across the three production processes occur.
Understanding the interaction of processes in speech production is vital because speaking is a fundamental cognitive capacity in humans that underlies thinking, learning, and communicating. Nonetheless, consensus on the interactivity of speech production is still lacking, in part because of insufficient research. Broad claims about interactivity have been based on small datasets, on contradictory results, and on data drawn almost exclusively from Indo-European languages. In particular, claims about speech production have tended to rely on conceptual theorizing rather than building on detailed and cross-validated empirical results.
This study will investigate the impact of lexical factors on speech production through the scientific study of speech errors, or slips of the tongue. Fifty years of research has shown that speech errors are a methodologically sound way of studying speech production because they naturally reflect the underlying mechanisms in normal language production. Preliminary research has found important effects of lexical properties on speech error patterns that suggest interactions weave through all three production processes. In two established labs at Simon Fraser University (SFU) and the University of Oregon, we will address existing empirical lacunae by investigating large datasets of speech errors from two typologically distinct languages, English and Cantonese.
The research builds on the SFU Speech Error Database (SFUSED), the largest collection of speech errors in any language worldwide. Twin databases for English and Cantonese will be extended to include a lexicon and syllabary for each language, supporting the use of data science methods designed to examine the impact of lexical properties. SFUSED is also based on audio recordings, allowing investigation of how lexical factors affect articulation in fine-grained phonetic patterns. Experimental methods will also be used to elicit speech errors arising from the three production processes to give an independent test of the corpus findings. This two-pronged approach will assess the degree and nature of interactivity using precise and reliable methods, and will produce a large-data, multi-language corpus of materials to help increase consensus among scholars about how speech occurs in humans.
Results of the five-year study will be made openly accessible through a variety of means. Upon project completion, our team will release the SFUSED database for both languages on a website specifically designed to explain the results and methods, together with free software that allows researchers to review the raw data collections. We will also organize a Dissemination Workshop where the team will meet with three leaders in the field to translate research insights from our data-rich and linguistically sophisticated resource on speech production into relevant applied areas, including language pedagogy, clinical linguistics, language and technology, and child language development.