
Neurophysiological indices of audiovisual speech integration are enhanced at the phonetic level for speech in noise

Authors :
Aisling E. O’Sullivan
Giovanni M. Di Liberto
Edmund C. Lalor
Michael J. Crosse
Alain de Cheveigné
Affiliations :
Trinity College Dublin
Albert Einstein College of Medicine [New York]
Ecole Normale Supérieure
Centre National de la Recherche Scientifique (CNRS)
Département d'Etudes Cognitives - ENS Paris (DEC)
École normale supérieure - Paris (ENS Paris)
Université Paris sciences et lettres (PSL)
University of Rochester [USA]
Publication Year :
2021
Publisher :
HAL CCSD, 2021.

Abstract

Seeing a speaker’s face benefits speech comprehension, especially in challenging listening conditions. This perceptual benefit is thought to stem from the neural integration of visual and auditory speech at multiple stages of processing, whereby movement of a speaker’s face provides temporal cues to auditory cortex, and articulatory information from the speaker’s mouth can aid in recognizing specific linguistic units (e.g., phonemes, syllables). However, it remains unclear how the integration of these cues varies as a function of listening conditions. Here we sought to provide insight into this question by examining EEG responses to natural audiovisual, audio-only, and visual-only speech in quiet and in noise. Specifically, we represented our speech stimuli in terms of their spectrograms and their phonetic features, and then quantified the strength of the encoding of those features in the EEG using canonical correlation analysis (CCA). The encoding of both spectrotemporal and phonetic features was more robust in the audiovisual speech responses than would be expected from the summation of the audio and visual speech responses, consistent with the literature on multisensory integration. Furthermore, this multisensory enhancement was more pronounced at the level of phonetic processing for speech in noise than for speech in quiet, indicating that listeners rely more on articulatory details from visual speech in challenging listening conditions. These findings support the notion that the integration of audio and visual speech is a flexible, multistage process that adapts to optimize comprehension based on the current listening conditions.

Significance Statement

During conversation, visual cues impact our perception of speech. The integration of auditory and visual speech is thought to occur at multiple stages of speech processing and to vary flexibly depending on the listening conditions. Here we examine audiovisual integration at two stages of speech processing, using the speech spectrogram and a phonetic representation, and test how audiovisual integration adapts to degraded listening conditions. We find significant integration at both stages regardless of listening conditions, and when the speech is noisy, we find enhanced integration at the phonetic stage of processing. These findings support the multistage integration framework and demonstrate its flexibility in terms of a greater reliance on visual articulatory information in challenging listening conditions.
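As an illustration of the analysis approach described in the abstract, the following Python sketch shows how canonical correlation analysis can index how strongly a set of stimulus features (e.g., spectrogram bands or phonetic features) is encoded in multichannel EEG. The data shapes, the random placeholder data, and the use of scikit-learn's CCA are assumptions for illustration only; the authors' actual pipeline (time-lagged stimulus features, preprocessing, and cross-validation across subjects and trials) is not reproduced here.

    # Minimal sketch: quantifying stimulus encoding in EEG with CCA.
    # Shapes and data are hypothetical placeholders, not the authors' pipeline.
    import numpy as np
    from sklearn.cross_decomposition import CCA

    rng = np.random.default_rng(0)

    n_samples = 5000          # time points (e.g., EEG downsampled to 64 Hz)
    n_stim_features = 19      # e.g., phonetic features or spectrogram bands
    n_eeg_channels = 128      # e.g., 128-channel EEG

    # Placeholder stimulus representation and EEG response;
    # in practice these come from the experiment.
    stimulus = rng.standard_normal((n_samples, n_stim_features))
    eeg = rng.standard_normal((n_samples, n_eeg_channels))

    # Fit on one half of the data, evaluate on the other half
    # so the canonical weights are not overfit to the test data.
    split = n_samples // 2
    cca = CCA(n_components=5, max_iter=1000)
    cca.fit(stimulus[:split], eeg[:split])

    # Project held-out data onto the canonical components and correlate them;
    # these correlations index how strongly the stimulus features are
    # encoded in the EEG.
    stim_c, eeg_c = cca.transform(stimulus[split:], eeg[split:])
    corrs = [np.corrcoef(stim_c[:, k], eeg_c[:, k])[0, 1]
             for k in range(stim_c.shape[1])]
    print("Held-out canonical correlations:", np.round(corrs, 3))

Within the same framework, the response predicted by summing the audio-only and visual-only conditions could be scored alongside the audiovisual response, and an audiovisual advantage over that additive model would be taken as evidence of multisensory enhancement, in line with the comparison described in the abstract.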

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....5e716e7a289070974f20d7b28e226b9f