659 results for "Audiovisual speech"
Search Results
2. Audiovisual Perception of Lexical Stress: Beat Gestures and Articulatory Cues.
- Author
- Bujok, Ronny, Meyer, Antje S., and Bosker, Hans Rutger
- Abstract
Human communication is inherently multimodal. Auditory speech, but also visual cues can be used to understand another talker. Most studies of audiovisual speech perception have focused on the perception of speech segments (i.e., speech sounds). However, less is known about the influence of visual information on the perception of suprasegmental aspects of speech like lexical stress. In two experiments, we investigated the influence of different visual cues (e.g., facial articulatory cues and beat gestures) on the audiovisual perception of lexical stress. We presented auditory lexical stress continua of disyllabic Dutch stress pairs together with videos of a speaker producing stress on the first or second syllable (e.g., articulating VOORnaam or voorNAAM). Moreover, we combined and fully crossed the face of the speaker producing lexical stress on either syllable with a gesturing body producing a beat gesture on either the first or second syllable. Results showed that people successfully used visual articulatory cues to stress in muted videos. However, in audiovisual conditions, we were not able to find an effect of visual articulatory cues. In contrast, we found that the temporal alignment of beat gestures with speech robustly influenced participants' perception of lexical stress. These results highlight the importance of considering suprasegmental aspects of language in multimodal contexts. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
3. Early selective attention to the articulating mouth as a potential female-specific marker of better language development in autism: a review.
- Author
- Lozano, Itziar, Viktorsson, Charlotte, Capelli, Elena, Gliga, Teodora, Riva, Valentina, and Tomalski, Przemysław
- Abstract
Autism is a neurodevelopmental condition with early onset, usually entailing language differences compared to neurotypical peers. Females are four times less likely than males to be diagnosed with autism, and the language features associated with this condition are less frequent in females than in males. However, the developmental mechanisms underlying these sex differences remain unclear. In neurotypical populations, sex differences in language development are also observable from early on, with females outperforming males. One mechanism underlying these sex differences may be early differences in selective attention to talking faces. During the first year, more mouth-looking generally predicts better language development, but sex differences exist. Female infants look at the mouth of a talking face more than males without penalizing looking to the eyes, and reduced mouth-looking in early infancy relates to better vocabulary in toddlerhood only in females. In this hypothesis and theory article, we propose that unique female gaze patterns to the mouth may constitute an early female-specific candidate marker that acts as a protective marker for language development also in autism. Since autism is highly heritable, investigating infants at elevated likelihood for autism offers the opportunity to search for sex-specific markers operating early in life before autistic features and language differences emerge. We argue that, as in neurotypical female infants, mouth-looking may also protect female infants-at-elevated-likelihood-for-autism population from potential later differences in language skills. If so, then sex-specific early behavioral markers, potentially acting as protective markers of language, may compensate for some genetic risk markers affecting this population. Here we gather evidence from neurotypical infants and those with elevated likelihood of autism to uncover why biological sex, the development of selective attention to the mouth, and language acquisition could be intimately related in both populations. We also propose hypotheses regarding potential sex-differentiated neurodevelopmental pathways. We end discussing future research challenges: how generalizable mouth-looking could be as a potential female-specific early language marker across contexts (experimental vs. real life), countries, and developmental time. Ultimately, we aim to target a novel protective candidate of language acquisition, informing tailored interventions that consider sex as an important source of individual variability. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
4. Mask-wearing affects infants' selective attention to familiar and unfamiliar audiovisual speech
- Author
- Lauren N. Slivka, Kenna R. H. Clayton, and Greg D. Reynolds
- Subjects
- infancy, audiovisual speech, mask-wearing, selective attention, eye tracking, Psychology, BF1-990
- Abstract
This study examined the immediate effects of mask-wearing on infant selective visual attention to audiovisual speech in familiar and unfamiliar languages. Infants distribute their selective attention to regions of a speaker's face differentially based on their age and language experience. However, the potential impact wearing a face mask may have on infants' selective attention to audiovisual speech has not been systematically studied. We utilized eye tracking to examine the proportion of infant looking time to the eyes and mouth of a masked or unmasked actress speaking in a familiar or unfamiliar language. Six-month-old and 12-month-old infants (n = 42, 55% female, 91% White Non-Hispanic/Latino) were shown videos of an actress speaking in a familiar language (English) with and without a mask on, as well as videos of the same actress speaking in an unfamiliar language (German) with and without a mask. Overall, infants spent more time looking at the unmasked presentations compared to the masked presentations. Regardless of language familiarity or age, infants spent more time looking at the mouth area of an unmasked speaker and they spent more time looking at the eyes of a masked speaker. These findings indicate mask-wearing has immediate effects on the distribution of infant selective attention to different areas of the face of a speaker during audiovisual speech.
- Published
- 2025
- Full Text
- View/download PDF
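The looking-time measure described in the entry above is, at its core, a proportion of area-of-interest (AOI) dwell time. A minimal sketch in Python follows, assuming a toy per-sample gaze table with `infant_id`, `condition`, `language`, `aoi`, and `duration_ms` columns; these names and the data layout are illustrative assumptions, not the authors' pipeline.

```python
# Sketch of an AOI-based proportion-of-looking-time measure (hypothetical data
# layout; not the authors' actual analysis code).
import pandas as pd

def proportion_looking(samples: pd.DataFrame) -> pd.Series:
    """Share of on-face looking time spent on the eyes vs. the mouth.

    Expected columns: infant_id, condition ('masked'/'unmasked'),
    language ('English'/'German'), aoi ('eyes'/'mouth'/'other'), duration_ms.
    """
    face = samples[samples["aoi"].isin(["eyes", "mouth"])]
    totals = face.groupby(["infant_id", "condition", "language", "aoi"])["duration_ms"].sum()
    # Normalise within infant x condition x language so eyes + mouth sum to 1.
    return totals / totals.groupby(["infant_id", "condition", "language"]).transform("sum")

# Toy example:
df = pd.DataFrame({
    "infant_id": [1, 1, 1, 1],
    "condition": ["masked", "masked", "unmasked", "unmasked"],
    "language": ["English"] * 4,
    "aoi": ["eyes", "mouth", "eyes", "mouth"],
    "duration_ms": [800, 200, 300, 700],
})
print(proportion_looking(df))
```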
5. The effect of visual speech cues on neural tracking of speech in 10‐month‐old infants.
- Author
- Çetinçelik, Melis, Jordan‐Barros, Antonia, Rowland, Caroline F., and Snijders, Tineke M.
- Subjects
- STRAINS & stresses (Mechanics), LANGUAGE acquisition, SPEECH, AUDITORY perception, CONTINUOUS processing
- Abstract
While infants' sensitivity to visual speech cues and the benefit of these cues have been well‐established by behavioural studies, there is little evidence on the effect of visual speech cues on infants' neural processing of continuous auditory speech. In this study, we investigated whether visual speech cues, such as the movements of the lips, jaw, and larynx, facilitate infants' neural speech tracking. Ten‐month‐old Dutch‐learning infants watched videos of a speaker reciting passages in infant‐directed speech while electroencephalography (EEG) was recorded. In the videos, either the full face of the speaker was displayed or the speaker's mouth and jaw were masked with a block, obstructing the visual speech cues. To assess neural tracking, speech‐brain coherence (SBC) was calculated, focusing particularly on the stress and syllabic rates (1–1.75 and 2.5–3.5 Hz respectively in our stimuli). First, overall, SBC was compared to surrogate data, and then, differences in SBC in the two conditions were tested at the frequencies of interest. Our results indicated that infants show significant tracking at both stress and syllabic rates. However, no differences were identified between the two conditions, meaning that infants' neural tracking was not modulated further by the presence of visual speech cues. Furthermore, we demonstrated that infants' neural tracking of low‐frequency information is related to their subsequent vocabulary development at 18 months. Overall, this study provides evidence that infants' neural tracking of speech is not necessarily impaired when visual speech cues are not fully visible and that neural tracking may be a potential mechanism in successful language acquisition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
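Speech-brain coherence of the kind described above can be illustrated with a short sketch: magnitude-squared coherence between a speech envelope and an EEG channel, averaged over the stress-rate (1-1.75 Hz) and syllable-rate (2.5-3.5 Hz) bands quoted in the abstract. The sampling rate, windowing, and toy data below are assumptions, and the surrogate-data statistics used in the published analysis are omitted.

```python
# Minimal speech-brain coherence (SBC) sketch: coherence between a speech
# envelope and one EEG channel, averaged within two frequency bands.
import numpy as np
from scipy.signal import coherence

FS = 250  # Hz; assumed common sampling rate of envelope and EEG

def band_coherence(envelope, eeg, fmin, fmax):
    """Mean magnitude-squared coherence between `envelope` and `eeg` in [fmin, fmax] Hz."""
    freqs, coh = coherence(envelope, eeg, fs=FS, nperseg=FS * 4)  # ~0.25 Hz resolution
    mask = (freqs >= fmin) & (freqs <= fmax)
    return float(coh[mask].mean())

# Toy example: EEG that partially tracks the envelope.
rng = np.random.default_rng(0)
env = rng.standard_normal(FS * 60)               # 60 s of "speech envelope"
eeg = 0.5 * env + rng.standard_normal(env.size)  # noisy tracking response
print("stress rate (1-1.75 Hz):   ", band_coherence(env, eeg, 1.0, 1.75))
print("syllable rate (2.5-3.5 Hz):", band_coherence(env, eeg, 2.5, 3.5))
```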
6. Early audiovisual language discrimination: Monolingual and bilingual infants' differences in language switch detection.
- Author
- Birulés, Joan, Pons, Ferran, and Bosch, Laura
- Subjects
- DISCRIMINATORY language, CODE switching (Linguistics), NATIVE language, SPEECH, AUDIOVISUAL presentations, BILINGUALISM
- Abstract
Successful language learning in bilinguals requires the differentiation of two language systems. The capacity to discriminate rhythmically close languages has been reported in 4-month-olds using auditory-only stimuli. This research offers a novel perspective on early language discrimination using audiovisual material. Monolingual and bilingual infants were first habituated to a face talking in the participants' native language (or the more frequent language in bilingual contexts) and then tested on two successive language switches by the same speaker, with a close and a distant language. Code-switching exposure was indexed from parental questionnaires. Results revealed that while monolinguals could detect both the close- and distant-language switch, bilinguals only reacted to the distant language, regardless of home code-switching experience. In the temporal dimension, the analyses showed that language switch detection required at least 10 s, suggesting that the audiovisual presentation (here the same speaker switching languages) slowed down or even hindered the language switch detection. These results suggest that the detection of a multimodal close-language switch is a challenging task, especially for bilingual infants exposed to phonologically and rhythmically close languages. The current research sets the ground for further studies exploring the role of indexical cues and selective attention processes on language switch detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
7. Time Course of Attention to a Talker's Mouth in Monolingual and Close-Language Bilingual Children.
- Author
- Birulés, Joan, Bosch, Laura, Lewkowicz, David J., and Pons, Ferran
- Subjects
- BRAIN physiology, PHONOLOGICAL awareness, ANALYSIS of variance, MULTILINGUALISM, AUDITORY perception, LIPREADING, LANGUAGE & languages, AUDIOVISUAL materials, FACIAL expression, SPEECH evaluation, EYE, ATTENTION, VISUAL perception, COMMUNICATION, QUESTIONNAIRES, DESCRIPTIVE statistics, MOUTH, SPEECH, CHILDREN
- Abstract
We presented 28 Spanish monolingual and 28 Catalan–Spanish close-language bilingual 5-year-old children with a video of a talker speaking in the children's native language and a nonnative language and examined the temporal dynamics of their selective attention to the talker's eyes and mouth. When the talker spoke in the children's native language, monolinguals attended equally to the eyes and mouth throughout the trial, whereas close-language bilinguals first attended more to the mouth and then distributed attention equally between the eyes and mouth. In contrast, when the talker spoke in a nonnative language (English), both monolinguals and bilinguals initially attended more to the mouth and then gradually shifted to a pattern of equal attention to the eyes and mouth. These results indicate that specific early linguistic experience has differential effects on young children's deployment of selective attention to areas of a talker's face during the initial part of an audiovisual utterance. Public Significance Statement: This study shows that selective attention to a talker's face is a temporally dynamic process that depends on prior linguistic experience. Close-language 5-year-old bilingual children exhibited greater initial attention to a talker's mouth than did monolingual children. This suggests that regular and continuous experience with two close languages modulates how audiovisual speech cues are exploited, primarily at the start of communicative bouts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario.
- Author
- Ahmed, Farhin, Nidiffer, Aaron R., and Lalor, Edmund C.
- Subjects
- GAZE, COCKTAIL parties, ELECTROENCEPHALOGRAPHY, PERIPHERAL vision, SELECTIVITY (Psychology), SPEECH
- Abstract
Seeing the speaker’s face greatly improves our speech comprehension in noisy environments. This is due to the brain’s ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers–an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person’s gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model – one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker’s face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
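The AV versus A+V model comparison summarized above can be sketched as follows: fit a forward model on audiovisual responses, fit separate models on audio-only and visual-only responses, and test which predicts held-out audiovisual EEG better. The lagged-envelope design matrix, plain least squares, and the commented array names are simplifying assumptions relative to the published temporal response function analysis.

```python
# Simplified AV vs. A+V forward-model comparison (illustrative only).
import numpy as np
from numpy.linalg import lstsq

def lagged(stim, n_lags):
    """Time-lagged copies of a 1-D stimulus as a design matrix (edge wrap-around ignored)."""
    return np.column_stack([np.roll(stim, k) for k in range(n_lags)])

def fit_forward_model(stim, eeg, n_lags=32):
    """Least-squares forward model mapping the lagged stimulus to one EEG channel."""
    weights, *_ = lstsq(lagged(stim, n_lags), eeg, rcond=None)
    return weights

def predict(stim, weights):
    return lagged(stim, weights.shape[0]) @ weights

# Assumed arrays (not the paper's variable names): env (speech envelope),
# eeg_av / eeg_a / eeg_v (responses to audiovisual, audio-only and visual-only
# presentations), plus held-out AV data (env_test, eeg_av_test).
# w_av, w_a, w_v = (fit_forward_model(env, eeg) for eeg in (eeg_av, eeg_a, eeg_v))
# r_av  = np.corrcoef(predict(env_test, w_av), eeg_av_test)[0, 1]
# r_apv = np.corrcoef(predict(env_test, w_a) + predict(env_test, w_v), eeg_av_test)[0, 1]
# r_av > r_apv on held-out data is taken as the signature of multisensory integration.
```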
9. Segmenting Speech by Mouth: The Role of Oral Prosodic Cues for Visual Speech Segmentation.
- Author
- Mitchel, Aaron D., Lusk, Laina G., Wellington, Ian, and Mook, Alexis T.
- Subjects
- SPEECH perception, EXPERIMENTAL design, STATISTICS, EYE movements, PHONOLOGICAL awareness, MOTION pictures, CONFIDENCE intervals, ONE-way analysis of variance, FACIAL expression, TASK performance, AUDIOVISUAL materials, LANGUAGE acquisition, UNDERGRADUATES, COMPARATIVE studies, T-test (Statistics), RESEARCH funding, UNIVERSITIES & colleges, DESCRIPTIVE statistics, STATISTICAL sampling, DATA analysis software, DATA analysis, SPEECH, MOUTH, PROMPTS (Psychology)
- Abstract
Adults are able to use visual prosodic cues in the speaker's face to segment speech. Furthermore, eye-tracking data suggest that learners will shift their gaze to the mouth during visual speech segmentation. Although these findings suggest that the mouth may be viewed more than the eyes or nose during visual speech segmentation, no study has examined the direct functional importance of individual features; thus, it is unclear which visual prosodic cues are important for word segmentation. In this study, we examined the impact of first removing (Experiment 1) and then isolating (Experiment 2) individual facial features on visual speech segmentation. Segmentation performance was above chance in all conditions except for when the visual display was restricted to the eye region (eyes only condition in Experiment 2). This suggests that participants were able to segment speech when they could visually access the mouth but not when the mouth was completely removed from the visual display, providing evidence that visual prosodic cues conveyed by the mouth are sufficient and likely necessary for visual speech segmentation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
10. The effect of gaze on EEG measures of multisensory integration in a cocktail party scenario
- Author
- Farhin Ahmed, Aaron R. Nidiffer, and Edmund C. Lalor
- Subjects
- multisensory integration, selective attention, cocktail party problem, audiovisual speech, EEG, speech envelope, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
- Abstract
Seeing the speaker’s face greatly improves our speech comprehension in noisy environments. This is due to the brain’s ability to combine the auditory and the visual information around us, a process known as multisensory integration. Selective attention also strongly influences what we comprehend in scenarios with multiple speakers–an effect known as the cocktail-party phenomenon. However, the interaction between attention and multisensory integration is not fully understood, especially when it comes to natural, continuous speech. In a recent electroencephalography (EEG) study, we explored this issue and showed that multisensory integration is enhanced when an audiovisual speaker is attended compared to when that speaker is unattended. Here, we extend that work to investigate how this interaction varies depending on a person’s gaze behavior, which affects the quality of the visual information they have access to. To do so, we recorded EEG from 31 healthy adults as they performed selective attention tasks in several paradigms involving two concurrently presented audiovisual speakers. We then modeled how the recorded EEG related to the audio speech (envelope) of the presented speakers. Crucially, we compared two classes of model – one that assumed underlying multisensory integration (AV) versus another that assumed two independent unisensory audio and visual processes (A+V). This comparison revealed evidence of strong attentional effects on multisensory integration when participants were looking directly at the face of an audiovisual speaker. This effect was not apparent when the speaker’s face was in the peripheral vision of the participants. Overall, our findings suggest a strong influence of attention on multisensory integration when high fidelity visual (articulatory) speech information is available. More generally, this suggests that the interplay between attention and multisensory integration during natural audiovisual speech is dynamic and is adaptable based on the specific task and environment.
- Published
- 2023
- Full Text
- View/download PDF
11. Primacy of mouth over eyes to perceive audiovisual Mandarin lexical tones
- Author
- Biao Zeng, Guoxing Yu, Nabil Hasshim, and Shanhu Hong
- Subjects
- lexical tone, eye movement, gaze, audiovisual speech, Chinese speaker, English speaker, Human anatomy, QM1-695
- Abstract
The visual cues of lexical tones are more implicit and much less investigated than consonants and vowels, and it is still unclear what facial areas contribute to facial tones identification. This study investigated Chinese and English speakers’ eye movements when they were asked to identify audiovisual Mandarin lexical tones. The Chinese and English speakers were presented with an audiovisual clip of Mandarin monosyllables (for instance, /ă/, /à/, /ĭ/, /ì/) and were asked to identify whether the syllables were a dipping tone (/ă/, / ĭ/) or a falling tone (/ à/, /ì/). These audiovisual syllables were presented in clear, noisy and silent (absence of audio signal) conditions. An eye-tracker recorded the participants’ eye movements. Results showed that the participants gazed more at the mouth than the eyes. In addition, when acoustic conditions became adverse, both the Chinese and English speakers increased their gaze duration at the mouth rather than at the eyes. The findings suggested that the mouth is the primary area that listeners utilise in their perception of audiovisual lexical tones. The similar eye movements between the Chinese and English speakers imply that the mouth acts as a perceptual cue that provides articulatory information, as opposed to social and pragmatic information.
- Published
- 2023
- Full Text
- View/download PDF
12. Neural correlates of audiovisual speech synchrony perception and its relationship with autistic traits.
- Author
- Zhou, Han‐yu, Zhang, Yi‐jing, Hu, Hui‐xin, Yan, Yong‐jie, Wang, Ling‐ling, Lui, Simon S. Y., and Chan, Raymond C. K.
- Subjects
- SPEECH perception, CINGULATE cortex, AUTISM spectrum disorders, YOUNG adults, VISUAL perception
- Abstract
The anterior insula (AI) has the central role in coordinating attention and integrating information from multiple sensory modalities. AI dysfunction may contribute to both sensory and social impairments in autism spectrum disorder (ASD). Little is known regarding the brain mechanisms that guide multisensory integration, and how such neural activity might be affected by autistic‐like symptoms in the general population. In this study, 72 healthy young adults performed an audiovisual speech synchrony judgment (SJ) task during fMRI scanning. We aimed to investigate the SJ‐related brain activations and connectivity, with a focus on the AI. Compared with synchronous speech, asynchrony perception triggered stronger activations in the bilateral AI, and other frontal‐cingulate‐parietal regions. In contrast, synchronous perception resulted in greater involvement of the primary auditory and visual areas, indicating multisensory validation and fusion. Moreover, the AI demonstrated a stronger connection with the anterior cingulate gyrus (ACC) in the audiovisual asynchronous (vs. synchronous) condition. To facilitate asynchrony detection, the AI may integrate auditory and visual speech stimuli, and generate a control signal to the ACC that further supports conflict‐resolving and response selection. Correlation analysis, however, suggested that audiovisual synchrony perception and its related AI activation and connectivity did not significantly vary with different levels of autistic traits. These findings provide novel evidence for the neural mechanisms underlying multisensory temporal processing in healthy people. Future research should examine whether such findings would be extended to ASD patients. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
13. Benefit of visual speech information for word comprehension in post-stroke aphasia.
- Author
- Krason, Anna, Vigliocco, Gabriella, Mailend, Marja-Liisa, Stoll, Harrison, Varley, Rosemary, and Buxbaum, Laurel J.
- Subjects
- LIPREADING, APHASIA, LANGUAGE & languages, INFORMATION retrieval, SPEECH perception
- Published
- 2023
- Full Text
- View/download PDF
14. Adaptive Plasticity in Perceiving Speech Sounds
- Author
- Ullas, Shruti, Bonte, Milene, Formisano, Elia, Vroomen, Jean, Fay, Richard R., Series Editor, Popper, Arthur N., Series Editor, Avraham, Karen, Editorial Board Member, Bass, Andrew, Editorial Board Member, Cunningham, Lisa, Editorial Board Member, Fritzsch, Bernd, Editorial Board Member, Groves, Andrew, Editorial Board Member, Hertzano, Ronna, Editorial Board Member, Le Prell, Colleen, Editorial Board Member, Litovsky, Ruth, Editorial Board Member, Manis, Paul, Editorial Board Member, Manley, Geoffrey, Editorial Board Member, Moore, Brian, Editorial Board Member, Simmons, Andrea, Editorial Board Member, Yost, William, Editorial Board Member, Holt, Lori L., editor, Peelle, Jonathan E., editor, and Coffin, Allison B., editor
- Published
- 2022
- Full Text
- View/download PDF
15. Limitations of Audiovisual Speech on Robots for Second Language Pronunciation Learning.
- Author
- Saya Amioka, Janssens, Ruben, Wolfert, Pieter, Qiaoqiao Ren, Pinto Bernal, Maria Jose, and Belpaeme, Tony
- Subjects
- SECOND language acquisition, SPEECH, AUTOMATIC speech recognition, SPEECH perception, SOCIAL robots, DEAF children, NATIVE language
- Abstract
The perception of audiovisual speech plays an important role in infants' first language acquisition and continues to be important for language understanding beyond infancy. Beyond that, the perception of speech and congruent lip motion supports language understanding for adults, and it has been suggested that second language learning benefits from audiovisual speech, as it helps learners distinguish speech sounds in the target language. In this paper, we study whether congruent audiovisual speech on a robot facilitates the learning of Japanese pronunciation. 27 native-Dutch speaking participants were trained in Japanese pronunciation by a social robot. The robot demonstrated 30 Japanese words of varying complexity using either congruent audiovisual speech, incongruent visual speech, or computer-generated audiovisual speech. Participants were asked to imitate the robot's pronunciation, recordings of which were rated by native Japanese speakers. Against expectation, the results showed that congruent audiovisual speech resulted in lower pronunciation performance than low-fidelity or incongruent speech. We show that our learners, being native Dutch speakers, are only very weakly sensitive to audiovisual Japanese speech which possibly explains why learning performance does not seem to benefit from audiovisual speech. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. Visually biased Perception in Cochlear Implant Users: A Study of the McGurk and Sound-Induced Flash Illusions.
- Author
- Butera, Iliza M., Stevenson, Ryan A., Gifford, René H., and Wallace, Mark T.
- Subjects
- COCHLEAR implants, SPEECH perception, VISUAL perception, PERCEPTUAL illusions, RESEARCH funding, PROMPTS (Psychology)
- Abstract
The reduction in spectral resolution by cochlear implants oftentimes requires complementary visual speech cues to facilitate understanding. Despite substantial clinical characterization of auditory-only speech measures, relatively little is known about the audiovisual (AV) integrative abilities that most cochlear implant (CI) users rely on for daily speech comprehension. In this study, we tested AV integration in 63 CI users and 69 normal-hearing (NH) controls using the McGurk and sound-induced flash illusions. To our knowledge, this study is the largest to-date measuring the McGurk effect in this population and the first that tests the sound-induced flash illusion (SIFI). When presented with conflicting AV speech stimuli (i.e., the phoneme "ba" dubbed onto the viseme "ga"), we found that 55 CI users (87%) reported a fused percept of "da" or "tha" on at least one trial. After applying an error correction based on unisensory responses, we found that among those susceptible to the illusion, CI users experienced lower fusion than controls—a result that was concordant with results from the SIFI where the pairing of a single circle flashing on the screen with multiple beeps resulted in fewer illusory flashes for CI users. While illusion perception in these two tasks appears to be uncorrelated among CI users, we identified a negative correlation in the NH group. Because neither illusion appears to provide further explanation of variability in CI outcome measures, further research is needed to determine how these findings relate to CI users' speech understanding, particularly in ecological listening conditions that are naturally multisensory. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Deficient Audiovisual Speech Perception in Schizophrenia: An ERP Study.
- Author
- Ghaneirad, Erfan, Saenger, Ellyn, Szycik, Gregor R., Čuš, Anja, Möde, Laura, Sinke, Christopher, Wiswede, Daniel, Bleich, Stefan, and Borgolte, Anna
- Subjects
- SPEECH perception, EVOKED potentials (Electrophysiology), AUDITORY perception, SPEECH, SCHIZOPHRENIA
- Abstract
In everyday verbal communication, auditory speech perception is often disturbed by background noise. Especially in disadvantageous hearing conditions, additional visual articulatory information (e.g., lip movement) can positively contribute to speech comprehension. Patients with schizophrenia (SZs) demonstrate an aberrant ability to integrate visual and auditory sensory input during speech perception. Current findings about underlying neural mechanisms of this deficit are inconsistent. Particularly and despite the importance of early sensory processing in speech perception, very few studies have addressed these processes in SZs. Thus, in the present study, we examined 20 adult subjects with SZ and 21 healthy controls (HCs) while presenting audiovisual spoken words (disyllabic nouns) either superimposed by white noise (−12 dB signal-to-noise ratio) or not. In addition to behavioral data, event-related brain potentials (ERPs) were recorded. Our results demonstrate reduced speech comprehension for SZs compared to HCs under noisy conditions. Moreover, we found altered N1 amplitudes in SZ during speech perception, while P2 amplitudes and the N1-P2 complex were similar to HCs, indicating that there may be disturbances in multimodal speech perception at an early stage of processing, which may be due to deficits in auditory speech perception. Moreover, a positive relationship between fronto-central N1 amplitudes and the positive subscale of the Positive and Negative Syndrome Scale (PANSS) has been observed. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
18. Metacognition and Causal Inference in Audiovisual Speech.
- Author
- Kimmet, Faith, Pedersen, Samantha, Cardenas, Victoria, Rubiera, Camila, Johnson, Grey, Sans, Addison, Baldwin, Matthew, and Odegaard, Brian
- Subjects
- CAUSAL inference, METACOGNITION, JUDGMENT (Psychology), LEGAL judgments, STIMULUS & response (Psychology)
- Abstract
In multisensory environments, our brains perform causal inference to estimate which sources produce specific sensory signals. Decades of research have revealed the dynamics which underlie this process of causal inference for multisensory (audiovisual) signals, including how temporal, spatial, and semantic relationships between stimuli influence the brain's decision about whether to integrate or segregate. However, presently, very little is known about the relationship between metacognition and multisensory integration, and the characteristics of perceptual confidence for audiovisual signals. In this investigation, we ask two questions about the relationship between metacognition and multisensory causal inference: are observers' confidence ratings for judgments about Congruent, McGurk, and Rarely Integrated speech similar, or different? And do confidence judgments distinguish between these three scenarios when the perceived syllable is identical? To answer these questions, 92 online participants completed experiments where on each trial, participants reported which syllable they perceived, and rated confidence in their judgment. Results from Experiment 1 showed that confidence ratings were quite similar across Congruent speech, McGurk speech, and Rarely Integrated speech. In Experiment 2, when the perceived syllable for congruent and McGurk videos was matched, confidence scores were higher for congruent stimuli compared to McGurk stimuli. In Experiment 3, when the perceived syllable was matched between McGurk and Rarely Integrated stimuli, confidence judgments were similar between the two conditions. Together, these results provide evidence of the capacities and limitations of metacognition's ability to distinguish between different sources of multisensory information. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
19. Incongruent visual cues affect the perception of Mandarin vowel but not tone.
- Author
- Shanhu Hong, Rui Wang, and Biao Zeng
- Subjects
- TONE (Phonetics), VOWELS, ABSOLUTE pitch, AUDITORY perception, SPEECH, CHINESE language
- Abstract
Over the recent few decades, a large number of audiovisual speech studies have been focusing on the visual cues of consonants and vowels but neglecting those relating to lexical tones. In this study, we investigate whether incongruent audiovisual information interfered with the perception of lexical tones. We found that, for both Chinese and English speakers, incongruence between auditory and visemic mouth shape (i.e., visual form information) significantly interfered with reaction time and reduced the identification accuracy of vowels. However, incongruent lip movements (i.e., visual timing information) did not interfere with the perception of auditory lexical tone. We conclude that, in contrast to vowel perception, auditory tone perception seems relatively impervious to visual congruence cues, at least under these restricted laboratory conditions. The salience of visual form and timing information is discussed based on this finding. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. Increases in sensory noise predict attentional disruptions to audiovisual speech perception.
- Author
- Fisher, Victoria L., Dean, Cassandra L., Nave, Claire S., Parkins, Emma V., Kerkhoff, Willa G., and Kwakye, Leslie D.
- Subjects
- SPEECH perception, NOISE, DUAL-task paradigm, SENSORIMOTOR integration, NEUROLOGICAL disorders
- Abstract
We receive information about the world around us from multiple senses which combine in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk Illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied with a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders, such as Schizophrenia, Autism, and ADHD. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
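The analysis logic above, estimating sensory noise from the variability of unisensory performance and asking whether it predicts the dual-task drop in McGurk fusion, can be sketched with a simple regression. The column names, the use of a standard deviation as the noise index, and ordinary least squares are illustrative assumptions rather than the authors' exact method.

```python
# Sketch: does within-participant visual noise predict the dual-task drop in
# McGurk fusion? (Hypothetical data layout.)
import pandas as pd
import statsmodels.api as sm

def attentional_disruption(trials: pd.DataFrame) -> pd.DataFrame:
    """Per participant: a visual-noise index and the dual-task drop in McGurk fusion.

    Expected columns: participant, condition ('baseline'/'dual_visual'),
    mcgurk_fusion (proportion of fused responses), visual_unisensory_acc
    (visual-only accuracy per block).
    """
    noise = (trials.groupby("participant")["visual_unisensory_acc"]
                   .std().rename("visual_noise"))          # variability as the noise index
    fusion = trials.pivot_table(index="participant", columns="condition",
                                values="mcgurk_fusion", aggfunc="mean")
    drop = (fusion["baseline"] - fusion["dual_visual"]).rename("fusion_drop")
    return pd.concat([noise, drop], axis=1).dropna()

# data = attentional_disruption(trials)   # `trials` is assumed per-block data
# model = sm.OLS(data["fusion_drop"], sm.add_constant(data["visual_noise"])).fit()
# print(model.summary())                  # does visual noise predict the McGurk drop?
```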
21. Developmental change in children's speech processing of auditory and visual cues: An eyetracking study.
- Author
- ZAMUNER, Tania S., RABIDEAU, Theresa, MCDONALD, Margarethe, and YEUNG, H. Henny
- Abstract
This study investigates how children aged two to eight years (N = 129) and adults (N = 29) use auditory and visual speech for word recognition. The goal was to bridge the gap between apparent successes of visual speech processing in young children in visual-looking tasks, with apparent difficulties of speech processing in older children from explicit behavioural measures. Participants were presented with familiar words in audio-visual (AV), audio-only (A-only) or visual-only (V-only) speech modalities, then presented with target and distractor images, and looking to targets was measured. Adults showed high accuracy, with slightly less target-image looking in the V-only modality. Developmentally, looking was above chance for both AV and A-only modalities, but not in the V-only modality until 6 years of age (earlier on /k/-initial words). Flexible use of visual cues for lexical access develops throughout childhood. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
22. Does Visual Self-Supervision Improve Learning of Speech Representations for Emotion Recognition?
- Author
- Shukla, Abhinav, Petridis, Stavros, and Pantic, Maja
- Abstract
Self-supervised learning has attracted plenty of recent research interest. However, most works for self-supervision in speech are typically unimodal and there has been limited work that studies the interaction between audio and visual modalities for cross-modal self-supervision. This article (1) investigates visual self-supervision via face reconstruction to guide the learning of audio representations; (2) proposes an audio-only self-supervision approach for speech representation learning; (3) shows that a multi-task combination of the proposed visual and audio self-supervision is beneficial for learning richer features that are more robust in noisy conditions; (4) shows that self-supervised pretraining can outperform fully supervised training and is especially useful to prevent overfitting on smaller sized datasets. We evaluate our learned audio representations for discrete emotion recognition, continuous affect recognition and automatic speech recognition. We outperform existing self-supervised methods for all tested downstream tasks. Our results demonstrate the potential of visual self-supervision for audio feature learning and suggest that joint visual and audio self-supervision leads to more informative audio representations for speech and emotion recognition. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
23. Neural Speech Tracking Contribution of Lip Movements Predicts Behavioral Deterioration When the Speaker's Mouth Is Occluded.
- Author
- Reisinger P, Gillis M, Suess N, Vanthornhout J, Haider CL, Hartmann T, Hauswald A, Schwarz K, Francart T, and Weisz N
- Subjects
- Humans, Female, Male, Adult, Young Adult, Movement physiology, Speech physiology, Perceptual Masking physiology, Mouth physiology, Comprehension physiology, Acoustic Stimulation, Lip physiology, Speech Perception physiology, Magnetoencephalography
- Abstract
Observing lip movements of a speaker facilitates speech understanding, especially in challenging listening situations. Converging evidence from neuroscientific studies shows stronger neural responses to audiovisual stimuli compared with audio-only stimuli. However, the interindividual variability of this contribution of lip movement information and its consequences on behavior are unknown. We analyzed source-localized magnetoencephalographic responses from 29 normal-hearing participants (12 females) listening to audiovisual speech, both with and without the speaker wearing a surgical face mask, and in the presence or absence of a distractor speaker. Using temporal response functions to quantify neural speech tracking, we show that neural responses to lip movements are, in general, enhanced when speech is challenging. After controlling for speech acoustics, we show that lip movements contribute to enhanced neural speech tracking, particularly when a distractor speaker is present. However, the extent of this visual contribution to neural speech tracking varied greatly among participants. Probing the behavioral relevance, we demonstrate that individuals who show a higher contribution of lip movements in terms of neural speech tracking show a stronger drop in comprehension and an increase in perceived difficulty when the mouth is occluded by a surgical face mask. In contrast, no effect was found when the mouth was not occluded. We provide novel insights on how the contribution of lip movements in terms of neural speech tracking varies among individuals and its behavioral relevance, revealing negative consequences when visual speech is absent. Our results also offer potential implications for objective assessments of audiovisual speech perception., Competing Interests: K.S. is an employee of MED-EL GmbH. All other authors declare no competing financial interests., (Copyright © 2025 Reisinger et al.)
- Published
- 2025
- Full Text
- View/download PDF
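The behavioral-relevance step described above can be expressed compactly: take each participant's lip-movement contribution as the gain in encoding-model accuracy when lip features are added to an acoustics-only model, then relate that gain to the comprehension drop under mouth occlusion. The sketch below assumes the per-participant accuracies are already computed; the published analysis uses temporal response functions with speech acoustics controlled.

```python
# Sketch: relate the lip-movement contribution to neural speech tracking to the
# behavioral cost of occluding the mouth. Inputs are assumed per-participant values.
import numpy as np
from scipy.stats import pearsonr

def visual_contribution(r_acoustic, r_full):
    """Gain in prediction accuracy attributable to lip-movement features."""
    return np.asarray(r_full) - np.asarray(r_acoustic)

# Assumed per-participant values (not from the paper):
# r_acoustic, r_full : encoding-model accuracies without / with lip features
# comp_drop          : drop in comprehension when the speaker's mouth is occluded
# gain = visual_contribution(r_acoustic, r_full)
# r, p = pearsonr(gain, comp_drop)
# print(f"lip contribution vs. comprehension drop: r = {r:.2f}, p = {p:.3f}")
```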
24. Infants' Preference for ID Speech in Face and Voice Extends to a Non-Native Language.
- Author
- Birulés J, Méary D, Fort M, Hojin K, Johnson SP, and Pascalis O
- Subjects
- Humans, Male, Infant, Female, Speech, Language, Face, Facial Recognition physiology, Speech Perception, Voice
- Abstract
Infants prefer infant-directed (ID) speech. Concerning talking faces, previous research showed that 3- and 5-month-olds prefer faces that produce native ID than native adult-directed (AD) speech, regardless of background speech being ID, AD or silent. Here, we explored whether infants also show a preference for non-native ID speech. We presented 3- and 6-month-old infants with pairs of talking faces, one producing non-native ID speech and the other non-native AD speech, either in silence (Experiment 1) or accompanied by non-native ID or AD background speech (Experiment 2). Results from Experiment 1 showed an overall preference for the silent ID talking faces across both age groups, suggesting a reliance on cross-linguistic, potentially universal cues for this preference. However, Experiment 2 showed that preference for ID faces was disrupted at 3 months when auditory speech was present (ID or AD). At 6 months, infants maintained a preference for ID talking faces, but only when accompanied by ID speech. These findings show that auditory non-native speech interferes with infants' processing of ID talking faces. They also suggest that by 6 months, infants start associating ID features from faces and voices irrespective of language familiarity, suggesting that infants' ID preference may be universal and amodal., (© 2024 The Author(s). Infancy published by Wiley Periodicals LLC on behalf of International Congress of Infant Studies.)
- Published
- 2025
- Full Text
- View/download PDF
25. Increases in sensory noise predict attentional disruptions to audiovisual speech perception
- Author
- Victoria L. Fisher, Cassandra L. Dean, Claire S. Nave, Emma V. Parkins, Willa G. Kerkhoff, and Leslie D. Kwakye
- Subjects
- multisensory integration (MSI), attention, dual task, McGurk effect, perceptual load, audiovisual speech, Neurosciences. Biological psychiatry. Neuropsychiatry, RC321-571
- Abstract
We receive information about the world around us from multiple senses which combine in a process known as multisensory integration. Multisensory integration has been shown to be dependent on attention; however, the neural mechanisms underlying this effect are poorly understood. The current study investigates whether changes in sensory noise explain the effect of attention on multisensory integration and whether attentional modulations to multisensory integration occur via modality-specific mechanisms. A task based on the McGurk Illusion was used to measure multisensory integration while attention was manipulated via a concurrent auditory or visual task. Sensory noise was measured within modality based on variability in unisensory performance and was used to predict attentional changes to McGurk perception. Consistent with previous studies, reports of the McGurk illusion decreased when accompanied with a secondary task; however, this effect was stronger for the secondary visual (as opposed to auditory) task. While auditory noise was not influenced by either secondary task, visual noise increased with the addition of the secondary visual task specifically. Interestingly, visual noise accounted for significant variability in attentional disruptions to the McGurk illusion. Overall, these results strongly suggest that sensory noise may underlie attentional alterations to multisensory integration in a modality-specific manner. Future studies are needed to determine whether this finding generalizes to other types of multisensory integration and attentional manipulations. This line of research may inform future studies of attentional alterations to sensory processing in neurological disorders, such as Schizophrenia, Autism, and ADHD.
- Published
- 2023
- Full Text
- View/download PDF
26. Attention to audiovisual speech does not facilitate language acquisition in infants with familial history of autism.
- Author
- Chawarska, Katarzyna, Lewkowicz, David, Feiner, Hannah, Macari, Suzanne, and Vernetti, Angelina
- Subjects
- DIAGNOSIS of autism, GENETICS of autism, SPEECH perception, EYE movements, MULTIVARIATE analysis, TIME, ONE-way analysis of variance, AUDIOVISUAL materials, REGRESSION analysis, LANGUAGE acquisition, PEARSON correlation (Statistics), ATTENTION, DESCRIPTIVE statistics, DATA analysis software, LANGUAGE disorders, LONGITUDINAL method, DISEASE risk factors, PHYSIOLOGICAL aspects of speech
- Abstract
Background: Due to familial liability, siblings of children with ASD exhibit elevated risk for language delays. The processes contributing to language delays in this population remain unclear. Methods: Considering well‐established links between attention to dynamic audiovisual cues inherent in a speaker's face and speech processing, we investigated if attention to a speaker's face and mouth differs in 12‐month‐old infants at high familial risk for ASD but without ASD diagnosis (hr‐sib; n = 91) and in infants at low familial risk (lr‐sib; n = 62) for ASD and whether attention at 12 months predicts language outcomes at 18 months. Results: At 12 months, hr‐sib and lr‐sib infants did not differ in attention to face (p =.14), mouth preference (p =.30), or in receptive and expressive language scores (p =.36, p =.33). At 18 months, the hr‐sib infants had lower receptive (p =.01) but not expressive (p =.84) language scores than the lr‐sib infants. In the lr‐sib infants, greater attention to the face (p =.022) and a mouth preference (p =.025) contributed to better language outcomes at 18 months. In the hr‐sib infants, neither attention to the face nor a mouth preference was associated with language outcomes at 18 months. Conclusions: Unlike low‐risk infants, high‐risk infants do not appear to benefit from audiovisual prosodic and speech cues in the service of language acquisition despite intact attention to these cues. We propose that impaired processing of audiovisual cues may constitute the link between genetic risk factors and poor language outcomes observed across the autism risk spectrum and may represent a promising endophenotype in autism. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
27. Oscillatory correlates of linguistic prediction and modality effects during listening to auditory-only and audiovisual sentences.
- Author
- Brunellière, Angèle, Vincent, Marion, and Delrue, Laurence
- Subjects
- MODALITY (Linguistics), GRAMMATICAL gender, LISTENING, OSCILLATIONS, NEUROLINGUISTICS
- Abstract
In natural listening situations, understanding spoken sentences requires interactions between several multisensory to linguistic levels of information. In two electroencephalographical studies, we examined the neuronal oscillations of linguistic prediction produced by unimodal and bimodal sentence listening to observe how these brain correlates were affected by the sensory streams delivering linguistic information. Sentence contexts which were strongly predictive of a particular word were ended by a possessive adjective matching or not the gender of the predicted word. Alpha, beta and gamma oscillations were investigated as they were considered to play a crucial role in the predictive process. During the audiovisual or auditory-only listening to sentences, no evidence of word prediction was observed. In contrast, in a more challenging listening situation during which bimodal audiovisual streams switched to unimodal auditory stream, gamma power was sensitive to word prediction based on prior sentence context. Results suggest that prediction spreading from higher sentence levels to lower word levels is optional during unimodal and bimodal sentence listening and is observed when the listening situation is more challenging. Alpha and beta oscillations were found to decrease when semantically constraining sentences were delivered in the audiovisual modality in comparison with the auditory-only modality. Altogether, our findings bear major implications for our understanding of the neural mechanisms that support predictive processing in multimodal language comprehension. • No evidence of word prediction for audiovisual and auditory streams in optimal situations • Evidence of word prediction was related to gamma activity in challenging situations. • Alpha and beta oscillations suppressed in audiovisual speech. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
28. Auditory, Visual and Audiovisual Speech Processing Streams in Superior Temporal Sulcus
- Author
- Venezia, Jonathan H, Vaden, Kenneth I, Rong, Feng, Maddox, Dale, Saberi, Kourosh, and Hickok, Gregory
- Subjects
- Biological Psychology, Psychology, Clinical Research, Neurosciences, Mental health, audiovisual speech, superior temporal sulcus, fMRI, visual motion, functional gradient, Cognitive Sciences, Experimental Psychology, Biological psychology, Cognitive and computational psychology
- Abstract
The human superior temporal sulcus (STS) is responsive to visual and auditory information, including sounds and facial cues during speech recognition. We investigated the functional organization of STS with respect to modality-specific and multimodal speech representations. Twenty younger adult participants were instructed to perform an oddball detection task and were presented with auditory, visual, and audiovisual speech stimuli, as well as auditory and visual nonspeech control stimuli in a block fMRI design. Consistent with a hypothesized anterior-posterior processing gradient in STS, auditory, visual and audiovisual stimuli produced the largest BOLD effects in anterior, posterior and middle STS (mSTS), respectively, based on whole-brain, linear mixed effects and principal component analyses. Notably, the mSTS exhibited preferential responses to multisensory stimulation, as well as speech compared to nonspeech. Within the mid-posterior and mSTS regions, response preferences changed gradually from visual, to multisensory, to auditory moving posterior to anterior. Post hoc analysis of visual regions in the posterior STS revealed that a single subregion bordering the mSTS was insensitive to differences in low-level motion kinematics yet distinguished between visual speech and nonspeech based on multi-voxel activation patterns. These results suggest that auditory and visual speech representations are elaborated gradually within anterior and posterior processing streams, respectively, and may be integrated within the mSTS, which is sensitive to more abstract speech information within and across presentation modalities. The spatial organization of STS is consistent with processing streams that are hypothesized to synthesize perceptual speech representations from sensory signals that provide convergent information from visual and auditory modalities.
- Published
- 2017
29. Early Word Segmentation Behind the Mask.
- Author
- Frota, Sónia, Pejovic, Jovana, Cruz, Marisa, Severino, Cátia, and Vigário, Marina
- Subjects
- AUDITORY masking, SPEECH, LANGUAGE ability, MEDICAL masks, SPEECH perception
- Abstract
Infants have been shown to rely both on auditory and visual cues when processing speech. We investigated the impact of COVID-related changes, in particular of face masks, in early word segmentation abilities. Following up on our previous study demonstrating that, by 4 months, infants already segmented targets presented auditorily at utterance-edge position, and, using the same visual familiarization paradigm, 7–9-month-old infants performed an auditory and an audiovisual word segmentation experiment in two conditions: without and with an FFP2 face mask. Analysis of acoustic and visual cues showed changes in face-masked speech affecting the amount, weight, and location of cues. Utterance-edge position displayed more salient cues than utterance-medial position, but the cues were attenuated in face-masked speech. Results revealed no evidence for segmentation, not even at edge position, regardless of mask condition and auditory or visual speech presentation. However, in the audiovisual experiment, infants attended more to the screen during the test trials when familiarized with without mask speech. Also, the infants attended more to the mouth and less to the eyes in without mask than with mask. In addition, evidence for an advantage of the utterance-edge position in emerging segmentation abilities was found. Thus, audiovisual information provided some support to developing word segmentation. We compared 7–9-monthers segmentation ability observed in the Butler and Frota pre-COVID study with the current auditory without mask data. Mean looking time for edge was significantly higher than unfamiliar in the pre-COVID study only. Measures of cognitive and language development obtained with the CSBS scales showed that the infants of the current study scored significantly lower than the same-age infants from the CSBS (pre-COVID) normative data. Our results suggest an overall effect of the pandemic on early segmentation abilities and language development, calling for longitudinal studies to determine how development proceeds. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. Own-race faces promote integrated audiovisual speech information.
- Author
- Ujiie, Yuta and Takahashi, Kohske
- Subjects
- EAST Asians, SPEECH perception, FACIAL expression & emotions (Psychology), SNOEZELEN, SELF-expression
- Abstract
The other-race effect indicates a perceptual advantage when processing own-race faces. This effect has been demonstrated in individuals' recognition of facial identity and emotional expressions. However, it remains unclear whether the other-race effect also exists in multisensory domains. We conducted two experiments to provide evidence for the other-race effect in facial speech recognition, using the McGurk effect. Experiment 1 tested this issue among East Asian adults, examining the magnitude of the McGurk effect during stimuli using speakers from two different races (own-race vs. other-race). We found that own-race faces induced a stronger McGurk effect than other-race faces. Experiment 2 indicated that the other-race effect was not simply due to different levels of attention being paid to the mouths of own- and other-race speakers. Our findings demonstrated that own-race faces enhance the weight of visual input during audiovisual speech perception, and they provide evidence of the own-race effect in the audiovisual interaction for speech perception in adults. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Perceptual plasticity in adverse listening conditions : factors affecting adaptation to accented and noise-vocoded speech
- Author
- Banks, Briony
- Subjects
- 401, Speech perception, Cognition, Audiovisual speech, Eye-tracking
- Abstract
Adverse listening conditions can be a hindrance to communication, but humans are remarkably adept at overcoming them. Research has begun to uncover the cognitive and behavioural mechanisms behind this perceptual plasticity, but we still do not fully understand the reasons for variability in individual responses. The research reported in this thesis addressed several factors which would further this understanding. Study 1 examined the role of cognitive ability in recognition of, and perceptual adaptation to, accented speech. A measure of executive function predicted greater and more rapid perceptual adaptation. Vocabulary knowledge predicted overall recognition of the accented speech, and mediated the relationship between working memory and recognition accuracy. Study 2 compared recognition of, and perceptual adaptation to, accented speech with and without audiovisual cues. The presence of audiovisual cues improved recognition of the accented speech in noise, but not perceptual adaptation. Study 3 investigated when perceivers make use of visual speech cues during recognition of, and perceptual adaptation to, audiovisual noise-vocoded speech. Listeners’ eye gaze was analysed over time and related to their performance. The percentage and length of fixations on the speaker’s mouth increased during recognition of individual sentences, while the length of fixations on the mouth decreased as perceivers adapted to the noise-vocoded speech over the course of the experiment. Longer fixations on the speaker’s mouth were related to better speech recognition. Results demonstrate that perceptual plasticity of unfamiliar speech is driven by cognitive processes, but can also be modified by the modality of speech (audiovisual or audio-only). Behavioural responses, such as eye gaze, are also related to our ability to respond to adverse conditions. Speech recognition and perceptual adaptation were differentially related to the factors in each study and therefore likely reflect different processes; these measures should therefore both be considered in studies investigating listeners’ response to adverse conditions. Overall, the research adds to our understanding of the mechanisms and behaviours involved in perceptual plasticity in adverse listening conditions.
- Published
- 2016
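The mediation result in entry 31 above (vocabulary knowledge mediating the link between working memory and recognition accuracy) is typically quantified as an indirect effect with a bootstrap confidence interval. The following is a minimal product-of-coefficients sketch in Python on simulated data; the variable names, effect sizes, and sample size are hypothetical and are not taken from the thesis.

```python
import numpy as np

rng = np.random.default_rng(0)
n = 120  # hypothetical sample size

# Simulated data: working memory (X) -> vocabulary (M) -> recognition accuracy (Y)
wm = rng.normal(size=n)
vocab = 0.5 * wm + rng.normal(scale=0.8, size=n)
accuracy = 0.4 * vocab + 0.1 * wm + rng.normal(scale=0.8, size=n)

def ols_slopes(y, predictors):
    """Return the slope coefficients of y regressed on the predictors (with intercept)."""
    X = np.column_stack([np.ones(len(y))] + list(predictors))
    beta, *_ = np.linalg.lstsq(X, y, rcond=None)
    return beta[1:]  # drop the intercept

def indirect_effect(x, m, y):
    a = ols_slopes(m, [x])[0]      # path X -> M
    b = ols_slopes(y, [x, m])[1]   # path M -> Y, controlling for X
    return a * b

point = indirect_effect(wm, vocab, accuracy)

# Percentile bootstrap for the indirect (mediated) effect
boot = []
for _ in range(2000):
    idx = rng.integers(0, n, n)
    boot.append(indirect_effect(wm[idx], vocab[idx], accuracy[idx]))
lo, hi = np.percentile(boot, [2.5, 97.5])
print(f"indirect effect = {point:.3f}, 95% bootstrap CI [{lo:.3f}, {hi:.3f}]")
```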
32. Masking of the mouth area impairs reconstruction of acoustic speech features and higher-level segmentational features in the presence of a distractor speaker
- Author
-
Chandra Leon Haider, Nina Suess, Anne Hauswald, Hyojin Park, and Nathan Weisz
- Subjects
Stimulus reconstruction ,Face masks ,Audiovisual speech ,Formants ,Speech envelope ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
Multisensory integration enables stimulus representation even when the sensory input in a single modality is weak. In the context of speech, when confronted with a degraded acoustic signal, congruent visual inputs promote comprehension. When this input is masked, speech comprehension consequently becomes more difficult. However, it remains unclear which levels of speech processing are affected, and under which circumstances, when the mouth area is occluded. To answer this question, we conducted an audiovisual (AV) multi-speaker experiment using naturalistic speech. In half of the trials, the target speaker wore a (surgical) face mask, while we measured the brain activity of normal-hearing participants via magnetoencephalography (MEG). We added a distractor speaker in half of the trials in order to create an ecologically difficult listening situation. A decoding model trained on clear AV speech was used to reconstruct crucial speech features in each condition. We found significant main effects of face masks on the reconstruction of acoustic features, such as the speech envelope and spectral speech features (i.e. pitch and formant frequencies), while reconstruction of higher-level features of speech segmentation (phoneme and word onsets) was especially impaired by masks in difficult listening situations. As we used surgical face masks in our study, which have only mild effects on speech acoustics, we interpret our findings as the result of the missing visual input. Our findings extend previous behavioural results by demonstrating the complex contextual effects of occluding relevant visual information on speech processing.
- Published
- 2022
- Full Text
- View/download PDF
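The decoding approach in entry 32 above (training a model on clear audiovisual speech and reconstructing speech features from MEG) is commonly implemented as a regularised linear backward model with time-lagged sensor data. Below is a minimal ridge-regression sketch in Python on simulated signals; the lag range, regularisation strength, and channel count are illustrative assumptions rather than the study's actual parameters.

```python
import numpy as np

rng = np.random.default_rng(1)
fs = 100                      # Hz, assumed sampling rate after downsampling
n_samples, n_channels = 6000, 64
lags = np.arange(0, 25)       # 0-240 ms of sensor history (assumption)

# Simulated data: a speech envelope plus sensor signals that partly track it
envelope = np.abs(rng.normal(size=n_samples)).cumsum() % 1.0
sensors = 0.3 * envelope[:, None] + rng.normal(size=(n_samples, n_channels))

def lagged_design(X, lags):
    """Stack time-lagged copies of every channel into one design matrix."""
    cols = []
    for lag in lags:
        shifted = np.roll(X, lag, axis=0)
        shifted[:lag] = 0.0
        cols.append(shifted)
    return np.hstack(cols)

X = lagged_design(sensors, lags)
half = n_samples // 2
Xtr, Xte, ytr, yte = X[:half], X[half:], envelope[:half], envelope[half:]

# Ridge regression (backward model): weights mapping sensors -> envelope
lam = 1e3                                            # regularisation (assumption)
w = np.linalg.solve(Xtr.T @ Xtr + lam * np.eye(X.shape[1]), Xtr.T @ ytr)
reconstruction = Xte @ w

r = np.corrcoef(reconstruction, yte)[0, 1]
print(f"reconstruction accuracy (Pearson r) = {r:.2f}")
```

The reconstruction accuracy (correlation between the reconstructed and actual feature) is then compared across conditions, e.g. masked versus unmasked trials.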
33. Early Word Segmentation Behind the Mask
- Author
-
Sónia Frota, Jovana Pejovic, Marisa Cruz, Cátia Severino, and Marina Vigário
- Subjects
early word segmentation ,face mask ,COVID-19 ,auditory speech ,audiovisual speech ,speech perception ,Psychology ,BF1-990 - Abstract
Infants have been shown to rely on both auditory and visual cues when processing speech. We investigated the impact of COVID-related changes, in particular of face masks, on early word segmentation abilities. Following up on our previous study, which demonstrated that by 4 months infants already segmented targets presented auditorily at utterance-edge position, we used the same visual familiarization paradigm to test 7–9-month-old infants in an auditory and an audiovisual word segmentation experiment under two conditions: without and with an FFP2 face mask. Analysis of acoustic and visual cues showed changes in face-masked speech affecting the amount, weight, and location of cues. Utterance-edge position displayed more salient cues than utterance-medial position, but the cues were attenuated in face-masked speech. Results revealed no evidence for segmentation, not even at edge position, regardless of mask condition and auditory or visual speech presentation. However, in the audiovisual experiment, infants attended more to the screen during the test trials when familiarized with speech produced without a mask. The infants also attended more to the mouth and less to the eyes in the without-mask than in the with-mask condition. In addition, evidence for an advantage of the utterance-edge position in emerging segmentation abilities was found. Thus, audiovisual information provided some support for developing word segmentation. We compared the 7–9-month-olds' segmentation ability observed in the Butler and Frota pre-COVID study with the current auditory without-mask data. Mean looking time for edge targets was significantly higher than for unfamiliar items in the pre-COVID study only. Measures of cognitive and language development obtained with the CSBS scales showed that the infants of the current study scored significantly lower than the same-age infants from the CSBS (pre-COVID) normative data. Our results suggest an overall effect of the pandemic on early segmentation abilities and language development, calling for longitudinal studies to determine how development proceeds.
- Published
- 2022
- Full Text
- View/download PDF
34. Visibility of speech articulation enhances auditory phonetic convergence
- Author
-
Dias, James W and Rosenblum, Lawrence D
- Subjects
Cognitive and Computational Psychology ,Psychology ,Basic Behavioral and Social Science ,Behavioral and Social Science ,Clinical Research ,Adult ,Female ,Humans ,Male ,Noise ,Phonetics ,Photic Stimulation ,Speech ,Speech Perception ,Auditory noise ,Audiovisual speech ,Phonetic convergence ,Phonological neighborhood density ,Speech alignment ,Speech articulation ,Word frequency ,Cognitive Sciences ,Experimental Psychology ,Biological psychology ,Cognitive and computational psychology - Abstract
Talkers automatically imitate aspects of perceived speech, a phenomenon known as phonetic convergence. Talkers have previously been found to converge to auditory and visual speech information. Furthermore, talkers converge more to the speech of a conversational partner who is seen and heard, relative to one who is just heard (Dias & Rosenblum Perception, 40, 1457-1466, 2011). A question raised by this finding is what visual information facilitates the enhancement effect. In the following experiments, we investigated the possible contributions of visible speech articulation to visual enhancement of phonetic convergence within the noninteractive context of a shadowing task. In Experiment 1, we examined the influence of the visibility of a talker on phonetic convergence when shadowing auditory speech either in the clear or in low-level auditory noise. The results suggest that visual speech can compensate for convergence that is reduced by auditory noise masking. Experiment 2 further established the visibility of articulatory mouth movements as being important to the visual enhancement of phonetic convergence. Furthermore, the word frequency and phonological neighborhood density characteristics of the words shadowed were found to significantly predict phonetic convergence in both experiments. Consistent with previous findings (e.g., Goldinger Psychological Review, 105, 251-279, 1998), phonetic convergence was greater when shadowing low-frequency words. Convergence was also found to be greater for low-density words, contrasting with previous predictions of the effect of phonological neighborhood density on auditory phonetic convergence (e.g., Pardo, Jordan, Mallari, Scanlon, & Lewandowski Journal of Memory and Language, 69, 183-195, 2013). Implications of the results for a gestural account of phonetic convergence are discussed.
- Published
- 2016
35. Faces and Voices Processing in Human and Primate Brains: Rhythmic and Multimodal Mechanisms Underlying the Evolution and Development of Speech.
- Author
-
Michon, Maëva, Zamorano-Abramson, José, and Aboitiz, Francisco
- Subjects
SPEECH ,HUMAN voice ,VOCAL tract ,PRIMATES ,FACIAL expression - Abstract
While influential works since the 1970s have widely assumed that imitation is an innate skill in both human and non-human primate neonates, recent empirical studies and meta-analyses have challenged this view, indicating other forms of reward-based learning as relevant factors in the development of social behavior. The translation of visual input into matching motor output that underlies imitation abilities instead seems to develop along with social interactions and sensorimotor experience during infancy and childhood. Recently, a new visual stream has been identified in both human and non-human primate brains, updating the dual visual stream model. This third pathway is thought to be specialized for dynamic aspects of social perception, such as eye gaze and facial expression, and, crucially, for audio-visual integration of speech. Here, we review empirical studies addressing an understudied but crucial aspect of speech and communication, namely the processing of visual orofacial cues (i.e., the perception of a speaker's lips and tongue movements) and its integration with vocal auditory cues. Throughout this review, we offer new insights from our understanding of speech as the product of the evolution and development of a rhythmic and multimodal organization of sensorimotor brain networks, supporting volitional motor control of the upper vocal tract and audio-visual voice-face integration. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Weaker McGurk Effect for Rubin's Vase-Type Speech in People With High Autistic Traits.
- Author
-
Ujiie, Yuta and Takahashi, Kohske
- Subjects
- *
SPEECH perception , *AUTISM spectrum disorders , *AUDITORY perception , *AUTISTIC people , *LIPREADING - Abstract
While visual information from facial speech modulates auditory speech perception, it is less influential on audiovisual speech perception among autistic individuals than among typically developed individuals. In this study, we investigated the relationship between autistic traits (Autism-Spectrum Quotient; AQ) and the influence of visual speech on the recognition of Rubin's vase-type speech stimuli with degraded facial speech information. Participants were 31 university students (13 males and 18 females; mean age: 19.2, SD: 1.13 years) who reported normal (or corrected-to-normal) hearing and vision. All participants completed three speech recognition tasks (visual, auditory, and audiovisual stimuli) and the AQ–Japanese version. The results showed that accuracies of speech recognition for visual (i.e., lip-reading) and auditory stimuli were not significantly related to participants' AQ. In contrast, audiovisual speech perception was less susceptible to the influence of facial speech among individuals with high rather than low autistic traits. The weaker influence of visual information on audiovisual speech perception in autism spectrum disorder (ASD) was robust regardless of the clarity of the visual information, suggesting a difficulty in the process of audiovisual integration rather than in the visual processing of facial speech. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
37. Phonetic matching of auditory and visual speech develops during childhood: Evidence from sine-wave speech
- Author
-
Baart, Martijn, Bortfeld, Heather, and Vroomen, Jean
- Subjects
Cognitive and Computational Psychology ,Psychology ,Pediatric ,Neurosciences ,Child ,Child Development ,Child ,Preschool ,Cues ,Humans ,Lipreading ,Phonetics ,Speech ,Speech Perception ,Audiovisual speech ,Cross-modal correspondence ,Phonetic cues ,Temporal cues ,Sine-wave speech ,Development ,Cognitive Sciences ,Experimental Psychology ,Applied and developmental psychology ,Biological psychology ,Social and personality psychology - Abstract
The correspondence between auditory speech and lip-read information can be detected based on a combination of temporal and phonetic cross-modal cues. Here, we determined the point in developmental time at which children start to effectively use phonetic information to match a speech sound with one of two articulating faces. We presented 4- to 11-year-olds (N=77) with three-syllabic sine-wave speech replicas of two pseudo-words that were perceived as non-speech and asked them to match the sounds with the corresponding lip-read video. At first, children had no phonetic knowledge about the sounds, and matching was thus based on the temporal cues that are fully retained in sine-wave speech. Next, we trained all children to perceive the phonetic identity of the sine-wave speech and repeated the audiovisual (AV) matching task. Only at around 6.5 years of age did the benefit of having phonetic knowledge about the stimuli become apparent, thereby indicating that AV matching based on phonetic cues presumably develops more slowly than AV matching based on temporal cues.
- Published
- 2015
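Sine-wave speech, the stimulus used in entry 37 above, replaces an utterance's formants with time-varying sinusoids so that phonetic detail is degraded while temporal dynamics are preserved. The sketch below shows only the resynthesis step in Python, assuming formant frequency and amplitude tracks have already been estimated by some other tool; the toy trajectories are hypothetical.

```python
import numpy as np
from scipy.io import wavfile

fs = 16000                       # Hz, assumed output sampling rate
hop = 0.01                       # formant tracks sampled every 10 ms (assumption)

def sinewave_speech(formant_freqs, formant_amps, fs=fs, hop=hop):
    """Render formant tracks (n_frames x n_formants, Hz / linear amplitude)
    as summed time-varying sinusoids, i.e. a sine-wave speech replica."""
    n_frames, n_formants = formant_freqs.shape
    n_samples = int(n_frames * hop * fs)
    t_frames = np.arange(n_frames) * hop
    t = np.arange(n_samples) / fs
    out = np.zeros(n_samples)
    for k in range(n_formants):
        # Interpolate the sparse frame-rate tracks up to audio rate
        f = np.interp(t, t_frames, formant_freqs[:, k])
        a = np.interp(t, t_frames, formant_amps[:, k])
        phase = 2 * np.pi * np.cumsum(f) / fs   # integrate frequency to phase
        out += a * np.sin(phase)
    return out / np.max(np.abs(out))            # normalise to +/- 1

# Toy three-formant trajectory (hypothetical values, about 0.5 s of signal)
frames = 50
freqs = np.column_stack([np.linspace(500, 700, frames),
                         np.linspace(1500, 1200, frames),
                         np.full(frames, 2500.0)])
amps = np.column_stack([np.full(frames, 1.0),
                        np.full(frames, 0.6),
                        np.full(frames, 0.3)])
wavfile.write("sws_demo.wav", fs, sinewave_speech(freqs, amps).astype(np.float32))
```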
38. The Impact of Neurocognitive Skills on Recognition of Spectrally Degraded Sentences.
- Author
-
Lewis, Jessica H., Castellanos, Irina, and Moberly, Aaron C.
- Subjects
- *
SPEECH perception , *HEARING , *COLLEGE students , *STATISTICS , *PHONOLOGICAL awareness , *ANALYSIS of variance , *TASK performance , *NEUROPSYCHOLOGICAL tests , *T-test (Statistics) , *SHORT-term memory , *AUDIOMETRY , *RESEARCH funding , *REPEATED measures design , *DESCRIPTIVE statistics , *COGNITIVE testing , *STATISTICAL correlation , *DATA analysis , *ADULTS - Abstract
Background: Recent models theorize that neurocognitive resources are deployed differently during speech recognition depending on task demands, such as the severity of degradation of the signal or modality (auditory vs. audiovisual [AV]). This concept is particularly relevant to the adult cochlear implant (CI) population, considering the large amount of variability among CI users in their spectro-temporal processing abilities. However, disentangling the effects of individual differences in spectro-temporal processing and neurocognitive skills on speech recognition in clinical populations of adult CI users is challenging. Thus, this study investigated the relationship between neurocognitive functions and recognition of spectrally degraded speech in a group of young adult normal-hearing (NH) listeners. Purpose: The aim of this study was to manipulate the degree of spectral degradation and modality of speech presented to young adult NH listeners to determine whether deployment of neurocognitive skills would be affected. Research Design: Correlational study design. Study Sample: Twenty-one NH college students. Data Collection and Analysis: Participants listened to sentences in three spectral degradation conditions: no degradation (clear sentences); moderate degradation (8-channel noise-vocoded); and high degradation (4-channel noise-vocoded). Thirty sentences were presented in an auditory-only (A-only) modality and in an AV fashion. Visual assessments from the National Institutes of Health Toolbox Cognitive Battery were completed to evaluate working memory, inhibition-concentration, cognitive flexibility, and processing speed. Analyses of variance compared speech recognition performance across spectral degradation conditions and modalities. Bivariate correlation analyses were performed between speech recognition performance and the neurocognitive skills in the various test conditions. Results: Main effects on sentence recognition were found for degree of degradation (p < 0.001) and modality (p < 0.001). Inhibition-concentration skills moderately correlated (r = 0.45, p = 0.02) with recognition scores for sentences that were moderately degraded in the A-only condition. No correlations were found among neurocognitive scores and AV speech recognition scores. Conclusions: Inhibition-concentration skills are deployed differentially during sentence recognition, depending on the level of signal degradation. Additional studies will be required to study these relations in actual clinical populations such as adult CI users. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
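The 8- and 4-channel noise-vocoded sentences in entry 38 above are produced with a channel vocoder: the speech spectrum is split into bands, each band's amplitude envelope is extracted, and the envelopes modulate band-limited noise. Here is a minimal Python/scipy sketch of that procedure; the filter order, envelope cutoff, and logarithmic band spacing are generic assumptions, not the study's exact settings.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def noise_vocode(signal, fs, n_channels=8, f_lo=100.0, f_hi=7000.0):
    """Return an n-channel noise-vocoded version of `signal`."""
    edges = np.geomspace(f_lo, f_hi, n_channels + 1)   # log-spaced band edges
    rng = np.random.default_rng(0)
    out = np.zeros_like(signal, dtype=float)
    for lo, hi in zip(edges[:-1], edges[1:]):
        b, a = butter(4, [lo, hi], btype="bandpass", fs=fs)
        band = filtfilt(b, a, signal)
        env = np.abs(hilbert(band))                    # band amplitude envelope
        b_env, a_env = butter(4, 30.0, btype="lowpass", fs=fs)
        env = filtfilt(b_env, a_env, env)              # smooth envelope to ~30 Hz
        carrier = filtfilt(b, a, rng.normal(size=len(signal)))  # band-limited noise
        out += env * carrier
    return out / np.max(np.abs(out))

# Example: vocode one second of a synthetic, speech-like amplitude-modulated tone
fs = 16000
t = np.arange(fs) / fs
speechlike = np.sin(2 * np.pi * 150 * t) * (1 + 0.5 * np.sin(2 * np.pi * 4 * t))
vocoded_8ch = noise_vocode(speechlike, fs, n_channels=8)
vocoded_4ch = noise_vocode(speechlike, fs, n_channels=4)
```

Fewer channels discard more spectral detail, which is why the 4-channel condition is the more degraded one.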
39. Breaking down the cocktail party: Attentional modulation of cerebral audiovisual speech processing
- Author
-
Patrik Wikman, Elisa Sahari, Viljami Salmela, Alina Leminen, Miika Leminen, Matti Laine, and Kimmo Alho
- Subjects
Selective attention ,Audiovisual speech ,Cocktail party ,fMRI ,Semantics ,Multivariate pattern analysis (MVPA) ,Neurosciences. Biological psychiatry. Neuropsychiatry ,RC321-571 - Abstract
Recent studies utilizing electrophysiological speech envelope reconstruction have sparked renewed interest in the cocktail party effect by showing that auditory neurons entrain to selectively attended speech. Yet, the neural networks of attention to speech in naturalistic audiovisual settings with multiple sound sources remain poorly understood. We collected functional brain imaging data while participants viewed audiovisual video clips of lifelike dialogues with concurrent distracting speech in the background. Dialogues were presented in a full-factorial design, comprising task (listen to the dialogues vs. ignore them), audiovisual quality and semantic predictability. We used univariate analyses in combination with multivariate pattern analysis (MVPA) to study modulations of brain activity related to attentive processing of audiovisual speech. We found attentive speech processing to cause distinct spatiotemporal modulation profiles in distributed cortical areas including sensory and frontal-control networks. Semantic coherence modulated attention-related activation patterns in the earliest stages of auditory cortical processing, suggesting that the auditory cortex is involved in high-level speech processing. Our results corroborate views that emphasize the dynamic nature of attention, with task-specificity and context as cornerstones of the underlying neuro-cognitive mechanisms.
- Published
- 2021
- Full Text
- View/download PDF
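Multivariate pattern analysis, as referenced in entry 39 above, usually asks whether the spatial pattern of responses discriminates experimental conditions under cross-validation. The fragment below is a generic scikit-learn sketch on simulated trial-by-voxel data; it is not the study's pipeline, and all shapes and labels are hypothetical.

```python
import numpy as np
from sklearn.svm import SVC
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.model_selection import StratifiedKFold, cross_val_score

rng = np.random.default_rng(2)

# Hypothetical data: 80 trials x 500 voxels, two conditions (attend vs. ignore)
n_trials, n_voxels = 80, 500
labels = np.repeat([0, 1], n_trials // 2)
patterns = rng.normal(size=(n_trials, n_voxels))
patterns[labels == 1, :20] += 0.5    # weak condition-related signal in 20 voxels

# Linear classifier with feature standardisation, evaluated by stratified CV
clf = make_pipeline(StandardScaler(), SVC(kernel="linear"))
cv = StratifiedKFold(n_splits=5, shuffle=True, random_state=0)
scores = cross_val_score(clf, patterns, labels, cv=cv, scoring="accuracy")

print(f"cross-validated decoding accuracy: {scores.mean():.2f} (chance = 0.50)")
```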
40. Concatenated Frame Image Based CNN for Visual Speech Recognition
- Author
-
Saitoh, Takeshi, Zhou, Ziheng, Zhao, Guoying, Pietikäinen, Matti, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Chen, Chu-Song, editor, Lu, Jiwen, editor, and Ma, Kai-Kuang, editor
- Published
- 2017
- Full Text
- View/download PDF
41. Crossmodal Phase Reset and Evoked Responses Provide Complementary Mechanisms for the Influence of Visual Speech in Auditory Cortex.
- Author
-
Mégevand, Pierre, Mercier, Manuel R., Groppe, David M., Golumbic, Elana Zion, Mesgarani, Nima, Beauchamp, Michael S., Schroeder, Charles E., and Mehta, Ashesh D.
- Subjects
- *
AUDITORY cortex , *AUDITORY evoked response , *AUDITORY neurons , *PHASE oscillations , *PHASE modulation - Abstract
Natural conversation is multisensory: when we can see the speaker's face, visual speech cues improve our comprehension. The neuronal mechanisms underlying this phenomenon remain unclear. The two main alternatives are visually mediated phase modulation of neuronal oscillations (excitability fluctuations) in auditory neurons and visual input-evoked responses in auditory neurons. Investigating this question using naturalistic audiovisual speech with intracranial recordings in humans of both sexes, we find evidence for both mechanisms. Remarkably, auditory cortical neurons track the temporal dynamics of purely visual speech using the phase of their slow oscillations and phase-related modulations in broadband high-frequency activity. Consistent with known perceptual enhancement effects, the visual phase reset amplifies the cortical representation of concomitant auditory speech. In contrast to this, and in line with earlier reports, visual input reduces the amplitude of evoked responses to concomitant auditory input. We interpret the combination of improved phase tracking and reduced response amplitude as evidence for more efficient and reliable stimulus processing in the presence of congruent auditory and visual speech inputs. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
42. Vision perceptually restores auditory spectral dynamics in speech.
- Author
-
Plass, John, Brang, David, Suzuki, Satoru, and Grabowecky, Marcia
- Subjects
- *
SPEECH perception , *AUDITORY perception , *SENSE organs , *SPEECH , *VISION - Abstract
Visual speech facilitates auditory speech perception, but the visual cues responsible for these benefits and the information they provide remain unclear. Low-level models emphasize basic temporal cues provided by mouth movements, but these impoverished signals may not fully account for the richness of auditory information provided by visual speech. High-level models posit interactions among abstract categorical (i.e., phonemes/visemes) or amodal (e.g., articulatory) speech representations, but require lossy remapping of speech signals onto abstracted representations. Because visible articulators shape the spectral content of speech, we hypothesized that the perceptual system might exploit natural correlations between midlevel visual (oral deformations) and auditory speech features (frequency modulations) to extract detailed spectrotemporal information from visual speech without employing high-level abstractions. Consistent with this hypothesis, we found that the time-frequency dynamics of oral resonances (formants) could be predicted with unexpectedly high precision from the changing shape of the mouth during speech. When isolated from other speech cues, speech-based shape deformations improved perceptual sensitivity for corresponding frequency modulations, suggesting that listeners could exploit this cross-modal correspondence to facilitate perception. To test whether this type of correspondence could improve speech comprehension, we selectively degraded the spectral or temporal dimensions of auditory sentence spectrograms to assess how well visual speech facilitated comprehension under each degradation condition. Visual speech produced drastically larger enhancements during spectral degradation, suggesting a condition-specific facilitation effect driven by cross-modal recovery of auditory speech spectra. The perceptual system may therefore use audiovisual correlations rooted in oral acoustics to extract detailed spectrotemporal information from visual speech. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
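The key observation in entry 42 above, that formant trajectories can be predicted from the changing shape of the mouth, amounts to a cross-modal regression from visual oral features to audio frequency tracks. The following least-squares sketch in Python uses simulated lip features purely to make the idea concrete; it does not reproduce the paper's analysis.

```python
import numpy as np

rng = np.random.default_rng(3)
n_frames = 2000

# Hypothetical mid-level visual features per video frame: lip height, lip width,
# and interior area of the mouth opening
lip_height = np.abs(rng.normal(1.0, 0.3, n_frames))
lip_width = np.abs(rng.normal(2.0, 0.3, n_frames))
mouth_area = lip_height * lip_width + rng.normal(0, 0.05, n_frames)
V = np.column_stack([np.ones(n_frames), lip_height, lip_width, mouth_area])

# Simulated second-formant (F2) track loosely tied to the oral configuration
f2 = 1200 + 300 * lip_width - 150 * lip_height + rng.normal(0, 50, n_frames)

# Split, fit ordinary least squares, and evaluate on held-out frames
half = n_frames // 2
beta, *_ = np.linalg.lstsq(V[:half], f2[:half], rcond=None)
pred = V[half:] @ beta
r = np.corrcoef(pred, f2[half:])[0, 1]
print(f"held-out correlation between predicted and actual F2: r = {r:.2f}")
```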
43. The contribution of audiovisual speech to lexical-semantic processing in natural spoken sentences.
- Author
-
Brunellière, Angèle, Delrue, Laurence, and Auran, Cyril
- Subjects
- *
AUDIOVISUAL materials , *COMMUNICATION , *MEMORY , *SEMANTICS , *PHONOLOGICAL awareness - Abstract
In everyday communication, natural spoken sentences are expressed in a multisensory way through auditory signals and speakers' visible articulatory gestures. An important issue is to know whether audiovisual speech plays a main role in the linguistic encoding of an utterance until access to meaning. To this end, we conducted an event-related potential experiment during which participants passively listened to spoken sentences and then performed a lexical recognition task. The results revealed that N200 and N400 waves had a greater amplitude after semantically incongruous words than after expected words. This effect of semantic congruency was larger over the N200 in the audiovisual trials. Words presented audiovisually also elicited a reduced amplitude of the N400 wave and facilitated recovery in memory. Our findings shed light on the influence of audiovisual speech on the understanding of natural spoken sentences by acting on the early stages of word recognition in order to access a lexical-semantic network. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
44. Attention modulates early auditory processing at a real cocktail party.
- Author
-
Fitzroy, Ahren B., Ugolini, Margaret, Munoz, Miriam, Zobel, Benjamin H., Sherwood, Maxwell, and Sanders, Lisa D.
- Subjects
- *
ATTENTION , *AUDITORY evoked response , *AUDITORY perception , *ELECTROENCEPHALOGRAPHY , *NOISE , *SOUND , *SPEECH perception , *INTELLIGIBILITY of speech , *SOCIAL context , *DESCRIPTIVE statistics - Abstract
Understanding speech in noisy environments is a substantial challenge. Decades of laboratory research have shown that selective attention allows listeners to preferentially process target speech. However, most of this work stripped away the multimodal complexity of the real world. We compared auditory evoked potentials elicited by acoustic onsets in attended and unattended live speech in a room with multiple live talkers. Acoustic onsets in attended speech elicited a larger negativity from 100 to 220 ms, centred around the N1. Additionally, acoustic onsets in speech from a seen talker elicited a larger negativity during this time window than onsets in audio-only speech. Further, the attention effect for an unseen talker was observed only when all unattended talkers were unseen as well. A seen talker is both easier to attend and harder to ignore. These results indicate that the more complex conditions encountered during real-life speech-in-noise processing modulate the attentional facilitation of speech perception. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
45. Sensitivity to temporal synchrony in audiovisual speech in early infancy: Current issues and future avenues.
- Author
-
Lozano, Itziar, Campos, Ruth, and Belinchón, Mercedes
- Subjects
- *
SPEECH , *SYNCHRONIC order , *INFANTS , *OPERATIONAL definitions , *HETEROGENEITY - Abstract
Audiovisual speech integration during infancy is crucial for socio-cognitive development. A key perceptual cue infants use to achieve this is temporal synchrony detection. Although the current developmental literature on this ability is rich, unsolved disagreements obscure the interpretation of findings. Here, we propose conceptual and methodological issues that may have contributed to a still unclear picture of the developmental trajectory of sensitivity to temporal synchrony, particularly when studied in audiovisual fluent speech. We discuss several sources of confusion, including a lack of terminological precision, heterogeneity in the experimental manipulations conducted, and in the paradigms and stimuli used. We propose an approach that clarifies the definition and operationalization of sensitivity to temporal synchrony and explores its developmental course, emphasizing the role of infants' linguistic experiences. Ultimately, we expect that our analytical review will contribute to the field by aligning theoretical constructs, proposing more fine-grained designs, and using stimuli closer to infants' experiences. • Infants rely on temporal synchrony detection to integrate audiovisual speech. • The trajectory of sensitivity to temporal synchrony in audiovisual speech is unclear. • We critically review conceptual and methodological issues in behavioral studies. • We argue for more terminological clarity and aligned experimental manipulations. • Theoretically focusing on language experiences and developmental processes is needed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. HAVRUS Corpus: High-Speed Recordings of Audio-Visual Russian Speech
- Author
-
Verkhodanova, Vasilisa, Ronzhin, Alexander, Kipyatkova, Irina, Ivanko, Denis, Karpov, Alexey, Železný, Miloš, Hutchison, David, Series editor, Kanade, Takeo, Series editor, Kittler, Josef, Series editor, Kleinberg, Jon M., Series editor, Mattern, Friedemann, Series editor, Mitchell, John C., Series editor, Naor, Moni, Series editor, Pandu Rangan, C., Series editor, Steffen, Bernhard, Series editor, Terzopoulos, Demetri, Series editor, Tygar, Doug, Series editor, Weikum, Gerhard, Series editor, Ronzhin, Andrey, editor, Potapova, Rodmonga, editor, and Németh, Géza, editor
- Published
- 2016
- Full Text
- View/download PDF
47. Audiovisual speech perception: Moving beyond McGurk
- Author
-
Kristin J. Van Engen, Mitchell S. Sommers, Jonathan E. Peelle, and Avanti Dey
- Subjects
Acoustics and Ultrasonics ,Arts and Humanities (miscellaneous) ,Speech recognition ,Perception ,Audiovisual speech ,Special Issue on Reconsidering Classic Ideas in Speech Communication ,Psychology - Abstract
Although it is clear that sighted listeners use both auditory and visual cues during speech perception, the manner in which multisensory information is combined is a matter of active debate. One approach to measuring multisensory integration is to use variants of the McGurk illusion, in which discrepant auditory and visual cues produce auditory percepts that differ from those based solely on unimodal input. Not all listeners show the same degree of susceptibility to the McGurk illusion, and these individual differences in susceptibility are frequently used as a measure of audiovisual integration ability. However, despite their popularity, we argue that McGurk tasks are ill-suited for studying the kind of multisensory speech perception that occurs in real life: McGurk stimuli are often based on isolated syllables (which are rare in conversations) and necessarily rely on audiovisual incongruence that does not occur naturally. Furthermore, recent data show that susceptibility on McGurk tasks does not correlate with performance during natural audiovisual speech perception. Although the McGurk effect is a fascinating illusion, truly understanding the combined use of auditory and visual information during speech perception requires tasks that more closely resemble everyday communication: namely, words, sentences, and stories with congruent auditory and visual speech cues.
- Published
- 2022
- Full Text
- View/download PDF
48. Electrocorticography reveals continuous auditory and visual speech tracking in temporal and occipital cortex.
- Author
-
Micheli, Cristiano, Schepers, Inga M., Ozker, Müge, Yoshor, Daniel, Beauchamp, Michael S., and Rieger, Jochem W.
- Subjects
- *
ELECTROENCEPHALOGRAPHY , *SPEECH perception , *PEOPLE with epilepsy , *SPEECH , *AUDITORY perception - Abstract
During natural speech perception, humans must parse temporally continuous auditory and visual speech signals into sequences of words. However, most studies of speech perception present only single words or syllables. We used electrocorticography (subdural electrodes implanted on the brains of epileptic patients) to investigate the neural mechanisms for processing continuous audiovisual speech signals consisting of individual sentences. Using partial correlation analysis, we found that posterior superior temporal gyrus (pSTG) and medial occipital cortex tracked both the auditory and the visual speech envelopes. These same regions, as well as inferior temporal cortex, responded more strongly to a dynamic video of a talking face compared to auditory speech paired with a static face. Occipital cortex and pSTG carry temporal information about both auditory and visual speech dynamics. Visual speech tracking in pSTG may be a mechanism for enhancing perception of degraded auditory speech. [ABSTRACT FROM AUTHOR]
- Published
- 2020
- Full Text
- View/download PDF
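Partial correlation analysis, as used in entry 48 above, measures how strongly a neural signal tracks one speech envelope after the variance shared with the other (correlated) envelope has been regressed out. A minimal numpy sketch with simulated auditory and visual envelopes is shown below; all signals and parameters are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(4)
n = 5000

# Simulated, mutually correlated auditory and visual speech envelopes
aud_env = rng.normal(size=n)
vis_env = 0.6 * aud_env + rng.normal(scale=0.8, size=n)
# Simulated neural signal tracking both envelopes plus noise
neural = 0.5 * aud_env + 0.3 * vis_env + rng.normal(scale=1.0, size=n)

def partial_corr(x, y, z):
    """Correlation of x and y after regressing z out of both."""
    Z = np.column_stack([np.ones(len(z)), z])
    rx = x - Z @ np.linalg.lstsq(Z, x, rcond=None)[0]
    ry = y - Z @ np.linalg.lstsq(Z, y, rcond=None)[0]
    return np.corrcoef(rx, ry)[0, 1]

r_aud = partial_corr(neural, aud_env, vis_env)   # auditory tracking, visual removed
r_vis = partial_corr(neural, vis_env, aud_env)   # visual tracking, auditory removed
print(f"partial r (auditory | visual) = {r_aud:.2f}")
print(f"partial r (visual | auditory) = {r_vis:.2f}")
```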
49. Audiovisual speech segmentation in post-stroke aphasia: a pilot study.
- Author
-
Basirat, Anahita, Allart, Étienne, Brunellière, Angèle, and Martin, Yves
- Subjects
AUDIOVISUAL materials ,COMMUNICATION ,PHONETICS ,RESEARCH funding ,PHYSIOLOGICAL aspects of speech ,SPEECH perception ,STATISTICS ,STROKE ,PILOT projects ,DATA analysis ,TASK performance ,PHONOLOGICAL awareness ,PROMPTS (Psychology) ,SEVERITY of illness index ,REHABILITATION of aphasic persons ,DATA analysis software ,DESCRIPTIVE statistics ,ACOUSTIC stimulation ,MANN Whitney U Test ,DISEASE complications - Abstract
Background: Stroke may cause sentence comprehension disorders. Speech segmentation, i.e. the ability to detect word boundaries while listening to continuous speech, is an initial step allowing the successful identification of words and the accurate understanding of meaning within sentences. It has received little attention in people with post-stroke aphasia (PWA). Objectives: Our goal was to study speech segmentation in PWA and examine the potential benefit of seeing the speakers' articulatory gestures while segmenting sentences. Methods: Fourteen PWA and twelve healthy controls participated in this pilot study. Performance was measured with a word-monitoring task. In the auditory-only modality, participants were presented with auditory-only stimuli, while in the audiovisual modality, visual speech cues (i.e. the speaker's articulatory gestures) accompanied the auditory input. The proportion of correct responses was calculated for each participant and each modality. Visual enhancement was then calculated in order to estimate the potential benefit of seeing the speaker's articulatory gestures. Results: Both in the auditory-only and audiovisual modalities, PWA performed significantly less well than controls, who had 100% correct performance in both modalities. The performance of PWA was correlated with their phonological ability. Six PWA used the visual cues. Group-level analysis performed on PWA did not show any reliable difference between the auditory-only and audiovisual modalities (median visual enhancement = 7% [Q1–Q3: −5 to 39]). Conclusion: Our findings show that a speech segmentation disorder may exist in PWA. This points to the importance of assessing and training speech segmentation after stroke. Further studies should investigate the characteristics of PWA who use visual speech cues during sentence processing. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
50. Look at me when I'm talking to you: Selective attention at a multisensory cocktail party can be decoded using stimulus reconstruction and alpha power modulations.
- Author
-
O'Sullivan, Aisling E., Lim, Chantelle Y., and Lalor, Edmund C.
- Subjects
- *
COCKTAIL parties , *SELECTIVITY (Psychology) , *SELF-talk , *ELECTROENCEPHALOGRAPHY , *SCALP - Abstract
Recent work using electroencephalography has applied stimulus reconstruction techniques to identify the attended speaker in a cocktail party environment. The success of these approaches has been primarily based on the ability to detect cortical tracking of the acoustic envelope at the scalp level. However, most studies have ignored the effects of visual input, which is almost always present in naturalistic scenarios. In this study, we investigated the effects of visual input on envelope‐based cocktail party decoding in two multisensory cocktail party situations: (a) Congruent AV—facing the attended speaker while ignoring another speaker represented by the audio‐only stream and (b) Incongruent AV (eavesdropping)—attending the audio‐only speaker while looking at the unattended speaker. We trained and tested decoders for each condition separately and found that we can successfully decode attention to congruent audiovisual speech and can also decode attention when listeners were eavesdropping, i.e., looking at the face of the unattended talker. In addition to this, we found alpha power to be a reliable measure of attention to the visual speech. Using parieto‐occipital alpha power, we found that we can distinguish whether subjects are attending or ignoring the speaker's face. Considering the practical applications of these methods, we demonstrate that with only six near‐ear electrodes we can successfully determine the attended speech. This work extends the current framework for decoding attention to speech to more naturalistic scenarios, and in doing so provides additional neural measures which may be incorporated to improve decoding accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2019
- Full Text
- View/download PDF
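Parieto-occipital alpha power, the attention index in entry 50 above, is conventionally computed by band-pass filtering the EEG around 8–12 Hz and squaring the envelope of the analytic signal. Below is a minimal Python/scipy sketch comparing alpha power across two simulated conditions; the channel count, filter settings, and the direction of the simulated difference are assumptions for illustration only.

```python
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

fs = 256                      # Hz, assumed EEG sampling rate
rng = np.random.default_rng(5)

def alpha_power(eeg, fs, band=(8.0, 12.0)):
    """Mean alpha-band power per channel (eeg: n_samples x n_channels)."""
    b, a = butter(4, band, btype="bandpass", fs=fs)
    filtered = filtfilt(b, a, eeg, axis=0)
    envelope = np.abs(hilbert(filtered, axis=0))   # analytic amplitude
    return (envelope ** 2).mean(axis=0)

# Two simulated 60-s segments from 6 posterior/near-ear channels (hypothetical):
# the "ignore the face" segment is given slightly stronger 10 Hz activity
n, n_ch = 60 * fs, 6
t = np.arange(n) / fs
attend_face = rng.normal(size=(n, n_ch)) + 0.3 * np.sin(2 * np.pi * 10 * t)[:, None]
ignore_face = rng.normal(size=(n, n_ch)) + 0.6 * np.sin(2 * np.pi * 10 * t)[:, None]

print("alpha power, attending the face:", alpha_power(attend_face, fs).round(2))
print("alpha power, ignoring the face: ", alpha_power(ignore_face, fs).round(2))
```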