518 results for "Audiovisual speech"
Search Results
502. Updating expectancies about audiovisual associations in speech.
- Author
- Paris, Tim, Kim, Jeesun, and Davis, Christopher
- Subjects
- SPEECH, ELECTROENCEPHALOGRAPHY, LIPREADING
- Abstract
The processing of multisensory information depends on the learned association between sensory cues. In the case of speech there is a well-learned association between the movements of the lips and the subsequent sound. That is, particular lip and mouth movements reliably lead to a specific sound. EEG and MEG studies that have investigated the differences between this 'congruent' AV association and other 'incongruent' associations have commonly reported ERP differences from 350 ms after sound onset. Using a 256 active electrode EEG system, we tested whether this 'congruency effect' would be reduced in the context where most of the trials had an altered audiovisual association (auditory speech paired with mismatched visual lip movements). Participants were presented stimuli over 2 sessions: in one session only 15% were incongruent trials; in the other session, 85% were incongruent trials. We found a congruency effect, showing differences in ERP between congruent and incongruent speech between 350 and 500 ms. Importantly, this effect was reduced within the context of mostly incongruent trials. This reduction in the congruency effect indicates that the way in which AV speech is processed depends on the context it is viewed in. Furthermore, this result suggests that exposure to novel sensory relationships leads to updated expectations regarding the relationship between auditory and visual speech cues. [ABSTRACT FROM AUTHOR]
- Published
- 2012
- Full Text
- View/download PDF
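Note: the congruency effect described in the abstract above is a difference between ERPs to congruent and incongruent trials within a fixed post-stimulus window (350–500 ms). A minimal sketch of that computation follows, assuming epoched data in NumPy arrays; the sampling rate, baseline length, and array shapes are illustrative, not taken from the study.

```python
# Minimal sketch (not the authors' pipeline): quantifying a congruency effect
# as the mean ERP amplitude difference between incongruent and congruent
# epochs in a 350-500 ms post-sound-onset window.
import numpy as np

FS = 500                 # sampling rate in Hz (assumed)
BASELINE_SAMPLES = 100   # samples before sound onset in each epoch (assumed)

def congruency_effect(congruent, incongruent, t_start=0.350, t_end=0.500):
    """Epoch arrays of shape (n_trials, n_channels, n_samples) -> per-channel effect."""
    i0 = BASELINE_SAMPLES + int(t_start * FS)
    i1 = BASELINE_SAMPLES + int(t_end * FS)
    erp_congruent = congruent.mean(axis=0)      # trial-average ERP
    erp_incongruent = incongruent.mean(axis=0)
    # difference wave averaged over the analysis window, one value per channel
    return (erp_incongruent - erp_congruent)[:, i0:i1].mean(axis=1)

# usage with simulated epochs: 120 trials x 256 channels x 600 samples
rng = np.random.default_rng(0)
effect = congruency_effect(rng.normal(size=(120, 256, 600)),
                           rng.normal(size=(120, 256, 600)))
```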
503. Assessing automaticity in audiovisual speech integration: evidence from the speeded classification task
- Author
- Soto-Faraco, Salvador, Navarra, Jordi, and Alsius, Agnès
- Subjects
- SPEECH, SYLLABLE (Grammar), LIPS, LANGUAGE & languages
- Abstract
The McGurk effect is usually presented as an example of fast, automatic, multisensory integration. We report a series of experiments designed to directly assess these claims. We used a syllabic version of the speeded classification paradigm, whereby response latencies to the first (target) syllable of spoken word-like stimuli are slowed down when the second (irrelevant) syllable varies from trial to trial. This interference effect is interpreted as a failure of selective attention to filter out the irrelevant syllable. In Experiment 1 we reproduced the syllabic interference effect with bimodal stimuli containing auditory as well as visual lip movement information, thus confirming the generalizability of the phenomenon. In subsequent experiments we were able to produce (Experiment 2) and to eliminate (Experiment 3) syllabic interference by introducing 'illusory' (McGurk) audiovisual stimuli in the irrelevant syllable, suggesting that audiovisual integration occurs prior to attentional selection in this paradigm. [Copyright Elsevier]
- Published
- 2004
- Full Text
- View/download PDF
504. The effect of face and lip inversion on audiovisual speech integration
- Author
- Lawrence D. Rosenblum, Kerry P. Green, Chantel L. Bosley, Deborah A. Yakel, and Rebecca A. Vasquez
- Subjects
- Auditory perception, Visual perception, Acoustics and Ultrasonics, Acoustics, Speech recognition, Oral cavity, Facial recognition system, Arts and Humanities (miscellaneous), Face perception, Perception, Audiovisual speech, Psychology
- Abstract
Seeing a speaking face can influence observers' auditory perception of syllables [McGurk and MacDonald, Nature 264, 746–748 (1976)]. This effect decreases when the speaker's face is inverted [e.g., Green, 3014 (1994)]. Face recognition is also inhibited with inverted faces [e.g., Rock, Sci. Am. 230, 78–85 (1974)], suggesting a similar underlying process. To further explore the link between face and audiovisual speech perception, a speech experiment was designed to replicate another face perception effect. In this effect, an inverted face and an inverted face containing upright lips are perceived as looking normal, but an upright face with inverted lips looks grotesque [Thompson, Perception 9, 483–484 (1980)]. An audiovisual speech experiment tested four presentation conditions: upright face-upright mouth, upright face-inverted mouth, inverted face-inverted mouth, inverted face-upright mouth. Various discrepant audiovisual syllables were tested in each condition. Visual influences occurred in all but the upr...
- Published
- 1995
- Full Text
- View/download PDF
505. Auditory localization and audiovisual speech perception
- Author
- K. G. Munhall and J. A. Jones
- Subjects
- Visual perception, Acoustics and Ultrasonics, Acoustics, Audiology, Stimulus (physiology), Intervocalic consonant, Arts and Humanities (miscellaneous), Perception, Auditory localization, Sound sources, McGurk effect, Audiovisual speech, Psychology
- Abstract
The influence of the spatial position of the acoustic signal in audiovisual speech perception was investigated in a series of experiments using the McGurk effect. Subjects viewed video disk recordings of faces producing visual VCV nonsense syllables while simultaneous acoustic VCV stimuli were presented from one of 7 different loudspeaker locations. The loudspeakers were positioned in a semicircular array in front of the subject. In separate studies subjects were required to name the intervocalic consonant, indicate the location of the sound source using a computerized potentiometer system, or perform both tasks. Preliminary results suggest that the subjects' ability to localize the position of the auditory stimulus was influenced by the presence of a visual stimulus. Specifically, subjects tended to localize the sound closer to the position of the monitor showing the visual stimulus. However, the strength of the McGurk effect was not influenced by the spatial position of the sound source. Subjects perceived the visual /g/ and the auditory /b/ combination as /d/ equally often at all sound locations. The independence of the ventriloquist effect and audiovisual integration in the perception of the McGurk effect will be discussed in terms of spatial constraints on the cross-modal perception of speech.
- Published
- 1995
- Full Text
- View/download PDF
506. A developmental study of audiovisual speech perception using the McGurk paradigm
- Author
- Neil S. Hockley and Linda Polka
- Subjects
- Motor theory of speech perception, Speech perception, Acoustics and Ultrasonics, Audiology, Speech patterns, Arts and Humanities (miscellaneous), Age groups, Perception, Auditory information, Audiovisual speech, Syllable, Psychology
- Abstract
The development of audiovisual speech perception was examined in this experiment using the McGurk paradigm (McGurk and MacDonald, 1976), in which a visual recording of a person saying a particular syllable is synchronized with the auditory presentation of another syllable. Previous audiovisual speech studies have shown that adult perception is strongly influenced by the visual speech information whereas the perception of young children (5–8 years) shows a very weak influence of visual speech patterns and a strong bias favoring the auditory speech information. In this investigation 46 children in four age groups (5, 7, 9, and 11 year olds) and 15 adults were presented with conflicting audiovisual syllables in which an auditory /ba/ sequence was combined with visual /va/, /θa/, /da/, and /ga/ sequences, respectively. The results indicated that the influence of auditory information decreased with increasing age, while the influence of visual information and the integration of auditory and visual information ...
- Published
- 1994
- Full Text
- View/download PDF
507. Bilingualism affects audiovisual phoneme identification.
- Author
- Burfin S, Pascalis O, Ruiz Tada E, Costa A, Savariaux C, and Kandel S
- Abstract
We all go through a process of perceptual narrowing for phoneme identification. As we become experts in the languages we hear in our environment, we lose the ability to identify phonemes that do not exist in our native phonological inventory. This research examined how linguistic experience (i.e., exposure to a double phonological code during childhood) affects the visual processes involved in non-native phoneme identification in audiovisual speech perception. We conducted a phoneme identification experiment with bilingual and monolingual adult participants. It was an ABX task involving a Bengali dental-retroflex contrast that does not exist in any of the participants' languages. The phonemes were presented in audiovisual (AV) and audio-only (A) conditions. The results revealed that in the audio-only condition monolinguals and bilinguals had difficulties in discriminating the retroflex non-native phoneme. They were phonologically "deaf" and assimilated it to the dental phoneme that exists in their native languages. In the audiovisual presentation, by contrast, both groups could overcome the phonological deafness for the retroflex non-native phoneme and identify both Bengali phonemes. However, monolinguals were more accurate and responded more quickly than bilinguals. This suggests that bilinguals do not use the same processes as monolinguals to decode visual speech.
- Published
- 2014
- Full Text
- View/download PDF
508. Visual speech segmentation: using facial cues to locate word boundaries in continuous speech.
- Author
- Mitchel AD and Weiss DJ
- Abstract
Speech is typically a multimodal phenomenon, yet few studies have focused on the exclusive contributions of visual cues to language acquisition. To address this gap, we investigated whether visual prosodic information can facilitate speech segmentation. Previous research has demonstrated that language learners can use lexical stress and pitch cues to segment speech and that learners can extract this information from talking faces. Thus, we created an artificial speech stream that contained minimal segmentation cues and paired it with two synchronous facial displays in which visual prosody was either informative or uninformative for identifying word boundaries. Across three familiarisation conditions (audio stream alone, facial streams alone, and paired audiovisual), learning occurred only when the facial displays were informative to word boundaries, suggesting that facial cues can help learners solve the early challenges of language acquisition.
- Published
- 2014
- Full Text
- View/download PDF
509. Visual Contribution to Speech Intelligibility in Noise
- Author
- Irwin Pollack and W. H. Sumby
- Subjects
- Speechreading, Vocabulary, Acoustics and Ultrasonics, Speech recognition, Acoustics, Visible Speech, Intelligibility (communication), Arts and Humanities (miscellaneous), McGurk effect, Visual observation, Neurocomputational speech processing, Audiovisual speech, Mathematics
- Abstract
Oral speech intelligibility tests were conducted with, and without, supplementary visual observation of the speaker's facial and lip movements. The difference between these two conditions was examined as a function of the speech‐to‐noise ratio and of the size of the vocabulary under test. The visual contribution to oral speech intelligibility (relative to its possible contribution) is, to a first approximation, independent of the speech‐to‐noise ratio under test. However, since there is a much greater opportunity for the visual contribution at low speech‐to‐noise ratios, its absolute contribution can be exploited most profitably under these conditions.
- Published
- 1954
- Full Text
- View/download PDF
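Note: the "relative" visual contribution mentioned in the abstract above is commonly computed by normalizing the audiovisual gain by the headroom left above auditory-alone performance. A minimal sketch, with illustrative scores (proportions correct, not figures from the paper):

```python
# Sketch of the normalized visual benefit often associated with this paradigm:
# the gain from adding vision, scaled by the room left for improvement over
# audio-alone performance. Both scores are proportions correct.
def relative_visual_contribution(audio_only, audiovisual):
    if not 0.0 <= audio_only < 1.0:
        raise ValueError("audio-only score must lie in [0, 1)")
    return (audiovisual - audio_only) / (1.0 - audio_only)

# e.g., 20% correct in noise by ear alone, 65% correct with the face visible
print(relative_visual_contribution(0.20, 0.65))   # -> 0.5625
```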
510. Audiovisual temporal integration in reverberant environments
- Author
- Carsten Griwodz, Dawn M. Behne, and Ragnhild Eg
- Subjects
- Reverberation, Linguistics and Language, Simultaneity, Microphone, Event (computing), Computer science, Speech recognition, Communication, SIGNAL (programming language), Teleconference, Language and Linguistics, Asynchrony (computer programming), Computer Science Applications, Audiovisual speech, Audiovisual asynchrony, Modeling and Simulation, Perception, Computer Vision and Pattern Recognition, Temporal integration, Software
- Highlights
- Explores perceived audiovisual synchrony in reverberant environments.
- The temporal smear caused by reverberation can alter a signal's acoustic signature.
- Reverberating acoustics is a common problem in teleconferencing systems.
- Reverberation may not have adverse effects on the perceived synchrony for continuous audiovisual speech.
- The temporal integration of speech syllables and isolated events is affected by the acoustic phenomenon.
- Abstract
With teleconferencing becoming more accessible as a communication platform, researchers are working to understand the consequences of the interaction between human perception and this unfamiliar environment. Given the enclosed space of a teleconference room, along with the physical separation between the user, microphone and speakers, the transmitted audio often becomes mixed with the reverberating auditory components from the room. As a result, the audio can be perceived as smeared in time, and this can affect the user experience and perceived quality. Moreover, other challenges remain to be solved. For instance, during encoding, compression and transmission, the audio and video streams are typically treated separately. Consequently, the signals are rarely perfectly aligned and synchronous. In effect, timing affects both reverberation and audiovisual synchrony, and the two challenges may well be inter-dependent. This study explores the temporal integration of audiovisual continuous speech and speech syllables, along with a non-speech event, across a range of asynchrony levels for different reverberation conditions. Non-reverberant stimuli are compared to stimuli with added reverberation recordings. Findings reveal that reverberation does not affect the temporal integration of continuous speech. However, reverberation influences the temporal integration of the isolated speech syllables and the action-oriented event, with perceived subjective synchrony skewed towards audio lead asynchrony and away from the more common audio lag direction. Furthermore, less time is spent on simultaneity judgements for the longer sequences when the temporal offsets get longer and when reverberation is introduced, suggesting that both asynchrony and reverberation add to the demands of the task.
- Full Text
- View/download PDF
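Note: the two stimulus manipulations discussed above, reverberation and audiovisual asynchrony, can both be viewed as operations on the audio track: convolution with a room impulse response, and a signed time shift relative to the video. A rough sketch under those assumptions (toy impulse response and placeholder signal, not the authors' materials):

```python
# Minimal sketch (not the authors' stimulus pipeline): reverberation via
# convolution with a room impulse response, and audiovisual asynchrony via
# a signed shift of the audio relative to the video timeline.
import numpy as np

def add_reverb(dry, room_ir):
    """Convolve a dry signal with a room impulse response and renormalize."""
    wet = np.convolve(dry, room_ir)[: len(dry)]
    return wet / np.max(np.abs(wet))

def shift_audio(audio, offset_ms, fs=48000):
    """Positive offset = audio lag (sound after video); negative = audio lead."""
    n = int(round(abs(offset_ms) / 1000.0 * fs))
    if n == 0:
        return audio.copy()
    if offset_ms > 0:
        return np.concatenate([np.zeros(n), audio[:-n]])
    return np.concatenate([audio[n:], np.zeros(n)])

# placeholder 2 s "speech" signal and a toy exponentially decaying impulse response
fs = 48000
dry = np.random.default_rng(1).normal(size=2 * fs)
toy_ir = np.exp(-np.linspace(0.0, 8.0, fs // 2))
stimulus = shift_audio(add_reverb(dry, toy_ir), offset_ms=-120, fs=fs)
```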
511. A rationale for the utilization of audiovisual speech models in teaching speech criticism
- Author
- Robert K. Avery
- Subjects
- Speech technology, Criticism, Audiovisual speech, Psychology, Linguistics
- Published
- 1972
- Full Text
- View/download PDF
512. Assessing audiovisual speech-reception disability
- Author
- John R. Foster and Quentin Summerfield
- Subjects
- Population, Adult population, Face (sociological concept), Audiology, Test (assessment), Variation (linguistics), Vowel, Subject (grammar), Otorhinolaryngologic diseases, Audiovisual speech, Psychology, Cognitive psychology
- Abstract
Publisher Summary: This chapter discusses the assessment of audio-visual speech-reception disability. The importance of the role of vision in understanding speech is attested informally by the observation of many older hearing-impaired listeners that they can hear better when they wear their glasses, and by the strategy of most listeners in noisy environments of looking at the face of the talker whose speech they wish to understand. Vision complements audition by providing, in particular, those phonetic details that are lost easily in noise or impairment. The Four-alternative Auditory Disability and Speech-reading Test (FADAST) was developed in response to two major needs. First, it would be used as a member of a battery of tests in a population-based survey of deafness. In this role it would contribute to establishing the profile of speech-reception disability in the adult population of Great Britain and relating it to other measures of disability, impairment, and handicap. Second, it would be used to assess the benefit obtained from different aids by more impaired adults, where the aid is most appropriately construed as an aid to lip-reading. The requirements of the test were that it should: (1) possess sufficient face-validity to ensure the co-operation and motivation of the subject; (2) be straightforward to administer and receive; (3) be relatively free from practice effects; (4) permit a rigorous, informative scoring procedure; (5) be free from phonological dialect variation, particularly between the radically different vowel systems of English as spoken in Scotland and England; and (6) be sensitive to a wide range of disability.
- Published
- 1983
- Full Text
- View/download PDF
513. [Untitled]
- Subjects
- Melody, Age of Acquisition, Sine wave, Speech recognition, Piano, Psychophysics, Audiovisual speech, Musical, Stimulus (physiology), Psychology, General Psychology
- Abstract
This psychophysics study used musicians as a model to investigate whether musical expertise shapes the temporal integration window for audiovisual speech, sinewave speech, or music. Musicians and non-musicians judged the audiovisual synchrony of speech, sinewave analogs of speech, and music stimuli at 13 audiovisual stimulus onset asynchronies (±360, ±300, ±240, ±180, ±120, ±60, and 0 ms). Further, we manipulated the duration of the stimuli by presenting sentences/melodies or syllables/tones. Critically, musicians relative to non-musicians exhibited significantly narrower temporal integration windows for both music and sinewave speech. Further, the temporal integration window for music decreased with the amount of music practice, but not with age of acquisition. In other words, the more musicians practiced piano in the past 3 years, the more sensitive they became to the temporal misalignment of visual and auditory signals. Collectively, our findings demonstrate that music practicing fine-tunes the audiovisual temporal integration window to various extents depending on the stimulus class. While the effect of piano practicing was most pronounced for music, it also generalized to other stimulus classes such as sinewave speech and, to a marginally significant degree, to natural speech.
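Note: a temporal integration window of the kind described above is often summarized by fitting a smooth curve to the proportion of "synchronous" judgments across SOAs and reading off its width. The sketch below fits a scaled Gaussian with SciPy; the response proportions are hypothetical and the Gaussian is one common convention, not necessarily the model used in this study.

```python
# Sketch: estimating a temporal integration window from simultaneity judgments.
import numpy as np
from scipy.optimize import curve_fit

def gaussian(soa, amplitude, mu, sigma):
    return amplitude * np.exp(-0.5 * ((soa - mu) / sigma) ** 2)

# the 13 SOAs reported in the abstract (ms); sign convention assumed (negative = audio lead)
soas = np.array([-360, -300, -240, -180, -120, -60, 0,
                 60, 120, 180, 240, 300, 360], dtype=float)
# hypothetical proportions of "synchronous" responses, one per SOA
p_sync = np.array([0.05, 0.10, 0.20, 0.45, 0.75, 0.92, 0.95,
                   0.90, 0.70, 0.40, 0.18, 0.08, 0.04])

(amp, mu, sigma), _ = curve_fit(gaussian, soas, p_sync, p0=[1.0, 0.0, 100.0])
print(f"window centre {mu:.0f} ms, width (FWHM) {2.355 * abs(sigma):.0f} ms")
```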
514. Children With SLI Can Exhibit Reduced Attention to a Talker's Mouth
- Author
- Pons Gimeno, Ferran, Sanz Torrent, Mònica, Ferinu Sanz, Laura, Birulés Muntané, Joan, Andreu Barrachina, Llorenç, Universitat de Barcelona, and Universitat Oberta de Catalunya (UOC)
- Subjects
- Children, Language disorders, Specific language impairment (SLI), Eyes-mouth, Audiovisual speech, Behavioral disciplines and activities
- Abstract
It has been demonstrated that children with specific language impairment (SLI) show difficulties not only with auditory but also with audiovisual speech perception. The goal of this study was to assess whether children with SLI might show reduced attention to the talker's mouth compared to their typically developing (TD) peers. An additional aim was to determine whether the pattern of attention to a talking face would be related to a specific subtype of SLI. We used an eye-tracker methodology and presented a video of a talker speaking the children's native language. Results revealed that children with SLI paid significantly less attention to the mouth than the TD children. More specifically, it was also observed that children with a phonological-syntactic deficit looked less to the mouth as compared to the children with a lexical-syntactic deficit.
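Note: attention to the talker's mouth in eye-tracking studies like the one above is typically quantified as the share of total looking time falling within eyes and mouth areas of interest (AOIs). A minimal sketch with hypothetical fixation data and made-up AOI coordinates, not the authors' analysis code:

```python
# Sketch: proportion of looking time to eyes vs. mouth AOIs for one trial.
def looking_proportions(fixations, eyes_aoi, mouth_aoi):
    """fixations: (x, y, duration_ms) tuples; AOIs: (x_min, y_min, x_max, y_max) boxes."""
    def in_aoi(x, y, box):
        x0, y0, x1, y1 = box
        return x0 <= x <= x1 and y0 <= y <= y1

    total = sum(dur for _, _, dur in fixations)
    eyes = sum(dur for x, y, dur in fixations if in_aoi(x, y, eyes_aoi))
    mouth = sum(dur for x, y, dur in fixations if in_aoi(x, y, mouth_aoi))
    return eyes / total, mouth / total

# hypothetical fixations on a 1024x768 display; AOI boxes are illustrative
fix = [(512, 300, 250), (520, 610, 400), (515, 615, 350), (510, 310, 200)]
print(looking_proportions(fix, eyes_aoi=(400, 250, 630, 360),
                          mouth_aoi=(430, 560, 600, 660)))   # -> (0.375, 0.625)
```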
515. Audiovisual integration as conflict resolution: The conflict of the McGurk illusion
- Author
- Morís Fernández, Luis, Macaluso, Emiliano, and Soto-Faraco, Salvador
- Subjects
- Adult, Male, Female, Humans, Brain Mapping, Gyrus cinguli, Lipreading, Speech perception, Brain, Models, Psychological, Neuropsychological Tests, Illusions, Magnetic Resonance Imaging, Prefrontal cortex, Oxygen, Audiovisual speech, Cerebrovascular Circulation, Multisensory integration, McGurk illusion, Facial Recognition
- Abstract
There are two main behavioral expressions of multisensory integration (MSI) in speech: the perceptual enhancement produced by the sight of the congruent lip movements of the speaker, and the illusory sound perceived when a speech syllable is dubbed with incongruent lip movements, as in the McGurk effect. These two models have been used very often to study MSI. Here, we contend that, unlike congruent audiovisual (AV) speech, the McGurk effect involves brain areas related to conflict detection and resolution. To test this hypothesis, we used fMRI to measure blood oxygen level dependent responses to AV speech syllables. We analyzed brain activity as a function of the nature of the stimuli (McGurk or non-McGurk) and the perceptual outcome regarding MSI (integrated or not integrated response) in a 2 × 2 factorial design. The results showed that, regardless of perceptual outcome, AV mismatch activated general-purpose conflict areas (e.g., anterior cingulate cortex) as well as specific AV speech conflict areas (e.g., inferior frontal gyrus), compared with AV matching stimuli. Moreover, these conflict areas showed stronger activation on trials where the McGurk illusion was perceived compared with non-illusory trials, even though the stimuli were physically identical. We conclude that the AV incongruence in McGurk stimuli triggers the activation of conflict processing areas and that the process of resolving the cross-modal conflict is critical for the McGurk illusion to arise. Hum Brain Mapp 38:5691–5705, 2017. This research was supported by the Ministerio de Economía y Competitividad (PSI2016–75558‐PAEI/FEDER), AGAUR Generalitat de Catalunya (2014SGR856 and 2012BE100392), and the European Research Council (StG‐2010 263145). The Neuroimaging Laboratory of the Fondazione Santa Lucia is supported by the Italian Ministry of Health. EM is supported by the program "Investissements d'Avenir" (ANR‐11‐IDEX‐0007) and by the program "BQR Accueil EC 16" of University Claude Bernard Lyon 1.
516. Applying the summation model in audiovisual speech perception
- Author
- Tarja Peromaa, Kaisa Tiippana, Ilmari Kurki, Medicum, Perception Action Cognition, and Department of Psychology and Logopedics
- Subjects
- 515 Psychology, Speech recognition, Perception, Audiovisual speech, Psychology
517. Visual attention modulates audiovisual speech perception
- Author
- Kaisa Tiippana, Mikko Sams, and Tobias S. Andersen
- Subjects
- Motor theory of speech perception, Speech perception, Crossmodal, Experimental and Cognitive Psychology, Perception, McGurk effect, Audiovisual speech, Neurocomputational speech processing, Percept, Psychology, Cognitive psychology
- Abstract
Speech perception is audiovisual, as demonstrated by the McGurk effect in which discrepant visual speech alters the auditory speech percept. We studied the role of visual attention in audiovisual speech perception by measuring the McGurk effect in two conditions. In the baseline condition, attention was focused on the talking face. In the distracted attention condition, subjects ignored the face and attended to a visual distractor, which was a leaf moving across the face. The McGurk effect was weaker in the latter condition, indicating that visual attention modulated audiovisual speech perception. This modulation may occur at an early, unisensory processing stage, or it may be due to changes at the stage where auditory and visual information is integrated. We investigated this issue by conventional statistical testing, and by fitting the Fuzzy Logical Model of Perception (Massaro, 1998) to the results. The two methods suggested different interpretations, revealing a paradox in the current methods of analysis.
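Note: the Fuzzy Logical Model of Perception referred to above combines auditory and visual support for each response alternative multiplicatively and then normalizes across alternatives. A minimal sketch of that combination rule; the support values below are illustrative, not parameters fitted to these data:

```python
# Sketch of the FLMP combination rule (Massaro, 1998), as commonly stated:
# per-alternative auditory and visual support values are multiplied and normalized.
import numpy as np

def flmp_response_probabilities(auditory_support, visual_support):
    """Per-alternative support values in [0, 1]; returns response probabilities."""
    combined = np.asarray(auditory_support, dtype=float) * np.asarray(visual_support, dtype=float)
    return combined / combined.sum()

# illustrative values: auditory evidence favours /b/, visual evidence favours /g/,
# and the fused /d/ alternative ends up with the highest combined probability
print(flmp_response_probabilities([0.80, 0.15, 0.05], [0.05, 0.35, 0.60]))
```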
518. Consequences of audiovisual asynchrony for speech perception: Implications for signal processing in aids to lipreading
- Author
- Quentin Summerfield and Matthew McGrath
- Subjects
- Signal processing, Speech perception, Acoustics and Ultrasonics, Acoustics, Audiology, Asynchrony (computer programming), Arts and Humanities (miscellaneous), Vocal folds, Delay, Audiovisual speech, Syllabic verse, Psychology, Sensitivity (electronics)
- Abstract
Audiovisual identification of sentences was measured as a function of audio delay in untrained listeners with normal hearing; the sound track was replaced by rectangular pulses originally synchronized to the closing of the talker's vocal folds and then subjected to delay. Although group-mean performance declined monotonically with delay, systematic decrements occurred only when delay exceeded 80 ms. A similar tolerance of delay was found in judgments of audiovisual onset time when observers determined whether a 120-Hz triangular wave started before or after the opening of a pair of lip-like Lissajous figures. Group-mean 70% DLs were −78 ms (sound leading) and +137 ms (sound lagging). This result suggests, first, that most observers possess insufficient sensitivity to intermodal timing cues in audiovisual speech for them to be used analogously to VOT in auditory speech perception, and, second, that the effects found in the first experiment derive from syllabic rather than phonemic interference. However, th...
- Published
- 1984
- Full Text
- View/download PDF