12 results on '"Kathy M. Carbonell"'
Search Results
2. Language Specificity in Phonetic Cue Weighting: Monolingual and Bilingual Perception of the Stop Voicing Contrast in English and Spanish
- Author
-
Andrew J. Lotto, Jessamyn Schertz, and Kathy M. Carbonell
- Subjects
Adult, Linguistics and Language, Acoustics and Ultrasonics, Multilingualism, Audiology, Language and Linguistics, Young Adult, Closure duration, Phonetics, Vowel, Perception, Humans, Speech, Language, Voice-onset time, Contrast, Weighting, Logistic Models, Formant, Acoustic Stimulation, Voice, Cues, Psychology
- Abstract
Background/Aims: This work examines the perception of the stop voicing contrast in Spanish and English along four acoustic dimensions, comparing monolingual and bilingual listeners. Our primary goals are to test the extent to which cue-weighting strategies are language-specific in monolinguals, and whether this language specificity extends to bilingual listeners. Methods: Participants categorized sounds varying in voice onset time (VOT, the primary cue to the contrast) and three secondary cues: fundamental frequency at vowel onset, first formant (F1) onset frequency, and stop closure duration. Listeners heard acoustically identical target stimuli, within language-specific carrier phrases, in English and Spanish modes. Results: While all listener groups used all cues, monolingual English listeners relied more on F1, and less on closure duration, than monolingual Spanish listeners, indicating language specificity in cue use. Early bilingual listeners used the three secondary cues similarly in English and Spanish, despite showing language-specific VOT boundaries. Conclusion: While our findings reinforce previous work demonstrating language-specific phonetic representations in bilinguals in terms of VOT boundary, they suggest that this specificity may not extend straightforwardly to cue-weighting strategies. [An illustrative cue-weighting sketch follows this record.]
- Published
- 2019
- Full Text
- View/download PDF
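The cue-weighting analysis summarized in the abstract above can be illustrated with a small sketch: fit a logistic model predicting voicing responses from several standardized acoustic cues and read the coefficients as relative cue weights. Everything below (the simulated data, cue ranges, and scaling choices) is an assumption for illustration, not the authors' actual pipeline.

```python
# Hypothetical cue-weighting sketch: logistic regression over four
# acoustic cues, with standardized coefficients read as cue weights.
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(0)
n = 400

# Simulated trial-level cue values: VOT (ms), f0 at vowel onset (Hz),
# F1 onset frequency (Hz), and stop closure duration (ms).
cues = np.column_stack([
    rng.uniform(-20, 60, n),    # VOT
    rng.uniform(90, 140, n),    # onset f0
    rng.uniform(300, 800, n),   # F1 onset
    rng.uniform(40, 120, n),    # closure duration
])

# Simulated "voiceless" responses dominated by VOT, with a weaker
# contribution from onset f0 (an assumption made for the demo).
z = 0.08 * (cues[:, 0] - 20) + 0.01 * (cues[:, 1] - 115)
resp = (rng.uniform(size=n) < 1 / (1 + np.exp(-z))).astype(int)

# Standardize cues so the fitted coefficients are comparable weights.
X = StandardScaler().fit_transform(cues)
model = LogisticRegression().fit(X, resp)

for name, w in zip(["VOT", "onset f0", "F1 onset", "closure dur"],
                   model.coef_[0]):
    print(f"{name:12s} weight = {w:+.2f}")
```

Standardizing the cues first is what makes the fitted coefficients comparable as weights across dimensions with very different units (milliseconds versus Hertz).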
3. Reliability of individual differences in degraded speech perception
- Author
-
Kathy M. Carbonell
- Subjects
Adult, Male, Speech perception, Adolescent, Acoustics and Ultrasonics, Voice Quality, Acoustics, Audiology, Intelligibility (communication), Speech Acoustics, Speech shadowing, Young Adult, Humans, Active listening, Speech Intelligibility, Recognition (psychology), Word recognition, Speech processing, Acoustic Stimulation, Speech Perception, Female, Audiometry, Speech Audiometry, Noise, Perceptual Masking, Psychology
- Abstract
Listeners' speech perception abilities vary extensively in challenging listening conditions. There is little evidence as to whether this variability is a result of true, stable individual differences or just variability arising from measurement error. This study examines listeners' word recognition abilities across multiple sessions and a variety of degraded speech tasks (noise-vocoded, time-compressed, and speech in babble noise). Participants transcribed isolated single-syllable words presented in all three degradation types and repeated these tasks (with different words) on a separate day. Correlations of transcription accuracy demonstrate that individual differences in performance are reliable across sessions. In addition, performance on all three degradation types was correlated. These results suggest that differences in performance on degraded speech perception tasks for normal-hearing listeners are robust and that there are underlying factors that promote the ability to understand degraded speech regardless of the specific manner of degradation. Uncovering these general performance factors may provide insight into the salient performance variance observed in listeners with hearing impairment. [A test-retest correlation sketch follows this record.]
- Published
- 2017
- Full Text
- View/download PDF
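The reliability logic in the abstract above reduces to correlating per-listener accuracy across sessions and across degradation types. A minimal sketch follows, assuming fabricated scores driven by a shared underlying ability; the arrays and sample size are illustrative stand-ins, not the study's data.

```python
# Test-retest and cross-task correlation sketch for per-listener
# transcription accuracy (all values simulated for illustration).
import numpy as np
from scipy.stats import pearsonr

rng = np.random.default_rng(1)
n_listeners = 30

# Simulate a stable "degraded-speech ability" per listener plus
# task-specific noise -- the structure the correlations test for.
ability = rng.normal(0.6, 0.1, n_listeners)

def scores(noise=0.05):
    """One task/session's accuracy: ability plus measurement noise."""
    return np.clip(ability + rng.normal(0, noise, n_listeners), 0, 1)

vocoded_day1, vocoded_day2 = scores(), scores()
compressed_day1 = scores()

r_retest, p_retest = pearsonr(vocoded_day1, vocoded_day2)
r_cross, p_cross = pearsonr(vocoded_day1, compressed_day1)
print(f"test-retest r = {r_retest:.2f} (p = {p_retest:.3f})")
print(f"cross-task  r = {r_cross:.2f} (p = {p_cross:.3f})")
```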
4. Discriminating Simulated Vocal Tremor Source Using Amplitude Modulation Spectra
- Author
-
Rosemary A. Lester, Kathy M. Carbonell, Andrew J. Lotto, and Brad H. Story
- Subjects
Speech Acoustics, Voice Disorders, Voice Quality, Speech recognition, Acoustics, Vocal tremor, Fundamental frequency, Vocal Cords, Amplitude modulation, Amplitude, Speech and Hearing, Otorhinolaryngology, Discriminant function analysis, Speech Production Measurement, Humans, Envelope
- Abstract
Objectives/Hypothesis: Sources of vocal tremor are difficult to categorize perceptually and acoustically. This article describes a preliminary attempt to discriminate vocal tremor sources through the use of spectral measures of the amplitude envelope. The hypothesis is that different vocal tremor sources are associated with distinct patterns of acoustic amplitude modulations. Study Design: Statistical categorization methods (discriminant function analysis) were used to discriminate signals from simulated vocal tremor with different sources, using only acoustic measures derived from the amplitude envelopes. Methods: Simulations of vocal tremor were created by modulating parameters of a vocal fold model corresponding to oscillations of respiratory driving pressure (respiratory tremor), degree of vocal fold adduction (adductory tremor), and fundamental frequency of vocal fold vibration (F0 tremor). The acoustic measures were based on spectral analyses of the amplitude envelope computed across the entire signal and within select frequency bands. Results: The signals could be categorized (with accuracy well above chance) in terms of the simulated tremor source using only measures of the amplitude envelope spectrum, even when multiple sources of tremor were included. Conclusions: These results supply initial support for an amplitude-envelope-based approach to identifying the source of vocal tremor and provide further evidence for the rich information about talker characteristics present in the temporal structure of the amplitude envelope. [A minimal classification sketch follows this record.]
- Published
- 2014
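As a rough illustration of the approach (not the authors' code, and nothing like their vocal fold model), the sketch below builds crude amplitude- and frequency-modulated tones as stand-ins for two tremor sources, summarizes each signal by the low-frequency spectrum of its amplitude envelope, and classifies with a linear discriminant.

```python
# Envelope-spectrum features + linear discriminant, on synthetic
# "tremor" signals (illustrative assumptions throughout).
import numpy as np
from scipy.signal import hilbert
from sklearn.discriminant_analysis import LinearDiscriminantAnalysis

fs = 16000
t = np.arange(0, 1.0, 1 / fs)

def envelope_spectrum(x, fmax=20.0):
    """Magnitude spectrum of the amplitude envelope below fmax Hz."""
    env = np.abs(hilbert(x))
    env = env - env.mean()
    spec = np.abs(np.fft.rfft(env))
    freqs = np.fft.rfftfreq(env.size, 1 / fs)
    return spec[freqs < fmax]

def make_signal(kind, rng):
    f_mod = rng.uniform(4, 7)          # tremor rate in Hz
    if kind == "amplitude":            # stand-in for amplitude-type tremor
        return (1 + 0.5 * np.sin(2 * np.pi * f_mod * t)) * \
               np.sin(2 * np.pi * 120 * t)
    # stand-in for "f0" tremor: frequency modulation of the carrier
    phase = 2 * np.pi * 120 * t + 3 * np.sin(2 * np.pi * f_mod * t)
    return np.sin(phase)

rng = np.random.default_rng(2)
labels = ["amplitude", "f0"] * 30
X = np.array([envelope_spectrum(make_signal(k, rng)
                                + rng.normal(0, 0.05, t.size))
              for k in labels])

lda = LinearDiscriminantAnalysis().fit(X, labels)
print("training accuracy:", lda.score(X, labels))
```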
5. Speech is not special… again
- Author
-
Andrew J. Lotto and Kathy M. Carbonell
- Subjects
Speech Acoustics, Motor theory of speech perception, Categorical perception, Speech perception, Speech recognition, Multisensory integration, Opinion Article, Motor Theory, Perception, McGurk effect, Psychoacoustics, Sensorimotor effects on perception, Coarticulation, Auditory processing, General Psychology, Psychology
- Abstract
THE “SPECIALNESS” OF SPEECH
As is apparent from reading the first line of nearly any research or review article on speech, the task of perceiving speech sounds is complex, and the ease with which humans acquire, produce and perceive these sounds is remarkable. Despite the growing appreciation for the complexity of the perception of music, speech perception remains one of the most amazing and poorly understood auditory (and, if we may be so bold, perceptual) accomplishments of humans. Over the years, there has been considerable debate on whether this achievement is the result of general perceptual/cognitive mechanisms or “special” processes dedicated to the mapping of speech acoustics to linguistic representations (for reviews see Trout, 2001; Diehl et al., 2004). The most familiar proposal of the “specialness” of speech perception is the various incarnations of the Motor Theory of speech proposed by Liberman et al. (1967; Liberman and Mattingly, 1985, 1989). Given the status of research into audition in the 1950s and 1960s, it is not surprising that speech appeared to require processing not available in “normal” hearing. Much of the work at the time used relatively simple tones and noises to get at the basic psychoacoustics underlying the perception of pitch and loudness (though some researchers, like Harvey Fletcher, were also working on the basics of speech perception; Fletcher and Galt, 1950; Allen, 1996). Liberman and his collaborators discovered that the discrimination of acoustic changes in speech sounds did not look like the psychoacoustic measures of discrimination for pitch and loudness. Instead of following a Weber or Fechner law, the discrimination function had a peak near the categorization boundary between contrasting phonemes, a pattern of perceptual results that is referred to as Categorical Perception (Liberman et al., 1957). In addition, the acoustic cues to phonemic identity were not readily apparent, with similar spectral patterns resulting in different phonemic percepts and acoustically disparate patterns resulting in identical phonemic percepts: the problem of “lack of invariance” (e.g., Liberman et al., 1952). The perception of these varying acoustic patterns was highly sensitive to preceding and following phonetic context, in ways that appeared specific to the communicative constraints of speech and not applicable to the perception of other sounds, as in demonstrations of perceptual compensation for coarticulation, speaking rate normalization and talker normalization (e.g., Ladefoged and Broadbent, 1957; Miller and Liberman, 1979; Mann, 1980). One major source of evidence in favor of a Motor Theory account of speech perception is that information about a speaker's production (anatomy or kinematics) from non-auditory sources can affect phonetic perception. The famed McGurk effect (McGurk and MacDonald, 1976), in which visual presentation of a talker can alter the auditory phonetic percept, is taken as evidence that listeners are integrating information about production from this secondary source. Fowler and Dekle (1991) demonstrated a similar effect using haptic information gathered by touching the speaker's face (see also Sato et al., 2010). Gick and Derrick (2009) reported that perception of consonant-vowel tokens in noise is biased toward voiceless stops (e.g., /pa/) when they are accompanied by a small burst of air on the skin of the listener, which could be interpreted as the aspiration that would more likely accompany the release of a voiceless stop.
In addition, several studies have demonstrated that manipulations of the listener's articulators can affect perception, findings that support the Motor Theory proposal that the mechanisms of production underlie the perception of speech. For example, Ito et al. (2009) obtained shifts in phoneme categorization resulting from external manipulation of the skin around the listener's mouth, in ways that correspond to the deformations typical of producing these speech sounds (see also Yeung and Werker, 2013, for a similar demonstration with infants). Recently, Mochida et al. (2013) found that the ability to categorize consonants can be influenced by the simultaneous silent production of these consonants. Typically, these studies are proffered as evidence for a direct role of speech motor processing in speech perception. Independent of this proposed motor basis of perception, others have suggested the existence of a special speech or phonetic mode of perception, based on evidence that neural and behavioral responses to the same stimuli are modulated by whether or not the listener believes the signal to be speech or non-speech (e.g., Tomiak et al., 1987; Vroomen and Baart, 2009; Stekelenburg and Vroomen, 2012).
- Published
- 2014
- Full Text
- View/download PDF
6. Absence or presence of preceding sound can change perceived phonetic identity
- Author
-
Andrew J. Lotto and Kathy M. Carbonell
- Subjects
Noise, Categorization, Broadband noise, Speech recognition, Context (language use), Coarticulation, Identity, Sound, Psychology
- Abstract
Participants were asked to categorize a series of syllables varying from /ga/ to /da/ presented in isolation or following /al/, /ar/, /a/, or filtered noise bands. Typical shifts in categorization were obtained for /al/ vs. /ar/ contexts as predicted by compensation for coarticulation, but the shift in response between isolated presentation and any of the context conditions was much larger, even when the context was broadband noise. These results suggest that the effect of the presence of any context sound is greater than the effect of the content of the context sounds.
- Published
- 2012
- Full Text
- View/download PDF
7. Discriminating vocal tremor source from amplitude envelope modulations
- Author
-
Kathy M. Carbonell, Brad H. Story, Rosemary A. Lester, and Andrew J. Lotto
- Subjects
Acoustics and Ultrasonics, Acoustics, Audiology, Vocal tremor, Vocal fold adduction, Vocal folds, Vocal tract, Amplitude, Envelope, Arts and Humanities (miscellaneous)
- Abstract
Vocal tremor can have a variety of physiological sources. For example, tremors can result from involuntary oscillation of the respiratory muscles (respiratory tremor), or of the muscles responsible for vocal fold adduction (adductory tremor) or lengthening (f0 tremor). While the sources of vocal tremor are distinct, they are notoriously difficult to categorize both perceptually and acoustically. In order to develop acoustic measures that can potentially distinguish sources of tremor, speech samples were synthesized using a kinematic model of the vocal folds attached to a model of the vocal tract and trachea [Titze, 1984, JASA, 75, 570-580; Story, 2005, JASA, 117, 3231-3254]. Tremors were created by modulating parameters of the vocal fold model corresponding to the three types mentioned above. The acoustic measures were related to temporal regularities in the amplitude envelope computed across the entire signal and select frequency bands. These measures could reliably categorize the samples by tremor source (as dete...
- Published
- 2012
- Full Text
- View/download PDF
8. Discriminating languages with general measures of temporal regularity and spectral variance
- Author
-
Andrew J. Lotto, Kathy M. Carbonell, and Daniel Brenner
- Subjects
Acoustics and Ultrasonics, Speech recognition, Spectral density, Variance, Mandarin Chinese, Language, Rhythm, Duration, Octave, Envelope, Arts and Humanities (miscellaneous)
- Abstract
There has been considerable recent interest in distinguishing languages based on their rhythmic differences. A common successful approach involves measures of the relative durations and duration variability of vowels and consonants in utterances. Recent studies have shown that more general measures of temporal regularities in the amplitude envelope in separate frequency bands (the Envelope Modulation Spectrum) can reliably discriminate between English and Spanish [Carbonell et al., J. Acoust. Soc. Am. 129, 2680]. In the current study, these temporal structure measures were supplemented with measures of the mean and variance of spectral energy in octave bands as well as with traditional linguistic measures. Using stepwise discriminant analysis and a set of productions from Japanese, Korean and Mandarin speakers, this suite of both acoustic and linguistic measures was tested together and pitted against each other to determine the most efficient discriminators of language. The results provide insight into what the ... [An EMS-style feature sketch follows this record.]
- Published
- 2012
- Full Text
- View/download PDF
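A hedged sketch of an EMS-style feature set, under assumed octave band edges and an assumed 3-10 Hz modulation-energy summary (the published EMS details may differ): filter the signal into octave bands, take each band's Hilbert envelope, and measure slow modulation energy per band.

```python
# Envelope-modulation-spectrum-style features per octave band.
# Band edges and the summary statistic are illustrative assumptions.
import numpy as np
from scipy.signal import butter, filtfilt, hilbert

def ems_features(x, fs, band_edges=(125, 250, 500, 1000, 2000, 4000)):
    feats = []
    for lo, hi in zip(band_edges[:-1], band_edges[1:]):
        # Fourth-order Butterworth bandpass for one octave band.
        b, a = butter(4, [lo / (fs / 2), hi / (fs / 2)], btype="band")
        band = filtfilt(b, a, x)
        env = np.abs(hilbert(band))       # amplitude envelope
        env = env - env.mean()
        spec = np.abs(np.fft.rfft(env))
        freqs = np.fft.rfftfreq(env.size, 1 / fs)
        # Energy of slow (3-10 Hz, roughly syllabic) modulations.
        feats.append(spec[(freqs >= 3) & (freqs <= 10)].sum())
    return np.array(feats)

# Usage with a fabricated signal; real input would be a speech recording.
fs = 16000
x = np.random.default_rng(3).normal(size=fs * 2)
print(ems_features(x, fs))
```

Feature vectors like these, computed per utterance, could then feed a discriminant analysis of the kind the abstract describes.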
9. Discriminating language and talker using non-linguistic measures of rhythm, spectral energy and f0
- Author
-
Kathy M. Carbonell, Andrew J. Lotto, Julie M. Liss, Kaitlin L. Lansford, and Rene L. Utianski
- Subjects
Rhythm, Acoustics and Ultrasonics, Arts and Humanities (miscellaneous), Stepwise discriminant analysis, Spectral density, Variance, Fundamental frequency, Octave, Linguistics
- Abstract
Recent studies have shown that rhythm metrics calculated from amplitude envelopes extracted from octave bands across the spectrum (the envelope modulation spectrum or EMS) can reliably discriminate between spoken Spanish and English even when produced by the same speakers [Carbonell et al., J. Acoust. Soc. Am. 129, 2680]. Additionally, bilingual speakers (seven females and five males) could be discriminated fairly well on EMS variables even across sentences spoken in the different languages. In the current study, EMS, a general acoustic measure with no reference to phonemic/linguistic entities, was supplemented with measures of the mean and variance of spectral energy in each octave band as well as the mean and variance of fundamental frequency. Using stepwise discriminant analysis and the set of bilingual productions of Spanish and English, it was determined that language discrimination was excellent using both EMS and spectral measures, whereas spectral and f0 measures were most informative for speaker discrimination. The results demonstrate that this suite of easily calculated acoustic measures provides abundant information about differences between languages as well as inherent differences in speakers that are independent of the language spoken. [Work supported by NIH-NIDCD.]
- Published
- 2011
- Full Text
- View/download PDF
10. Stable production rhythms across languages for bilingual speakers
- Author
-
Andrew J. Lotto, Kaitlin L. Lansford, Julie M. Liss, Kathy M. Carbonell, Rene L. Utianski, and Sarah C. Sullivan
- Subjects
Rhythm, Acoustics and Ultrasonics, Arts and Humanities (miscellaneous), Speech recognition, Production, Sentence, Linguistics
- Abstract
There has been a great deal of work on classifying spoken languages according to their perceived or acoustically measured rhythmic structures. The current study examined the speech of 12 Spanish-English bilinguals producing sentences in both languages, using rhythmic measures based on the amplitude envelopes extracted from different frequency regions (the envelope modulation spectrum, EMS). Using discriminant factor analysis, EMS variables demonstrated a moderate ability to classify the language being spoken, suggesting that rhythmic differences between languages survive even when speaker is controlled. More interesting is the fact that EMS variables could reliably classify which speaker produced each sentence even across languages. This result suggests that there are stable rhythmic structures in an individual talker's speech that are apparent above and beyond the structural constraints of the language spoken. The EMS appears capable of describing systematic characteristics of both the talker and the language spoken. [Work supported by NIH-NIDCD.]
- Published
- 2011
- Full Text
- View/download PDF
11. Presence of preceding sound affects the neural representation of speech sounds: Frequency following response data
- Author
-
Kathy M. Carbonell, Andrew J. Lotto, and Radhika Aravamudhan
- Subjects
Speech perception, Acoustics and Ultrasonics, Context effect, Acoustics, Speech recognition, Stimulus (physiology), Frequency following response, Electroencephalography, Formant, Spectrogram, Arts and Humanities (miscellaneous)
- Abstract
A substantial body of literature has focused on context effects in speech perception, in which manipulation of the phonemic or spectral content of preceding sounds (e.g., /al/ versus /ar/) results in a shift in the perceptual categorization of a target syllable (e.g., /da/ versus /ga/). In a previous study utilizing the frequency-following response (FFR) to measure neural correlates of these context effects [R. Aravamudhan, J. Acoust. Soc. Am. 126, 2204], it was noted that the representation of target formant trajectories was much weaker when the stimulus was presented in isolation versus following some type of context. To examine this effect explicitly, a series of syllables varying from /da/ to /ga/ was presented to listeners either in isolation or following the syllables /a/, /al/, or /ar/ (with a 50-ms silent gap between context and target). FFR measures were obtained from EEG recordings while participants listened passively. The resulting narrow-band spectrograms over the grand averages demonstrated t... [A narrow-band spectrogram sketch follows this record.]
- Published
- 2010
- Full Text
- View/download PDF
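The narrow-band spectrogram step mentioned in the abstract above can be sketched as follows; the simulated grand-average waveform, sampling rate, and window length are illustrative assumptions, not the study's parameters. A long analysis window gives the fine frequency resolution needed to track formant-related energy in an FFR.

```python
# Narrow-band spectrogram of a simulated grand-average FFR waveform.
import numpy as np
from scipy.signal import spectrogram

fs = 8000
t = np.arange(0, 0.3, 1 / fs)

# Stand-in for a grand-average FFR: energy at f0 plus a rising
# formant-like component and residual EEG noise (all assumptions).
rng = np.random.default_rng(4)
ffr = (np.sin(2 * np.pi * 100 * t)
       + 0.4 * np.sin(2 * np.pi * (1200 + 800 * t / t[-1]) * t)
       + rng.normal(0, 0.3, t.size))

# Long window -> narrow-band analysis: fine frequency resolution,
# coarse time resolution, suited to tracking formant-related energy.
f, seg_t, Sxx = spectrogram(ffr, fs=fs, nperseg=512, noverlap=384)
print(Sxx.shape)  # (frequency bins, time frames)
```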
12. Presence of preceding sound affects the neural representation of speech sounds: Behavioral data
- Author
-
Andrew J. Lotto, Radhika Aravamudhan, and Kathy M. Carbonell
- Subjects
Speech perception, Acoustics and Ultrasonics, Acoustics, Speech recognition, Speech sounds, Context (language use), Representation, Behavioral data, Formant, Categorization, Sound, Psychology, Arts and Humanities (miscellaneous)
- Abstract
Traditionally, context-sensitive speech perception has been demonstrated by eliciting shifts in target sound categorization through manipulation of the phonemic/spectral content of surrounding context. For example, changing the third formant frequency of a preceding context (from /al/ to /ar/) can result in significant shifts in target categorization (from /ga/ to /da/). However, it is probable that the most salient difference in context is between the presence or absence of any other sound. The question becomes whether this large change in context has substantial effects on target categorization as well. In the current study, participants were asked to categorize members of a series of syllables varying from /ga/ to /da/ presented in isolation or following /al/, /ar/, or /a/. The typical shifts in categorization were obtained for /al/ versus /ar/ contexts, but the shift in response between isolated presentation and any of the audible context conditions was much larger (with more /da/ responses in isolation). A similar pattern of results was obtained when the audible context was broadband noise. Additionally, slopes of categorization functions were less steep for isolated syllables than when preceded by speech. These results suggest that isolated syllables may be neurally encoded differently than the same syllables in context.
- Published
- 2010
- Full Text
- View/download PDF