1,083 results for "VOWELS"
Search Results
2. Preceding Vowel Duration as a Cue to the Perception of the Voicing Characteristic of Word-Final Consonants in American English
- Author
- Raphael, Lawrence J.
- Published
- 1972
3. Identification of Stops and Vowels for the Burst Portion of /p,t,k/ Isolated from Conversational Speech
- Author
- Winitz, Harris
- Abstract
Research supported in part by grants from the National Institute of Child Health and Human Development. (RS)
- Published
- 1972
4. Interaction between Two Factors that Influence Vowel Duration
- Author
- Klatt, Dennis H.
- Abstract
Research supported by the National Institutes of Health and the Office of Naval Research. (DD)
- Published
- 1973
5. Some Articulatory Manifestations of Vowel Stress
- Author
- Mermelstein, Paul
- Published
- 1973
6. Dynamic acoustic vowel distances within and across dialects.
- Author
- Clopper, Cynthia G.
- Subjects
- AMERICAN English language, VOWELS, FREQUENCY spectra, DIALECTS, LANGUAGE & languages
- Abstract
Vowels vary in their acoustic similarity across regional dialects of American English, such that some vowels are more similar to one another in some dialects than others. Acoustic vowel distance measures typically evaluate vowel similarity at a discrete time point, resulting in distance estimates that may not fully capture vowel similarity in formant trajectory dynamics. In the current study, language and accent distance measures, which evaluate acoustic distances between talkers over time, were applied to the evaluation of vowel category similarity within talkers. These vowel category distances were then compared across dialects, and their utility in capturing predicted patterns of regional dialect variation in American English was examined. Dynamic time warping of mel-frequency cepstral coefficients was used to assess acoustic distance across the frequency spectrum and captured predicted Southern American English vowel similarity. Root-mean-square distance and generalized additive mixed models were used to assess acoustic distance for selected formant trajectories and captured predicted Southern, New England, and Northern American English vowel similarity. Generalized additive mixed models captured the most predicted variation, but, unlike the other measures, do not return a single acoustic distance value. All three measures are potentially useful for understanding variation in vowel category similarity across dialects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
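The dynamic time warping of mel-frequency cepstral coefficients described in this abstract can be illustrated with a short pure-NumPy sketch. This is not the study's pipeline: the toy arrays below stand in for MFCC matrices that would normally be extracted from audio, and the frame-to-frame cost is plain Euclidean distance.

```python
import numpy as np

def dtw_distance(a: np.ndarray, b: np.ndarray) -> float:
    """Dynamic time warping distance between two (frames x coefficients)
    feature matrices, with Euclidean frame-to-frame cost."""
    n, m = len(a), len(b)
    cost = np.full((n + 1, m + 1), np.inf)
    cost[0, 0] = 0.0
    for i in range(1, n + 1):
        for j in range(1, m + 1):
            d = np.linalg.norm(a[i - 1] - b[j - 1])
            cost[i, j] = d + min(cost[i - 1, j],      # stretch b
                                 cost[i, j - 1],      # stretch a
                                 cost[i - 1, j - 1])  # advance both
    return float(cost[n, m])

# Toy "MFCC" sequences: the same trajectory at two speaking rates.
rng = np.random.default_rng(0)
base = rng.normal(size=(20, 13))    # 20 frames x 13 coefficients
slow = np.repeat(base, 2, axis=0)   # identical content, twice as long
other = rng.normal(size=(20, 13))   # unrelated content
d_same = dtw_distance(base, slow)
d_diff = dtw_distance(base, other)
print(d_same, d_diff)
```

Because the warping path can stretch either sequence, the rate-doubled copy scores near-zero distance while unrelated content does not; that invariance to duration is what makes DTW attractive for comparing vowel trajectories across talkers.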
7. Acoustic characteristics of infant- and foreigner-directed speech with Mandarin as the target language.
- Author
- Zhang, Yu, Chen, Fei, Xu, Feng, Guo, Chengyu, and Li, Kexuan
- Subjects
- SPEECH, TONE (Phonetics), LANGUAGE acquisition, DEAF children, LANGUAGE & languages, VOWELS, PREMATURE infants
- Abstract
The quality of speech input influences the efficiency of L1 and L2 acquisition. This study examined modifications in infant-directed speech (IDS) and foreigner-directed speech (FDS) in Standard Mandarin—a tonal language—and explored how IDS and FDS features were manifested in disyllabic words and a longer discourse. The study aimed to determine which characteristics of IDS and FDS were enhanced in comparison with adult-directed speech (ADS), and how IDS and FDS differed when measured in a common set of acoustic parameters. For words, it was found that tone-bearing vowel duration, mean and range of fundamental frequency (F0), and the lexical tone contours were enhanced in IDS and FDS relative to ADS, except for the dipping Tone 3 that exhibited an unexpected lowering in FDS, but no modification in IDS when compared with ADS. For the discourse, different aspects of temporal and F0 enhancements were emphasized in IDS and FDS: the mean F0 was higher in IDS whereas the total discourse duration was greater in FDS. These findings add to the growing literature on L1 and L2 speech input characteristics and their role in language acquisition. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Acoustic, phonetic, and phonological features of Drehu vowels.
- Author
- Torres, Catalina, Li, Weicong, and Escudero, Paola
- Subjects
- VOWELS, OTOACOUSTIC emissions
- Abstract
This study presents an acoustic investigation of the vowel inventory of Drehu (Southern Oceanic Linkage), spoken in New Caledonia. Reportedly, Drehu has a 14 vowel system distinguishing seven vowel qualities and an additional length distinction. Previous phonological descriptions were based on impressionistic accounts showing divergent proposals for two out of seven reported vowel qualities. This study presents the first phonetic investigation of Drehu vowels based on acoustic data from eight speakers. To examine the phonetic correlates of the proposed phonological vowel inventory, multi-point acoustic analyses were used, and vowel inherent spectral change (VISC) was investigated (F1, F2, and F3). Additionally, vowel duration was measured. Contrary to reports from other studies on VISC in monophthongs, we find that monophthongs in Drehu are mostly steady state. We propose a revised vowel inventory and focus on the acoustic description of open-mid /ɛ/ and the central vowel /ə/, whose status was previously unclear. Additionally, we find that vowel quality stands orthogonal to vowel quantity by demonstrating that the phonological vowel length distinction is primarily based on a duration cue rather than formant structure. Finally, we report the acoustic properties of the seven vowel qualities that were identified. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Articulatory and acoustic dynamics of fronted back vowels in American English.
- Author
- Havenhill, Jonathan
- Subjects
- AMERICAN English language, VOWELS, VOCAL tract, ACOUSTIC reflex
- Abstract
Fronting of the vowels /u, ʊ, o/ is observed throughout most North American English varieties, but has been analyzed mainly in terms of acoustics rather than articulation. Because an increase in F2, the acoustic correlate of vowel fronting, can be the result of any gesture that shortens the front cavity of the vocal tract, acoustic data alone do not reveal the combination of tongue fronting and/or lip unrounding that speakers use to produce fronted vowels. It is furthermore unresolved to what extent the articulation of fronted back vowels varies according to consonantal context and how the tongue and lips contribute to the F2 trajectory throughout the vowel. This paper presents articulatory and acoustic data on fronted back vowels from two varieties of American English: coastal Southern California and South Carolina. Through analysis of dynamic acoustic, ultrasound, and lip video data, it is shown that speakers of both varieties produce fronted /u, ʊ, o/ with rounded lips, and that high F2 observed for these vowels is associated with a front-central tongue position rather than unrounded lips. Examination of time-varying formant trajectories and articulatory configurations shows that the degree of vowel-internal F2 change is predominantly determined by coarticulatory influence of the coda. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. Fusion of dichotic consonants in normal-hearing and hearing-impaired listeners.
- Author
- Sathe, Nishad C., Kain, Alexander, and Reiss, Lina A. J.
- Subjects
- CONSONANTS, SPEECH, VOWELS, SPEECH perception, INTELLIGIBILITY of speech
- Abstract
Hearing-impaired (HI) listeners have been shown to exhibit increased fusion of dichotic vowels, even with different fundamental frequency (F0), leading to binaural spectral averaging and interference. To determine if similar fusion and averaging occurs for consonants, four natural and synthesized stop consonants (/pa/, /ba/, /ka/, /ga/) at three F0s of 74, 106, and 185 Hz were presented dichotically—with ΔF0 varied—to normal-hearing (NH) and HI listeners. Listeners identified the one or two consonants perceived, and response options included /ta/ and /da/ as fused percepts. As ΔF0 increased, both groups showed decreases in fusion and increases in percent correct identification of both consonants, with HI listeners displaying similar fusion but poorer identification. Both groups exhibited spectral averaging (psychoacoustic fusion) of place of articulation but phonetic feature fusion for differences in voicing. With synthetic consonants, NH subjects showed increased fusion and decreased identification. Most HI listeners were unable to discriminate the synthetic consonants. The findings suggest smaller differences between groups in consonant fusion than vowel fusion, possibly due to the presence of more cues for segregation in natural speech or reduced reliance on spectral cues for consonant perception. The inability of HI listeners to discriminate synthetic consonants suggests a reliance on cues other than formant transitions for consonant discrimination. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
11. Formant dynamics in second language speech: Japanese speakers' production of English liquids.
- Author
- Nagamine, Takayuki
- Subjects
- ENGLISH language, JAPANESE language, LIQUIDS, VOWELS
- Abstract
This article reports an acoustic study analysing the time-varying spectral properties of word-initial English liquids produced by 31 first-language (L1) Japanese and 14 L1 English speakers. While it is widely accepted that L1 Japanese speakers have difficulty in producing English /l/ and /ɹ/, the temporal characteristics of L2 English liquids are not well-understood, even in light of previous findings that English liquids show dynamic properties. In this study, the distance between the first and second formants (F2–F1) and the third formant (F3) are analysed dynamically over liquid-vowel intervals in three vowel contexts using generalised additive mixed models (GAMMs). The results demonstrate that L1 Japanese speakers produce word-initial English liquids with stronger vocalic coarticulation than L1 English speakers. L1 Japanese speakers may have difficulty in dissociating F2–F1 between the liquid and the vowel to a varying degree, depending on the vowel context, which could be related to perceptual factors. This article shows that dynamic information uncovers specific challenges that L1 Japanese speakers have in producing L2 English liquids accurately. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Documenting and modeling the acoustic variability of intervocalic alveolar taps in conversational Peninsular Spanish.
- Author
- Perry, Scott James, Kelley, Matthew C., and Tucker, Benjamin V.
- Subjects
- ACOUSTIC models, SPANISH language, SPEECH, AUTOMATIC speech recognition, RESEARCH personnel, VOWELS
- Abstract
This study constitutes an investigation into the acoustic variability of intervocalic alveolar taps in a corpus of spontaneous speech from Madrid, Spain. Substantial variability was documented in this segment, with highly reduced variants constituting roughly half of all tokens during spectrographic inspection. In addition to qualitative documentation, the intensity difference between the tap and surrounding vowels was measured. Changes in this intensity difference were statistically modeled using Bayesian finite mixture models containing lexical and phonetic predictors. Model comparisons indicate predictive performance is improved when we assume two latent categories, interpreted as two pronunciation variants for the Spanish tap. In interpreting the model, predictors were more often related to categorical changes in which pronunciation variant was produced than to gradient intensity changes within each tap type. Variability in tap production was found according to lexical frequency, speech rate, and phonetic environment. These results underscore the importance of evaluating model fit to the data as well as what researchers modeling phonetic variability can gain in moving past linear models when they do not adequately fit the observed data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. Contribution of acoustic cues to prominence ratings for four Mandarin vowels.
- Author
- Zhang, Wei and Clayards, Meghan
- Subjects
- VOWELS, REGRESSION analysis, OPEN-ended questions, ACOUSTIC vibrations
- Abstract
The acoustic cues for prosodic prominence have been explored extensively, but one open question is to what extent they differ by context. This study investigates the extent to which vowel type affects how acoustic cues are related to prominence ratings provided in a corpus of spoken Mandarin. In the corpus, each syllable was rated as either prominent or non-prominent. We predicted prominence ratings using Bayesian mixed-effect regression models for each of four Mandarin vowels (/a, i, ɤ, u/), using fundamental frequency (F0), intensity, duration, the first and second formants, and tone type as predictors. We compared the role of each cue within and across the four models. We found that overall duration was the best predictor of prominence ratings and that formants were the weakest, but the role of each cue differed by vowel. We did not find credible evidence that F0 was relevant for /a/, or that intensity was relevant for /i/. We also found evidence that duration was more important for /ɤ/ than for /i/. The results suggest that vowel type credibly affects prominence ratings, which may reflect differences in the coordination of acoustic cues in prominence marking. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
14. Refining and extending measures for fricative spectra, with special attention to the high-frequency range.
- Author
- Shadle, Christine H., Chen, Wei-Rong, Koenig, Laura L., and Preston, Jonathan L.
- Subjects
- VOCAL tract, SPEECH, VOWELS, CONSONANTS
- Abstract
Fricatives have noise sources that are filtered by the vocal tract and that typically possess energy over a much broader range of frequencies than observed for vowels and sonorant consonants. This paper introduces and refines fricative measurements that were designed to reflect underlying articulatory and aerodynamic conditions. These show differences in the pattern of high-frequency energy for sibilants vs non-sibilants, voiced vs voiceless fricatives, and non-sibilants differing in place of articulation. The results confirm the utility of a spectral peak measure (FM) and low–mid frequency amplitude difference (AmpD) for sibilants. Using a higher-frequency range for defining FM for female voices for alveolars is justified; a still higher range was considered and rejected. High-frequency maximum amplitude (Fh) and amplitude difference between low- and higher-frequency regions (AmpRange) capture /f-θ/ differences in English and the dynamic amplitude range over the entire spectrum. For this dataset, with spectral information up to 15 kHz, a new measure, HighLevelD, was more effective than previously used LevelD and Slope in showing changes over time within the frication. Finally, isolated words and connected speech differ. This work contributes improved measures of fricative spectra and demonstrates the necessity of including high-frequency energy in those measures. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
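As a rough illustration of a spectral peak measure of the FM type mentioned in this abstract, one can locate the frequency of the strongest component of a windowed FFT within a chosen band. This is a simplified sketch under assumed settings (Hann window, a single analysis frame, 1–15 kHz search band), not the authors' exact definition:

```python
import numpy as np

def spectral_peak(signal, sr, lo=1000.0, hi=15000.0):
    """Frequency (Hz) of the largest spectral magnitude in [lo, hi]."""
    spec = np.abs(np.fft.rfft(signal * np.hanning(len(signal))))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sr)
    band = (freqs >= lo) & (freqs <= hi)
    return float(freqs[band][np.argmax(spec[band])])

# Synthetic "sibilant" frame: noise plus a strong component near 5 kHz.
sr = 44100
t = np.arange(4096) / sr
rng = np.random.default_rng(1)
sig = np.sin(2 * np.pi * 5000 * t) + 0.1 * rng.normal(size=t.size)
pk = spectral_peak(sig, sr)
print(round(pk))   # close to 5000, within one FFT bin (~11 Hz)
```

Real fricative analysis would average several frames and, as the paper argues, needs a sampling rate high enough to preserve energy well above 12 kHz.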
15. Voice onset time and vowel formant measures in online testing and laboratory-based testing with(out) surgical face masks.
- Author
- Stoehr, Antje, Souganidis, Christoforos, Thomas, Trisha B., Jacobsen, Jessi, and Martin, Clara D.
- Subjects
- MEDICAL masks, VOWELS, SPEECH, ENGLISH language, SPANISH language, TESTING laboratories
- Abstract
Since the COVID-19 pandemic started, conducting experiments online is increasingly common, and face masks are often used in everyday life. It remains unclear whether phonetic detail in speech production is captured adequately when speech is recorded in internet-based experiments or in experiments conducted with face masks. We tested 55 Spanish–Basque–English trilinguals in picture naming tasks in three conditions: online, laboratory-based with surgical face masks, and laboratory-based without face masks (control). We measured plosive voice onset time (VOT) in each language, the formants and duration of English vowels /iː/ and /ɪ/, and the Spanish/Basque vowel space. Across conditions, there were differences between English and Spanish/Basque VOT and in formants and duration between English /iː/–/ɪ/; between conditions, small differences emerged. Relative to the control condition, the Spanish/Basque vowel space was larger in online testing and smaller in the face mask condition. We conclude that testing online or with face masks is suitable for investigating phonetic detail in within-participant designs although the precise measurements may differ from those in traditional laboratory-based research. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
16. An acoustic study of Cantonese alaryngeal speech in different speaking conditions.
- Author
- Cox, Steven R., Huang, Ting, Chen, Wei-Rong, and Ng, Manwa L.
- Subjects
- SPEECH, VOWELS, STATISTICAL models, LARYNX, ELOCUTION
- Abstract
Esophageal (ES) speech, tracheoesophageal (TE) speech, and the electrolarynx (EL) are common methods of communication following the removal of the larynx. Our recent study demonstrated that intelligibility may increase for Cantonese alaryngeal speakers using clear speech (CS) compared to their everyday "habitual speech" (HS), but the reasoning is still unclear [Hui, Cox, Huang, Chen, and Ng (2022). Folia Phoniatr. Logop. 74, 103–111]. The purpose of this study was to assess the acoustic characteristics of vowels and tones produced by Cantonese alaryngeal speakers using HS and CS. Thirty-one alaryngeal speakers (9 EL, 10 ES, and 12 TE speakers) read The North Wind and the Sun passage in HS and CS. Vowel formants, vowel space area (VSA), speaking rate, pitch, and intensity were examined, and their relationship to intelligibility was evaluated. Statistical models suggest that larger VSAs significantly improved intelligibility, but slower speaking rate did not. Vowel and tonal contrasts did not differ between HS and CS for all three groups, but the amount of information encoded in fundamental frequency and intensity differences between high and low tones positively correlated with intelligibility for TE and ES groups, respectively. Continued research is needed to understand the effects of different speaking conditions toward improving acoustic and perceptual characteristics of Cantonese alaryngeal speech. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
17. Intra- and inter-speaker variation in eight Russian fricatives.
- Author
- Ulrich, Natalja, Pellegrino, François, and Allassonnière-Tang, Marc
- Subjects
- FREQUENCY spectra, GENDER identity, MACHINE learning, VOWELS
- Abstract
Acoustic variation is central to the study of speaker characterization. In this respect, specific phonemic classes such as vowels have been particularly studied, compared to fricatives. Fricatives exhibit important aperiodic energy, which can extend over a high-frequency range beyond that conventionally considered in phonetic analyses, often limited up to 12 kHz. We adopt here an extended frequency range up to 20.05 kHz to study a corpus of 15 812 fricatives produced by 59 speakers in Russian, a language offering a rich inventory of fricatives. We extracted two sets of parameters: the first is composed of 11 parameters derived from the frequency spectrum and duration (acoustic set) while the second is composed of 13 mel frequency cepstral coefficients (MFCCs). As a first step, we implemented machine learning methods to evaluate the potential of each set to predict gender and speaker identity. We show that gender can be predicted with a good performance by the acoustic set and even more so by MFCCs (accuracy of 0.72 and 0.88, respectively). MFCCs also predict individuals to some extent (accuracy = 0.64) unlike the acoustic set. In a second step, we provide a detailed analysis of the observed intra- and inter-speaker acoustic variation. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
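The prediction step this abstract describes (MFCC features classifying speaker gender) can be sketched with scikit-learn. The data below are synthetic placeholders, two Gaussian clusters standing in for the 13 MFCCs per fricative token; only the workflow, not the reported accuracies, carries over from the study:

```python
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score

rng = np.random.default_rng(2)
# Stand-in feature vectors: 13 "MFCCs" per token for two speaker groups
# whose feature means differ modestly (real features would be extracted
# from the fricative recordings themselves).
X = np.vstack([rng.normal(0.0, 1.0, size=(300, 13)),
               rng.normal(0.6, 1.0, size=(300, 13))])
y = np.array([0] * 300 + [1] * 300)

clf = RandomForestClassifier(n_estimators=100, random_state=0)
acc = cross_val_score(clf, X, y, cv=5).mean()   # 5-fold cross-validation
print(f"cross-validated accuracy: {acc:.2f}")
```

Cross-validation matters here: with 15 812 tokens from only 59 speakers, accuracy on a random split can be inflated by speaker-specific cues unless folds are constructed carefully.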
18. Rapid movements at segment boundaries.
- Author
- Svensson Lundmark, Malin
- Subjects
- VOWELS, CONSONANTS, POSTURE, TONGUE, LIPS
- Abstract
This paper reports on a one-to-one aspect of the articulatory-acoustic relationship, explaining how acoustic segment boundaries are a result of the rapid movements of the active articulators. In the acceleration profile, these are identified as acceleration peaks, which can be measured. To test the relationship, consonant and vowel segment durations are compared to articulatory posture intervals based on acceleration peaks, and time lags are measured on the alignment of the segment boundaries to the acceleration peaks. Strong relationships and short time lags are expected when the acceleration peaks belong to crucial articulators, whereas weak relationships are expected when the acceleration peaks belong to non-crucial articulators. The results show that lip posture intervals are indeed strongly correlated with [m], and tongue tip postures are strongly correlated with [n]. This is confirmed by the time lag results, which also reveal that the acoustic boundaries precede the acceleration peaks. Exceptions to the predictions are attributed to the speech material or the joint jaw-lip control unit. Moreover, the vowel segments are strongly correlated with the consonantal articulators while less correlated with the tongue body, suggesting that acceleration of crucial consonantal articulators determines not only consonant segment duration but also vowel segment duration. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
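The acceleration peaks this abstract uses as articulatory landmarks can be located by differentiating a position trace twice and keeping prominent local maxima of the magnitude. A minimal NumPy sketch on a toy articulator gesture; the 200 Hz sampling rate and the 50% prominence threshold are assumptions for illustration, not the paper's settings:

```python
import numpy as np

def acceleration_peaks(position, fs, rel_height=0.5):
    """Sample indices where |second derivative| of a 1-D position trace
    has a local maximum above rel_height * the global maximum."""
    accel = np.gradient(np.gradient(position, 1.0 / fs), 1.0 / fs)
    mag = np.abs(accel)
    local_max = (mag[1:-1] >= mag[:-2]) & (mag[1:-1] >= mag[2:])
    strong = mag[1:-1] > rel_height * mag.max()
    return np.where(local_max & strong)[0] + 1, accel

# Toy lip-aperture trace: a smooth 2 Hz opening-closing movement.
fs = 200.0
t = np.arange(0, 1, 1 / fs)
pos = np.sin(2 * np.pi * 2 * t)
peaks, accel = acceleration_peaks(pos, fs)
print(t[peaks])   # moments of most rapid change in the movement
```

On this sinusoidal gesture the peaks fall at the quarter-phase points; on real kinematic data the trace would first be low-pass filtered, since double differentiation amplifies measurement noise.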
19. Deep learning assessment of syllable affiliation of intervocalic consonants.
- Author
- Liu, Zirui and Xu, Yi
- Subjects
- CONSONANTS, DEEP learning, JUDGMENT (Psychology), ENGLISH language, VOWELS
- Abstract
In English, a sentence like "He made out our intentions." could be misperceived as "He may doubt our intentions." because the coda /d/ sounds like it has become the onset of the next syllable. The nature and occurrence condition of this resyllabification phenomenon are unclear, however. Previous empirical studies mainly relied on listener judgment, limited acoustic evidence, such as voice onset time, or average formant values to determine the occurrence of resyllabification. This study tested the hypothesis that resyllabification is a coarticulatory reorganisation that realigns the coda consonant with the vowel of the next syllable. Deep learning in conjunction with dynamic time warping (DTW) was used to assess syllable affiliation of intervocalic consonants. The results suggest that convolutional neural network- and recurrent neural network-based models can detect cases of resyllabification using Mel-frequency spectrograms. DTW analysis shows that neural network inferred resyllabified sequences are acoustically more similar to their onset counterparts than their canonical productions. A binary classifier further suggests that, similar to the genuine onsets, the inferred resyllabified coda consonants are coarticulated with the following vowel. These results are interpreted with an account of resyllabification as a speech-rate-dependent coarticulatory reorganisation mechanism in speech. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
20. The perception of nasal coarticulatory variation in face-masked speech.
- Author
- Zellou, Georgia, Pycha, Anne, and Cohn, Michelle
- Subjects
- SPEECH, INTELLIGIBILITY of speech, AMERICAN English language, VOWELS, BRITISH Americans
- Abstract
This study investigates the impact of wearing a face mask on the production and perception of coarticulatory vowel nasalization. Speakers produced monosyllabic American English words with oral and nasal codas (i.e., CVC and CVN) in face-masked and un-face-masked conditions to a real human interlocutor. The vowel was either tense or lax. Acoustic analyses indicate that speakers produced greater coarticulatory vowel nasality in CVN items when wearing a face mask, particularly, when the vowel is lax, suggesting targeted enhancement of the oral-nasalized contrast in this condition. This enhancement is not observed for tense vowels. In a perception study, participants heard CV syllables excised from the recorded words and performed coda identifications. For lax vowels, listeners were more accurate at identifying the coda in the face-masked condition, indicating that they benefited from the speakers' production adjustments. Overall, the results indicate that speakers adapt their speech in specific contexts when wearing a face mask, and these speaker adjustments have an influence on listeners' abilities to identify words in the speech signal. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
21. Sample size matters in calculating Pillai scores.
- Author
- Stanley, Joseph A. and Sneller, Betsy
- Subjects
- SAMPLE size (Statistics), VOWELS, TEST scoring, SOCIOLINGUISTICS
- Abstract
Since their introduction to sociolinguistics by Hay, Warren, and Drager [(2006). J. Phon. (Modell. Sociophon. Var.) 34(4), 458–484], Pillai scores have become a standard metric for quantifying vowel overlap. However, there is no established threshold value for determining whether two vowels are merged, leading to conflicting ad hoc measures. Furthermore, as a parametric measure, Pillai scores are sensitive to sample size. In this paper, we use generated data from a simulated pair of underlyingly merged vowels to demonstrate (1) larger sample sizes yield reliably more accurate Pillai scores, (2) unequal group sizes across the two vowel classes are irrelevant in the calculation of Pillai scores, and (3) it takes many more data than many sociolinguistic studies typically analyze to return a reliably low Pillai score for underlyingly merged data. We provide some recommendations for maximizing reliability in the use of Pillai scores and provide a formula to assist researchers in determining a reasonable threshold to use as an indicator of merged status given their sample size. We demonstrate these recommendations in action with a case study. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
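The Pillai score discussed in this abstract is the Pillai–Bartlett trace from a MANOVA of the formant data on vowel class, trace(H(H+E)^-1), where H is the between-class and E the within-class sums-of-squares-and-cross-products matrix. A NumPy sketch on simulated (F1, F2) tokens, with one underlyingly merged pair and one distinct pair:

```python
import numpy as np

def pillai_score(g1, g2):
    """Pillai-Bartlett trace for a two-group MANOVA: trace(H (H+E)^-1).
    Roughly 0 = complete overlap (merged), 1 = complete separation."""
    g1, g2 = np.asarray(g1, float), np.asarray(g2, float)
    data = np.vstack([g1, g2])
    grand = data.mean(axis=0)
    p = data.shape[1]
    H, E = np.zeros((p, p)), np.zeros((p, p))
    for g in (g1, g2):
        diff = g.mean(axis=0) - grand
        H += len(g) * np.outer(diff, diff)      # between-class SSCP
        c = g - g.mean(axis=0)
        E += c.T @ c                            # within-class SSCP
    return float(np.trace(H @ np.linalg.inv(H + E)))

# Simulated (F1, F2) tokens in Hz.
rng = np.random.default_rng(3)
merged_a = rng.normal([500, 1500], [60, 120], size=(200, 2))
merged_b = rng.normal([500, 1500], [60, 120], size=(200, 2))  # same category
split_b = rng.normal([700, 1100], [60, 120], size=(200, 2))   # distinct vowel
p_merged = pillai_score(merged_a, merged_b)
p_split = pillai_score(merged_a, split_b)
print(f"merged: {p_merged:.3f}  split: {p_split:.3f}")
```

Because the statistic is parametric, rerunning the merged case with far fewer tokens per class yields noticeably higher scores for the same underlying overlap, which is the sample-size sensitivity the paper quantifies.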
22. Effect of telepractice on pediatric cochlear implant users and provider vowel space: A preliminary report.
- Author
- Kondaurova, Maria V., Zheng, Qi, Donaldson, Cheryl W., and Smith, Alan F.
- Subjects
- COCHLEAR implants, VOWELS, PROSODIC analysis (Linguistics), SPEECH, ORAL communication, GOAL (Psychology)
- Abstract
Clear speaking styles are goal-oriented modifications in which talkers adapt acoustic-phonetic characteristics of speech to compensate for communication challenges. Do children with hearing loss and a clinical provider modify speech characteristics during telepractice to adjust for remote communication? The study examined the effect of telepractice (tele-) on vowel production in seven (mean age 4:11 years, SD 1:2 years) children with cochlear implants (CIs) and a provider. The first (F1) and second (F2) formant frequencies of /i/, /ɑ/, and /u/ vowels were measured in child and provider speech during one in-person and one tele-speech-language intervention, order counterbalanced. Child and provider vowel space areas (VSA) were calculated. The results demonstrated an increase in F2 formant frequency for /i/ vowel in child and provider speech and an increase in F1 formant frequency for /ɑ/ vowel in the provider speech during tele- compared to in-person intervention. An expansion of VSA was found in child and provider speech in tele- compared to in-person intervention. In children, the earlier age of CI activation was associated with larger VSA in both tele- and in-person intervention. The results suggest that the children and the provider adjust vowel articulation in response to remote communication during telepractice. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
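The vowel space area (VSA) computed in this study is conventionally the area of the polygon whose vertices are the mean (F1, F2) of each corner vowel, here /i/, /ɑ/, /u/. A shoelace-formula sketch; the formant values below are illustrative placeholders, not data from the paper:

```python
import numpy as np

def vowel_space_area(corners):
    """Area (Hz^2) of the polygon formed by mean (F1, F2) points of
    corner vowels, via the shoelace formula."""
    pts = np.asarray(corners, dtype=float)
    x, y = pts[:, 0], pts[:, 1]
    return 0.5 * abs(np.dot(x, np.roll(y, -1)) - np.dot(y, np.roll(x, -1)))

# Placeholder mean formants (F1, F2) in Hz for the three corner vowels.
vsa = vowel_space_area([(300, 2300),   # /i/: low F1, high F2
                        (800, 1200),   # /ɑ/: high F1, mid F2
                        (350, 800)])   # /u/: low F1, low F2
print(vsa)   # polygon area in Hz^2
```

A larger VSA under telepractice, as reported above, would show up directly as a larger polygon; formants are often converted to a perceptual scale such as Bark before the area is computed, which this sketch omits.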
23. Low dimensional measurement of vowels using machine perception.
- Author
- Burridge, James and Vaux, Bert
- Subjects
- DISTRIBUTION (Probability theory), CONVOLUTIONAL neural networks, VOWELS, FEATURE extraction, SOUND measurement
- Abstract
A method is presented for combining the feature extraction power of neural networks with model based dimensionality reduction to produce linguistically motivated low dimensional measurements of sounds. This method works by first training a convolutional neural network (CNN) to predict linguistically relevant category labels from the spectrograms of sounds. Then, idealized models of these categories are defined as probability distributions in a low dimensional measurement space with locations chosen to reproduce, as far as possible, the perceptual characteristics of the CNN. To measure a sound, the point is found in the measurement space for which the posterior probability distribution over categories in the idealized model most closely matches the category probabilities output by the CNN for that sound. In this way, the feature learning power of the CNN is used to produce low dimensional measurements. This method is demonstrated using monophthongal vowel categories to train this CNN and produce measurements in two dimensions. It is also shown that the perceptual characteristics of this CNN are similar to those of human listeners. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
24. Investigating phoneme-dependencies of spherical voice directivity patterns II: Various groups of phonemes.
- Author
- Pörschmann, Christoph and Arend, Johannes M.
- Subjects
- PHONEME (Linguistics), ACOUSTIC radiation, HUMAN voice, MICROPHONE arrays, MICROPHONES, VOWELS
- Abstract
The substantial variation between articulated phonemes is a fundamental feature of human voice production. However, while the spectral and temporal aspects of the phonemes have been extensively studied, few have investigated the spatial aspects and analyzed phoneme-dependent differences in voice directivity. This paper extends our previous research focusing on the directivity patterns of selected vowels and fricatives [Pörschmann and Arend, J. Acoust. Soc. Am. 149(6), 4553–4564 (2021)] and examines different groups of phonemes, such as plosives, nasals, voiced alveolars, and additional fricatives. For this purpose, full-spherical voice directivity measurements were performed for 13 persons while they articulated the respective phonemes. The sound radiation was recorded simultaneously using a surrounding spherical microphone array with 32 microphones and then spatially upsampled to a dense sampling grid. Based on these upsampled datasets, the spherical voice directivity was studied, and phoneme-dependent variations were analyzed. The results show significant differences between the groups of phonemes. However, within three groups (plosives, nasals, and voiced alveolars), the differences are small, and the variations in the directivity index were statistically insignificant. [ABSTRACT FROM AUTHOR]
- Published
- 2023
- Full Text
- View/download PDF
25. Vowel height and velum position in German: Insights from a real-time magnetic resonance imaging study.
- Author
- Kunay, Esther, Hoole, Philip, Gubian, Michele, Harrington, Jonathan, Joseph, Arun, Voit, Dirk, and Frahm, Jens
- Subjects
- MAGNETIC resonance imaging, GERMAN language, VOWELS, DIAGNOSTIC imaging, PRINCIPAL components analysis
- Abstract
Velum position was analysed as a function of vowel height in German tense and lax vowels preceding a nasal or oral consonant. Findings from previous research suggest an interdependence between vowel height and the degree of velum lowering, with a higher velum during high vowels and a more lowered velum during low vowels. In the current study, data were presented from 33 native speakers of Standard German who were measured via non-invasive high quality real-time magnetic resonance imaging. The focus was on exploring the spatiotemporal extent of velum lowering in tense and lax /a, i, o, ø/, which was done by analysing velum movement trajectories over the course of VN and VC sequences in CVNV and CVCV sequences by means of functional principal component analysis. Analyses focused on the impact of the vowel category and vowel tenseness. Data indicated that not only the position of the velum was affected by these factors but also the timing of velum closure. Moreover, it is argued that the effect of vowel height was to be better interpreted in terms of the physiological constriction location of vowels, i.e., the specific tongue position rather than phonetic vowel height. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
26. Speakers monitor auditory feedback for temporal alignment and linguistically relevant duration.
- Author
-
Karlin, Robin and Parrell, Benjamin
- Subjects
- *
EDIBLE fats & oils , *PHONEME (Linguistics) , *CONSONANTS , *VOWELS , *PSYCHOLOGICAL feedback , *HYPOTHESIS - Abstract
Recent altered auditory feedback studies suggest that speakers adapt to external perturbations to the duration of syllable nuclei and codas, but there is mixed evidence for adaptation of onsets. This study investigates this asymmetry, testing three hypotheses: (1) onsets adapt only if the perturbation produces a categorical error; (2) previously observed increases in vowel duration stem from feedback delays, rather than adaptation to durational perturbations; (3) gestural coordination between onsets and nuclei prevents independent adaptation of each segment. Word-initial consonant targets received shortening perturbations to approximate a different phoneme (cross-category; VOT of /t/ > /d/; duration of /s/ > /z/) or lengthening perturbations to generate a long version of the same phoneme (within-category; /k/ > [kʰː]; /ʃ/ > [ʃː]). Speakers adapted the duration of both consonants in the cross-category condition; in the within-category condition, only /k/ showed adaptive shortening. Speakers also lengthened all delayed segments while perturbation was active, even when segment duration was not perturbed. Finally, durational changes in syllable onsets and nuclei were not correlated, indicating that speakers can adjust each segment independently. The data suggest that speakers mainly attend to deviations from the predicted timing of motor states but do adjust for durational errors when linguistically relevant. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
27. The long and the short of vowel length perception in Danish.
- Author
-
Morris, David Jackson and Juul, Holger
- Subjects
- *
VOWELS , *AUDITORY perception , *SPEECH perception , *LINGUISTIC change - Abstract
Danish is a quantity language in which the length of vowels is either short or long. This study investigates vowel length in order to determine the degree to which we can ascribe the conventional categorical tag to vowel quantity perception. In a pilot study (n = 18) the gradual shortening of long vowels was identified as methodologically preferable for deriving stimuli continua, as complete identification functions could be fitted to the mean data. We employed this method to derive stimuli for identification and discrimination experiments (n = 32) that included the words used in the pilot and another word pair. This pair has phonetically similar variation in vowel duration although, due to recent language change, quantity is no longer contrastive. Results from the phonologically contrastive word pairs showed sigmoidal identification functions and discrimination peaks in the middle of the continua, while the identification slope for the non-contrastive pair was approximately linear and there was no clear discrimination peak. These differences show that the perception of speech contrasts is influenced by the linguistic experience of listeners as well as auditory and articulatory factors. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
28. Normalization of nonlinearly time-dynamic vowels.
- Author
-
Voeten, Cesko C., Heeringa, Wilbert, and Van de Velde, Hans
- Subjects
- *
VOWELS , *CENTROID , *ANATOMICAL variation - Abstract
This study compares 16 vowel-normalization methods for purposes of sociophonetic research. Most of the previous work in this domain has focused on the performance of normalization methods on steady-state vowels. By contrast, this study explicitly considers dynamic formant trajectories, using generalized additive models to model these nonlinearly. Normalization methods were compared using a hand-corrected dataset from the Flemish-Dutch Teacher Corpus, which contains 160 speakers from 8 geographical regions, who spoke regionally accented versions of Netherlandic/Flemish Standard Dutch. Normalization performance was assessed by comparing the methods' abilities to remove anatomical variation, retain vowel distinctions, and explain variation in the normalized F0–F3. In addition, it was established whether normalization competes with by-speaker random effects or supplements it, by comparing how much between-speaker variance remained to be apportioned to random effects after normalization. The results partly reproduce the good performance of Lobanov, Gerstman, and Nearey 1 found earlier and generally favor log-mean and centroid methods. However, newer methods achieve higher effect sizes (i.e., explain more variance) at only marginally worse performances. Random effects were found to be equally useful before and after normalization, showing that they complement it. The findings are interpreted in light of the way that the different methods handle formant dynamics. [ABSTRACT FROM AUTHOR]
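Of the normalization methods this abstract compares, Lobanov's is the simplest to state: each formant is z-scored within speaker, removing between-speaker anatomical differences in scale and offset. A minimal sketch (the token layout and field names are assumptions, not the corpus format):

```python
from collections import defaultdict
from statistics import mean, stdev

def lobanov(tokens):
    """Lobanov vowel normalization: z-score each formant within speaker.
    tokens: list of dicts with 'speaker', 'F1', 'F2' (Hz).
    Returns new tokens with unitless normalized F1/F2."""
    by_spk = defaultdict(list)
    for t in tokens:
        by_spk[t['speaker']].append(t)
    out = []
    for spk, toks in by_spk.items():
        # Per-speaker mean and standard deviation for each formant.
        stats = {f: (mean(t[f] for t in toks), stdev(t[f] for t in toks))
                 for f in ('F1', 'F2')}
        for t in toks:
            out.append({'speaker': spk,
                        **{f: (t[f] - stats[f][0]) / stats[f][1]
                           for f in ('F1', 'F2')}})
    return out
```

After normalization, every speaker's formant distribution has mean 0 and standard deviation 1, which is why the method competes with by-speaker random intercepts in the analysis described above.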
- Published
- 2022
- Full Text
- View/download PDF
29. Correlates of vowel clarity in the spectrotemporal modulation domain: Application to speech impairment evaluation.
- Author
-
Marczyk, Anna, O'Brien, Benjamin, Tremblay, Pascale, Woisard, Virginie, and Ghio, Alain
- Subjects
- *
INTELLIGIBILITY of speech , *SPEECH , *VOWELS , *HEAD & neck cancer , *NATIVE language , *POWER spectra - Abstract
This article reports on vowel clarity metrics based on spectrotemporal modulations of speech signals. Motivated by previous findings on the relevance of modulation-based metrics for speech intelligibility assessment and pathology classification, the current study used factor analysis to identify regions within a bi-dimensional modulation space, the magnitude power spectrum, as in Elliott and Theunissen [(2009). PLoS Comput. Biol. 5(3), e1000302] by relating them to a set of conventional acoustic metrics of vowel space area and vowel distinctiveness. Two indices based on the energy ratio between high and low modulation rates across temporal and spectral dimensions of the modulation space emerged from the analyses. These indices served as input for measurements of central tendency and classification analyses that aimed to identify vowel-related speech impairments in French native speakers with head and neck cancer (HNC) and Parkinson dysarthria (PD). Following the analysis, vowel-related speech impairment was identified in HNC speakers, but not in PD. These results were consistent with findings based on subjective evaluations of speech intelligibility. The findings reported are consistent with previous studies indicating that impaired speech is associated with attenuation in energy in higher spectrotemporal modulation bands. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
30. Perception of English and Catalan vowels by English and Catalan listeners. Part II. Perceptual vs ecphoric similarity.
- Author
-
Cebrian, Juli
- Subjects
- *
ENGLISH language , *VOWELS , *NATIVE language , *SPEECH , *DISCIPLINE of children , *RECIPROCITY (Psychology) - Abstract
Although crosslinguistic similarity is a crucial concept for many disciplines in the speech sciences, there is no clear consensus as to the most appropriate method to measure it. This paper assessed the perceived similarity between English and Catalan vowels by means of an overt direct task evaluating perceptual similarity. The extent to which perceptual similarity is reciprocal is also explored by comparing perceptual judgements obtained by speakers of the two languages involved. Twenty-seven native Catalan speakers and 27 native English speakers rated the perceived dissimilarity between two aurally presented vowel stimuli. Trials included native–non-native pairs as well as native-native pairs to serve as baseline data. Some native–non-native pairs were perceived to be as similar as same-category native pairs, illustrating cases of very high crosslinguistic perceptual similarity. Further, in terms of reciprocity, the results showed a bidirectionality in similarity relationships that point to some cases of near-identical or shared categories and also illustrate the role of language-specific cue weighting in determining perceptual similarity. Finally, a comparison with the outcome of a previous study [Cebrian (2021). J. Acoust. Soc. Am. 149(4), 2671–2685], involving the same participants and languages but exploring ecphoric similarity, shows a generally high degree of agreement and a close relationship between the two types of similarity. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
31. Education in basic acoustics for acoustic phonetics and speech science.
- Author
-
Arai, Takayuki
- Subjects
- *
SPEECH , *PHONETICS , *BASIC education , *ACOUSTICS , *VOWELS , *LUNGS - Abstract
Students in acoustic phonetics and speech science classes often do not have much technical background; an intuitive means to teach acoustic phenomena to them would, thus, be useful. Regarding speech production, physical demonstrations using vocal-tract models have been shown to be an intuitive way to teach acoustic phenomena. In particular, a series of models for different purposes has been developed by Arai over the last 20+ years, including lung models, sound sources, and vocal-tract models, e.g., see Arai [J. Acoust. Soc. Am. 131(3), 2444–2454 (2012)]. Different combinations of these models are helpful for teaching a variety of related topics in the classroom. However, there are still barriers to understanding certain concepts. This study examined ways of minimizing technical explanations and mathematical formulations and maximizing intuitive understanding of seven topics. Its findings were incorporated into an education program that was used in an actual lecture conducted online. A comparison of scores of questionnaires filled out by the audience before and after the lecture showed the program's effectiveness, especially in relating how a set of harmonic waves excites a multiple-resonance system and how the vowel /a/ is produced. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
32. Acoustic correlates and listener ratings of function word reduction in child versus adult speech.
- Author
-
Redford, Melissa A. and Howson, Phil J.
- Subjects
- *
SPEECH , *YOUNG adults , *ACOUSTIC measurements , *AGE groups , *VOWELS - Abstract
The present study investigated "the" reduction in phrase-medial Verb-the-Noun sequences elicited from 5-year-old children and young adults (18–22 yr). Several measures of reduction were calculated based on acoustic measurement of these sequences. Analyses on the measures indicated that the determiner vowel was reduced in both child and adult speech relative to content word vowels, but it was reduced less in child speech compared to adult speech. Listener ratings on the sequences indicated a preference for adult speech over children's speech. Acoustic measures of reduction also predicted goodness ratings. Listeners preferred sequences with shorter and lower amplitude determiner vowels relative to content word vowels. They also preferred a more neutral schwa over more coarticulated versions. In sequences where ratings differed by age group, the effect of coarticulation was limited to adult speech and the effect of relative schwa duration was limited to child speech. The results are discussed with reference to communicative pressures on speech, including the rhythmic and semantic pressures towards reduction versus the pressure to convey adequate information in the acoustic signal. It is argued that these competing pressures on production may delay the acquisition of adult-like function word reduction. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
33. Reconsidering commonly used stimuli in speech perception experiments.
- Author
-
Winn, Matthew B. and Wright, Richard A.
- Subjects
- *
SPEECH perception , *PERCEPTION testing , *STIMULUS & response (Psychology) , *ENGLISH language , *VOWELS - Abstract
This paper examines some commonly used stimuli in speech perception experiments and raises questions about their use, or about the interpretations of previous results. The takeaway messages are: 1) the Hillenbrand vowels represent a particular dialect rather than a gold standard, and English vowels contain spectral dynamics that have been largely underappreciated, 2) the /ɑ/ context is very common but not clearly superior as a context for testing consonant perception, 3) /ɑ/ is particularly problematic when testing voice-onset-time perception because it introduces strong confounds in the formant transitions, 4) /dɑ/ is grossly overrepresented in neurophysiological studies and yet is insufficient as a generalized proxy for "speech perception," and 5) digit tests and matrix sentences including the coordinate response measure are systematically insensitive to important patterns in speech perception. Each of these stimulus sets and concepts is described with careful attention to their unique value and also cases where they might be misunderstood or over-interpreted. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
34. Strategic perceptual weighting of acoustic cues for word stress in listeners with cochlear implants, acoustic hearing, or simulated bimodal hearing.
- Author
-
Fleming, Justin T. and Winn, Matthew B.
- Subjects
- *
STRESS (Linguistics) , *COCHLEAR implants , *SPEECH , *WORD recognition , *VOWELS , *SIGNAL processing , *SPEECH perception - Abstract
Perception of word stress is an important aspect of recognizing speech, guiding the listener toward candidate words based on the perceived stress pattern. Cochlear implant (CI) signal processing is likely to disrupt some of the available cues for word stress, particularly vowel quality and pitch contour changes. In this study, we used a cue weighting paradigm to investigate differences in stress cue weighting patterns between participants listening with CIs and those with normal hearing (NH). We found that participants with CIs gave less weight to frequency-based pitch and vowel quality cues than NH listeners but compensated by upweighting vowel duration and intensity cues. Nonetheless, CI listeners' stress judgments were also significantly influenced by vowel quality and pitch, and they modulated their usage of these cues depending on the specific word pair in a manner similar to NH participants. In a series of separate online experiments with NH listeners, we simulated aspects of bimodal hearing by combining low-pass filtered speech with a vocoded signal. In these conditions, participants upweighted pitch and vowel quality cues relative to a fully vocoded control condition, suggesting that bimodal listening holds promise for restoring the stress cue weighting patterns exhibited by listeners with NH. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
35. Influence of voice properties on vowel perception depends on speaker context.
- Author
-
Krumbiegel, Julius, Ufer, Carina, and Blank, Helen
- Subjects
- *
VOWELS , *SPEECH , *HUMAN voice , *SPEECH perception , *INFLUENCE - Abstract
Different speakers produce the same intended vowel with very different physical properties. Fundamental frequency (F0) and formant frequencies (FF), the two main parameters that discriminate between voices, also influence vowel perception. While it has been shown that listeners comprehend speech more accurately if they are familiar with a talker's voice, it is still unclear how such prior information is used when decoding the speech stream. In three online experiments, we examined the influence of speaker context via F0 and FF shifts on the perception of /o/-/u/ vowel contrasts. Participants perceived vowels from an /o/-/u/ continuum shifted toward /u/ when F0 was lowered or FF increased relative to the original speaker's voice and vice versa. This shift was reduced when the speakers were presented in a block-wise context compared to random order. Conversely, the original base voice was perceived to be shifted toward /u/ when presented in the context of a low F0 or high FF speaker, compared to a shift toward /o/ with high F0 or low FF speaker context. These findings demonstrate that F0 and FF jointly influence vowel perception in speaker context. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
36. Context effects in perception of vowels differentiated by F1 are not influenced by variability in talkers' mean F1 or F3.
- Author
-
Mills, Hannah E., Shorey, Anya E., Theodore, Rachel M., and Stilp, Christian E.
- Subjects
- *
VOWELS , *CONTRAST effect - Abstract
Spectral properties of earlier sounds (context) influence recognition of later sounds (target). Acoustic variability in context stimuli can disrupt this process. When mean fundamental frequencies (f0's) of preceding context sentences were highly variable across trials, shifts in target vowel categorization [due to spectral contrast effects (SCEs)] were smaller than when sentence mean f0's were less variable; when sentences were rearranged to exhibit high or low variability in mean first formant frequencies (F1) in a given block, SCE magnitudes were equivalent [Assgari, Theodore, and Stilp (2019) J. Acoust. Soc. Am. 145(3), 1443–1454]. However, since sentences were originally chosen based on variability in mean f0, stimuli underrepresented the extent to which mean F1 could vary. Here, target vowels (/ɪ/-/ɛ/) were categorized following context sentences that varied substantially in mean F1 (experiment 1) or mean F3 (experiment 2) with variability in mean f0 held constant. In experiment 1, SCE magnitudes were equivalent whether context sentences had high or low variability in mean F1; the same pattern was observed in experiment 2 for new sentences with high or low variability in mean F3. Variability in some acoustic properties (mean f0) can be more perceptually consequential than others (mean F1, mean F3), but these results may be task-dependent. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
37. Lateral vocalization in Brazilian Portuguese.
- Author
-
Howson, Phil J., Moisik, Scott, and Żygis, Marzena
- Subjects
- *
SOUNDS , *TONGUE , *VOWELS - Abstract
Lateral vocalization is a cross-linguistically common phenomenon where a lateral is realized as a glide, such as [w, j], or a vowel [u, i]. In this paper, we focus on the articulatory triggers that could cause lateral vocalization. We examined Brazilian Portuguese, a language known for the process of lateral vocalization in coda position. We examined the lateral in onset and coda position in four vocalic environments and compared the dynamic tongue contours and contours at the point of maximum constriction in each environment. We also performed biomechanical simulations of lateral articulation and the vocalized lateral. The results indicate increased tongue body retraction in coda position, which is accompanied by tongue body raising. Simulations further revealed that vocalized laterals mainly recruit intrinsic lingual muscles along with the styloglossus. Taken together, the data suggest that vocalization is a result of positional phonetic effects including lenition and additional retraction in the coda position. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
39. Formant detail needed for identifying, rating, and discriminating vowels in Wisconsin English.
- Author
-
Jibson, Jonathan
- Subjects
- *
VOWELS - Abstract
Neel [(2004). Acoust. Res. Lett. Online 5, 125–131] asked how much time-varying formant detail is needed for vowel identification. In that study, multiple stimuli were synthesized for each vowel: 1-point (monophthongal with midpoint frequencies), 2-point (linear from onset to offset), 3-point, 5-point, and 11-point. Results suggested that a 3-point model was optimal. This conflicted with the dual-target hypothesis of vowel inherent spectral change research, which has found that two targets are sufficient to model vowel identification. The present study replicates and expands upon the work of Neel. Ten English monophthongs were chosen for synthesis. One-, two-, three-, and five-point vowels were created as described above, and another 1-point stimulus was created with onset frequencies rather than midpoint frequencies. Three experiments were administered (n = 18 for each): vowel identification, goodness rating, and discrimination. The results ultimately align with the dual-target hypothesis, consistent with most vowel inherent spectral change studies. [ABSTRACT FROM AUTHOR]
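The k-point stimuli described above can be thought of as piecewise-linear formant tracks through k equally spaced targets: a 1-point model is a flat monophthong, a 2-point model a straight line from onset to offset. A sketch of that interpolation (illustrative only, not the synthesis code used in either study):

```python
def formant_track(points, n_samples=50):
    """Piecewise-linear formant trajectory through equally spaced
    target frequencies `points` (Hz), sampled at `n_samples` instants.
    len(points) == 1 gives a flat (monophthongal) track; 2 gives a
    linear onset-to-offset glide, as in the 2-point stimuli."""
    if len(points) == 1:
        return [float(points[0])] * n_samples
    out = []
    for i in range(n_samples):
        pos = i / (n_samples - 1) * (len(points) - 1)  # position in segments
        k = min(int(pos), len(points) - 2)             # segment index
        frac = pos - k                                 # position within segment
        out.append(points[k] * (1 - frac) + points[k + 1] * frac)
    return out

# A 2-point F2 track gliding from an 1100 Hz onset to a 1900 Hz offset.
track = formant_track([1100, 1900], n_samples=5)  # [1100, 1300, 1500, 1700, 1900]
```

Adding intermediate targets (3-point, 5-point, ...) refines the trajectory shape; the dual-target result suggests the two endpoint targets carry most of the identification-relevant information.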
- Published
- 2022
- Full Text
- View/download PDF
40. Out of sight, out of mind: The influence of communicative load and phonological neighborhood density on phonetic variation in real listener-directed speech.
- Author
-
Scarborough, Rebecca and Zellou, Georgia
- Subjects
- *
INTELLIGIBILITY of speech , *NEIGHBORHOODS , *DENSITY , *VOWELS - Abstract
Some models of speech production propose that speech variation reflects an adaptive trade-off between the needs of the listener and constraints on the speaker. The current study considers communicative load as both a situational and lexical variable that influences phonetic variation in speech to real interlocutors. Specifically, it investigates whether the presence or absence of a target word in the sight of a real listener influences speakers' patterns of variation during a communicative task. To test how lexical difficulty also modulates intelligibility, target words varied in phonological neighborhood density (ND), a measure of lexical difficulty. Acoustic analyses reveal that speakers produced longer vowels in words that were not visually present for the listener to see, compared to when the listener could see those words. This suggests that speakers assess in real time the presence or absence of supportive visual information when gauging listener comprehension difficulty. Furthermore, the presence or absence of the word interacted with ND to predict both vowel duration and hyperarticulation patterns. These findings indicate that lexical measures of a word's difficulty and speakers' online assessment of lexical intelligibility (based on a word's visual presence or not) interactively influence phonetic modifications during communication with a real listener. [ABSTRACT FROM AUTHOR]
- Published
- 2022
- Full Text
- View/download PDF
41. Acoustic signatures of communicative dimensions in codified mother-infant interactions.
- Author
-
Falk, Simone and Audibert, Nicolas
- Subjects
- *
NURSERY rhymes , *BAYESIAN analysis , *MOTHER-infant relationship , *ORAL tradition , *VOWELS - Abstract
Nursery rhymes, lullabies, or traditional stories are pieces of oral tradition that constitute an integral part of communication between caregivers and preverbal infants. Caregivers use a distinct acoustic style when singing or narrating to their infants. Unlike spontaneous infant-directed (ID) interactions, codified interactions benefit from highly stable acoustics due to their repetitive character. The aim of the study was to determine whether specific combinations of acoustic traits (i.e., vowel pitch, duration, spectral structure, and their variability) form characteristic "signatures" of different communicative dimensions during codified interactions, such as vocalization type, interactive stimulation, and infant-directedness. Bayesian analysis, applied to over 14 000 vowels from codified live interactions between mothers and their 6-month-old infants, showed that a few acoustic traits prominently characterize arousing vs calm interactions and sung vs spoken interactions. While pitch and duration and their variation played a prominent role in constituting these signatures, more linguistic aspects such as vowel clarity showed small or no effects. Infant-directedness was identifiable in a larger set of acoustic cues than the other dimensions. These findings provide insights into the functions of acoustic variation of ID communication and into the potential role of codified interactions for infants' learning about communicative intent and expressive forms typical of language and music. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
42. Tonal cues to prosodic structure in rate-dependent speech perception.
- Author
-
Steffman, Jeremy and Jun, Sun-Ah
- Subjects
- *
ABSOLUTE pitch , *SPEECH perception , *VOWELS , *HUMAN voice - Abstract
This study explores how listeners integrate tonal cues to prosodic structure with their perception of local speech rate and consequent interpretation of durational cues. In three experiments, we manipulate the pitch and duration of speech segments immediately preceding a target sound along a vowel duration continuum (cueing coda stop voicing), testing how listeners' categorization of vowel duration shifts based on temporal and tonal context. We find that listeners perceive the presence of a phrasal boundary tone on a lengthened syllable as signaling a slowdown in speech rate, shifting perception of vowel duration, with effects that are additive when crossed in a 2 × 2 (pitch × duration) design. However, an asymmetrical effect of pitch and duration is found in an explicit duration judgement task in which listeners judge how long a pre-target syllable sounds to them. In explicit rate judgement, only durational information is consequential, unlike the categorization task, suggesting that integration of tonal and durational prosodic cues in rate-dependent perception is limited to implicit processing of speech rate. Results are discussed in terms of linguistic information in rate-dependent speech processing, the integration of prosodic cues, and implicit and explicit rate processing tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
43. Mandatory dichotic integration of second-formant information: Contralateral sine bleats have predictable effects on consonant place judgments.
- Author
-
Roberts, Brian, Summers, Robert J., and Bailey, Peter J.
- Subjects
- *
CONSONANTS , *EAR , *VOWELS , *CORRUPTION - Abstract
Speech-on-speech informational masking arises because the interferer disrupts target processing (e.g., capacity limitations) or corrupts it (e.g., intrusions into the target percept); the latter should produce predictable errors. Listeners identified the consonant in monaural buzz-excited three-formant analogues of approximant-vowel syllables, forming a place of articulation series (/w/-/l/-/j/). There were two 11-member series; the vowel was either high-front or low-back. Series members shared formant-amplitude contours, fundamental frequency, and F1+F3 frequency contours; they were distinguished solely by the F2 frequency contour before the steady portion. Targets were always presented in the left ear. For each series, F2 frequency and amplitude contours were also used to generate interferers with altered source properties—sine-wave analogues of F2 (sine bleats) matched to their buzz-excited counterparts. Accompanying each series member with a fixed mismatched sine bleat in the contralateral ear produced systematic and predictable effects on category judgments; these effects were usually largest for bleats involving the fastest rate or greatest extent of frequency change. Judgments of isolated sine bleats using the three place labels were often unsystematic or arbitrary. These results indicate that informational masking by interferers involved corruption of target processing as a result of mandatory dichotic integration of F2 information, despite the grouping cues disfavoring this integration. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
44. Modeling the effects of age and hearing loss on concurrent vowel scores.
- Author
-
Settibhaktini, Harshavardhan, Heinz, Michael G., and Chintanpalli, Ananthakrishna
- Subjects
- *
OLDER people , *HEARING disorders , *VOWELS , *EAR , *HAIR cells , *ALGORITHMS , *INTELLIGIBILITY of speech - Abstract
A difference in fundamental frequency (F0) between two vowels is an important segregation cue prior to identifying concurrent vowels. To understand the effects of this cue on identification due to age and hearing loss, Chintanpalli, Ahlstrom, and Dubno [(2016). J. Acoust. Soc. Am. 140, 4142–4153] collected concurrent vowel scores across F0 differences for younger adults with normal hearing (YNH), older adults with normal hearing (ONH), and older adults with hearing loss (OHI). The current modeling study predicts these concurrent vowel scores to understand age and hearing loss effects. The YNH model cascaded the temporal responses of an auditory-nerve model from Bruce, Erfani, and Zilany [(2018). Hear. Res. 360, 40–45] with a modified F0-guided segregation algorithm from Meddis and Hewitt [(1992). J. Acoust. Soc. Am. 91, 233–245] to predict concurrent vowel scores. The ONH model included endocochlear-potential loss, while the OHI model also included hair cell damage; however, both models incorporated cochlear synaptopathy, with a larger effect for OHI. Compared with the YNH model, concurrent vowel scores were reduced across F0 differences for ONH and OHI models, with the lowest scores for OHI. These patterns successfully captured the age and hearing loss effects in the concurrent-vowel data. The predictions suggest that the inability to utilize an F0-guided segregation cue, resulting from peripheral changes, may reduce scores for ONH and OHI listeners. [ABSTRACT FROM AUTHOR]
- Published
- 2021
- Full Text
- View/download PDF
45. Parameterizing spectral contrast effects in vowel categorization using noise contexts.
- Author
-
Stilp, Christian E.
- Subjects
- *
CONTRAST effect , *AUDITORY perception , *VOWELS , *NOISE - Abstract
When spectra differ between earlier (context) and later (target) sounds, listeners perceive larger spectral changes than are physically present. When context sounds (e.g., a sentence) possess relatively higher frequencies, the target sound (e.g., a vowel sound) is perceived as possessing relatively lower frequencies, and vice versa. These spectral contrast effects (SCEs) are pervasive in auditory perception, but studies traditionally employed contexts with high spectrotemporal variability that made it difficult to understand exactly when context spectral properties biased perception. Here, contexts were speech-shaped noise divided into four consecutive 500-ms epochs. Contexts were filtered to amplify low-F1 (100–400 Hz) or high-F1 (550–850 Hz) frequencies to encourage target perception of /ɛ/ ("bet") or /ɪ/ ("bit"), respectively, via SCEs. Spectral peaks in the context ranged from its initial epoch(s) to its entire duration (onset paradigm), ranged from its final epoch(s) to its entire duration (offset paradigm), or were present for only one epoch (single paradigm). SCE magnitudes increased as spectral-peak durations increased and/or occurred later in the context (closer to the target). Contrary to predictions, brief early spectral peaks still biased subsequent target categorization. Results are compared to related experiments using speech contexts, and physiological and/or psychoacoustic idiosyncrasies of the noise contexts are considered. [ABSTRACT FROM AUTHOR]
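The low-F1 vs high-F1 context filtering described above can be imitated by boosting one frequency band of a noise signal in the FFT domain. A rough sketch (the band edges come from the abstract; the +12 dB gain and 16 kHz sampling rate are assumptions, not the study's parameters):

```python
import numpy as np

def boost_band(noise, fs, lo, hi, gain_db=12.0):
    """Amplify the [lo, hi] Hz band of `noise` (sampled at `fs` Hz)
    in the FFT domain; all other frequencies are left untouched."""
    spec = np.fft.rfft(noise)
    freqs = np.fft.rfftfreq(len(noise), d=1.0 / fs)
    band = (freqs >= lo) & (freqs <= hi)
    spec[band] *= 10 ** (gain_db / 20)      # apply the band gain
    return np.fft.irfft(spec, n=len(noise))

rng = np.random.default_rng(0)
noise = rng.standard_normal(16000)                     # 1 s of noise at 16 kHz
low_f1_context = boost_band(noise, 16000, 100, 400)    # biases percepts toward /ɛ/
high_f1_context = boost_band(noise, 16000, 550, 850)   # biases percepts toward /ɪ/
```

Per the contrastive logic in the abstract, a context with extra low-F1 energy makes the target's F1 sound relatively higher (favoring /ɛ/), and vice versa.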
- Published
- 2021
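The noise contexts described above amplify a fixed frequency band to bias target categorization. A minimal sketch of such band amplification via an FFT-domain gain (the 100–400 Hz low-F1 band comes from the abstract; the +20 dB gain and all other values are assumptions, not the study's filters):

```python
import numpy as np

def boost_band(x, fs, lo, hi, gain_db):
    """Amplify frequencies in [lo, hi] Hz by gain_db via FFT-domain filtering."""
    X = np.fft.rfft(x)
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    gain = np.where((freqs >= lo) & (freqs <= hi), 10 ** (gain_db / 20), 1.0)
    return np.fft.irfft(X * gain, n=len(x))

def band_power(x, fs, lo, hi):
    """Total spectral power within [lo, hi] Hz."""
    X = np.abs(np.fft.rfft(x)) ** 2
    freqs = np.fft.rfftfreq(len(x), 1 / fs)
    return X[(freqs >= lo) & (freqs <= hi)].sum()

fs = 16000
rng = np.random.default_rng(0)
noise = rng.standard_normal(fs)                         # 1 s of white noise
low_f1_context = boost_band(noise, fs, 100, 400, 20.0)  # low-F1 spectral peak
```

A +20 dB amplitude gain multiplies in-band power by 100 while leaving out-of-band bins untouched, which is the kind of clean spectral manipulation the epoch-based paradigms exploit.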
46. Response patterns to vowel formant perturbations in children.
- Author
-
Cheung, Stephanie T., Thompson, Kristen, Chen, Joyce L., Yunusova, Yana, and Beal, Deryk S.
- Subjects
- *
VOWELS , *PSYCHOLOGICAL feedback - Abstract
Auditory feedback is an important component of speech motor control, but its precise role in developing speech is less understood. The role of auditory feedback in development was probed by perturbing the speech of children 4–9 years old. The vowel sound /ɛ/ was shifted to /æ/ in real time and presented to participants as their own auditory feedback. Analyses of the resultant formant magnitude changes in the participants' speech indicated that children compensated and adapted by adjusting their formants to oppose the perturbation. Older and younger children responded to perturbation differently in F1 and F2. The compensatory change in F1 was greater for younger children, whereas the increase in F2 was greater for older children. Adaptation aftereffects were observed in both groups. Exploratory directional analyses in the two-dimensional formant space indicated that older children responded more directly and less variably to the perturbation than younger children, shifting their vowels back toward the vowel sound /ɛ/ to oppose the perturbation. Findings support the hypothesis that auditory feedback integration continues to develop between the ages of 4 and 9 years old such that the differences in the adaptive and compensatory responses arise between younger and older children despite receiving the same auditory feedback perturbation. [ABSTRACT FROM AUTHOR]
- Published
- 2021
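The exploratory directional analyses mentioned above compare the direction of a child's F1/F2 response against the direction that would exactly oppose the perturbation. A hedged sketch of one way to quantify this in the two-dimensional formant space (all formant values are hypothetical, and this is not the authors' analysis code):

```python
import numpy as np

def response_angle(baseline, produced, perturbation):
    """Angle (degrees) between the produced formant change and the
    direction that exactly opposes the applied perturbation."""
    response = np.asarray(produced) - np.asarray(baseline)
    opposing = -np.asarray(perturbation)
    cos = response @ opposing / (np.linalg.norm(response) * np.linalg.norm(opposing))
    return np.degrees(np.arccos(np.clip(cos, -1.0, 1.0)))

# Hypothetical values (Hz): feedback shifts /ɛ/ toward /æ/ (+F1, -F2)
perturbation = [100.0, -150.0]
baseline = [650.0, 1900.0]
produced = [620.0, 1945.0]   # speaker lowers F1, raises F2: opposes the shift
print(response_angle(baseline, produced, perturbation))  # close to 0 degrees
```

An angle near 0° indicates a directly opposing (compensatory) response, while an angle near 180° would indicate following the perturbation; across-trial spread in this angle is one way to operationalize the response variability the abstract contrasts between age groups.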
47. Mandarin tone recognition training with cochlear implant simulation: Amplitude envelope enhancement and cue weighting.
- Author
-
Kim, Seeon, Chou, Hsiao-Hsiuan, and Luo, Xin
- Subjects
- *
SPEECH enhancement , *COCHLEAR implants , *ACOUSTICS , *ABSOLUTE pitch , *VOWELS - Abstract
With limited fundamental frequency (F0) cues, cochlear implant (CI) users recognize Mandarin tones using amplitude envelope. This study investigated whether tone recognition training with amplitude envelope enhancement may improve tone recognition and cue weighting with CIs. Three groups of CI-simulation listeners received training using vowels with amplitude envelope modified to resemble F0 contour (enhanced-amplitude-envelope training), training using natural vowels (natural-amplitude-envelope training), and exposure to natural vowels without training, respectively. Tone recognition with natural and enhanced amplitude envelope cues and cue weighting of amplitude envelope and F0 contour were measured in pre-, post-, and retention-tests. It was found that with similar pre-test performance, both training groups had better tone recognition than the no-training group after training. Only enhanced-amplitude-envelope training increased the benefits of amplitude envelope enhancement in the post- and retention-tests than in the pre-test. Neither training paradigm increased the cue weighting of amplitude envelope and F0 contour more than stimulus exposure. Listeners attending more to amplitude envelope in the pre-test tended to have better tone recognition with enhanced amplitude envelope cues before training and improve more in tone recognition after enhanced-amplitude-envelope training. The results suggest that auditory training and speech enhancement may bring maximum benefits to CI users when combined. [ABSTRACT FROM AUTHOR]
- Published
- 2021
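Because CI processing conveys Mandarin tone largely through the amplitude envelope, a simple rectify-and-smooth envelope extractor illustrates the cue at issue (a generic textbook method, not the study's processing; the stimulus parameters are hypothetical):

```python
import numpy as np

def amplitude_envelope(x, fs, win_s=0.01):
    """Rectify, then smooth with a moving average to extract the envelope."""
    n = int(fs * win_s)
    win = np.ones(n) / n
    return np.convolve(np.abs(x), win, mode="same")

# Hypothetical tone-like stimulus: 1 kHz carrier, 4 Hz amplitude modulator
fs = 16000
t = np.arange(0, 0.5, 1 / fs)
modulator = 1.0 + 0.5 * np.sin(2 * np.pi * 4 * t)
x = modulator * np.sin(2 * np.pi * 1000 * t)
env = amplitude_envelope(x, fs)
```

Enhancement in the spirit of the abstract would then reshape this extracted envelope to track the F0 contour before resynthesis; the sketch covers only the extraction step.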
48. Distance vs time. Acoustic and articulatory consequences of reduced vowel duration in Polish.
- Author
-
Strycharczuk, Patrycja, Ćavar, Małgorzata, and Coretta, Stefano
- Subjects
- *
VOWELS , *ULTRASONIC imaging , *VIDEOFLUOROSCOPY , *DATA reduction - Abstract
This paper presents acoustic and articulatory (ultrasound) data on vowel reduction in Polish. The analysis focuses on the question of whether the change in formant value in unstressed vowels can be explained by duration-driven undershoot alone or whether there is also evidence for additional stress-specific articulatory mechanisms that systematically affect vowel formants. On top of the expected durational differences between the stressed and unstressed conditions, the duration is manipulated by inducing changes in the speech rate. The observed vowel formants are compared to expected formants derived from the articulatory midsagittal tongue data in different conditions. The results show that the acoustic vowel space is reduced in size and raised in unstressed vowels compared to stressed vowels. Most of the spectral reduction can be explained by reduced vowel duration, but there is also an additional systematic effect of F1-lowering in unstressed non-high vowels that does not follow from tongue movement. The proposed interpretation is that spectral vowel reduction in Polish behaves largely as predicted by the undershoot model of vowel reduction, but the effect of undershoot is enhanced for low unstressed vowels, potentially by a stress marking strategy which involves raising the fundamental frequency. [ABSTRACT FROM AUTHOR]
- Published
- 2021
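The undershoot account summarized above predicts that formants of non-high vowels approach their targets as duration increases, so F1 should covary with vowel duration. A toy regression sketch on synthetic data (all numbers invented, not the Polish measurements):

```python
import numpy as np

# Hypothetical durations (s) and F1 values (Hz) for a low vowel:
# shorter vowels undershoot the open target, so F1 falls as duration shrinks
rng = np.random.default_rng(1)
duration = rng.uniform(0.05, 0.20, 100)
f1 = 700.0 + 800.0 * duration + rng.normal(0.0, 10.0, 100)  # toy undershoot

slope, intercept = np.polyfit(duration, f1, 1)  # Hz per second, Hz
```

In the paper's logic, a positive duration slope alone is consistent with pure undershoot; the stress-specific F1-lowering they report would appear as a residual stress effect after duration is controlled, which this toy model omits.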
49. Training non-native vowel perception: In quiet or noise.
- Author
-
Mi, Lin, Tao, Sha, Wang, Wenjing, Dong, Qi, Dong, Bing, Li, Mingshuang, and Liu, Chang
- Subjects
- *
SPEECH perception , *VOWELS , *NOISE - Abstract
Noise makes speech perception much more challenging for non-native listeners than for native listeners. Training for non-native speech perception is usually implemented in quiet, and it remains unclear whether background noise benefits or hampers non-native speech perception learning. In this study, 51 Chinese-native listeners were randomly assigned to three groups: vowel training in quiet (TIQ), vowel training in noise (TIN), and watching English-language videos as an active control. Vowel identification was assessed before (T1), immediately after (T2), and three months after training (T3), in quiet and in various noise conditions. Results indicated that, compared with the video-watching group, the TIN group improved vowel identification significantly more in both quiet and noise at T2 and T3. In contrast, the TIQ group improved significantly more in quiet and in non-speech noise conditions at T2, but the improvement did not hold at T3. Moreover, compared with the TIQ group, the TIN group showed significantly less informational masking at both T2 and T3 and less energetic masking at T3. These results suggest that L2 speech training in background noise may improve non-native vowel perception more effectively than training in a quiet background only. The implications for non-native speech perception learning are discussed. [ABSTRACT FROM AUTHOR]
- Published
- 2021
50. Are source-filter interactions detectable in classical singing during vowel glides?
- Author
-
Echternach, Matthias, Herbst, Christian T., Köberlein, Marie, Story, Brad, Döllinger, Michael, and Gellrich, Donata
- Subjects
- *
SINGING , *VOWELS , *VOCAL tract , *SOUND recordings - Abstract
In recent studies, it has been assumed that vocal tract formants (Fn) and the voice source can interact. However, only a few studies have tested this assumption in vivo. Here, the vowel transition /i/-/a/-/u/-/i/ produced by 12 professional classical singers (6 females, 6 males) phonating on the pitch D4 [fundamental frequency (ƒo) ca. 294 Hz] was analyzed using transnasal high-speed videoendoscopy (20,000 fps), electroglottography (EGG), and audio recordings. Fn data were calculated using a cepstral method. Source-filter interaction candidates (SFICs) were determined by (a) algorithmic detection of major intersections of Fn/nƒo and (b) perceptual assessment of the EGG signal. Although the open quotient showed some increase for the /i-a/ and /u-i/ transitions, there were no clear effects at the expected Fn/nƒo intersections. In contrast, ƒo adjustments and changes in the phonovibrogram occurred at perceptually derived SFICs, suggesting level-two interactions. In some cases, these were constituted by intersections between higher nƒo and Fn. The presented data partially corroborate that vowel transitions may result in level-two interactions also in professional singers. However, the lack of systematically detectable effects suggests either the absence of a strong interaction or the existence of confounding factors that may counterbalance the level-two interactions. [ABSTRACT FROM AUTHOR]
- Published
- 2021
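The algorithmic SFIC detection step described above searches for intersections between a formant track and harmonics nƒo of the fundamental. A minimal sketch with synthetic trajectories (the D4 ƒo of ca. 294 Hz matches the abstract; the F1 glide and all other values are invented):

```python
import numpy as np

def harmonic_crossings(fn, f0, max_harmonic=10):
    """Return (harmonic, frame index) pairs where the formant track fn
    crosses a harmonic n*f0 of the fundamental."""
    crossings = []
    for n in range(1, max_harmonic + 1):
        d = fn - n * f0
        idx = np.where(np.diff(np.sign(d)) != 0)[0]
        crossings += [(n, int(i)) for i in idx]
    return crossings

# Synthetic example: steady f0 at D4 (294 Hz), F1 gliding 250 -> 900 Hz,
# roughly as in an /i/-/a/ transition (values illustrative)
f0 = np.full(100, 294.0)
f1 = np.linspace(250.0, 900.0, 100)
print(len(harmonic_crossings(f1, f0)))  # -> 3 (crossings of 1ƒo, 2ƒo, 3ƒo)
```

Each returned pair marks a candidate frame at which source-filter interaction effects would be expected; in the study, such algorithmic candidates were checked against perceptual assessment of the EGG signal.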