11 results for "vocal tract modeling"
Search Results
2. Experiments on using vocal tract estimates of nasal stops for speaker verification.
- Author
-
Enzinger, Ewald and Kasess, Christian H.
- Abstract
Nasal stops have been recognized as an important source of speaker-discriminating features. The nasal cavity is, with the exception of the velar junction, independent of articulatory movements. As the complex nasal structure varies from person to person, features dependent upon nasal acoustics may have low within-speaker and high between-speaker variability. In this study we use a Bayesian estimation technique to obtain reflection coefficients of a branched-tube model of the combined nasal and oral tract. These are then used as parameters in speaker verification experiments. The performance is evaluated on the basis of speakers from the TIMIT corpus as well as the Kiel corpus and is compared with that of a system based on Mel frequency cepstral coefficient (MFCC) features. Fusion of both systems indicates that the two approaches offer complementary information. [ABSTRACT FROM PUBLISHER]
- Published
- 2013
- Full Text
- View/download PDF
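The abstract above parameterizes the combined nasal and oral tract as a branched tube whose reflection coefficients are obtained by Bayesian estimation. As a much simpler illustration of where such coefficients come from, the classical Levinson-Durbin recursion recovers the reflection (PARCOR) coefficients of an unbranched lossless-tube model from a speech frame's autocorrelation. The sketch below is illustrative only, not the paper's method; the function name and test signal are invented.

```python
import numpy as np

def reflection_coefficients(frame, order):
    """Levinson-Durbin recursion: return the PARCOR/reflection
    coefficients of a lossless concatenated-tube model fitted to one
    speech frame (illustrative; the paper instead estimates a branched
    nasal+oral tube with a Bayesian technique)."""
    n = len(frame)
    # Biased autocorrelation r[0..order] of the frame
    r = np.array([frame[: n - lag] @ frame[lag:] for lag in range(order + 1)])
    a = np.zeros(order + 1)
    a[0] = 1.0
    err = r[0]               # prediction error power
    k = np.zeros(order)      # reflection coefficients
    for i in range(1, order + 1):
        acc = r[i] + a[1:i] @ r[1:i][::-1]
        k[i - 1] = -acc / err
        # Update prediction polynomial with the new reflection coefficient
        a[1:i + 1] = a[1:i + 1] + k[i - 1] * a[i - 1::-1][:i]
        err *= 1.0 - k[i - 1] ** 2
    return k, err

# Damped resonance as a stand-in for a voiced speech frame
t = np.arange(400)
frame = np.exp(-t / 150.0) * np.sin(2 * np.pi * 0.07 * t)
k, err = reflection_coefficients(frame, order=8)
```

For a valid (positive-definite) autocorrelation sequence all |k| < 1, which is exactly the stability condition of the corresponding tube model.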
3. Data-driven voice source waveform analysis and synthesis
- Author
-
Gudnason, Jon, Thomas, Mark R.P., Ellis, Daniel P.W., and Naylor, Patrick A.
- Subjects
- *
PRINCIPAL components analysis , *GAUSSIAN processes , *PROTOTYPES , *ORATORS , *INTONATION (Phonetics) , *HUMAN voice , *APPROXIMATION theory , *SIGNAL processing - Abstract
A data-driven approach is introduced for studying, analyzing and processing the voice source signal. Existing approaches parameterize the voice source signal by using models that are motivated, for example, by a physical model or function-fitting. Such parameterization is often difficult to achieve and it produces a poor approximation to a large variety of real voice source waveforms of the human voice. This paper presents a novel data-driven approach to analyze different types of voice source waveforms using principal component analysis and Gaussian mixture modeling. This approach models certain voice source features that many other approaches fail to model. Prototype voice source waveforms are obtained from each mixture component and analyzed with respect to speaker, phone and pitch. An analysis/synthesis scheme was set up to demonstrate the effectiveness of the method. Compression of the proposed voice source by discarding 75% of the features yields a segmental signal-to-reconstruction error ratio of 13 dB and a Bark spectral distortion of 0.14. [Copyright © Elsevier]
- Published
- 2012
- Full Text
- View/download PDF
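The paper's core idea (PCA coding of voice source frames plus Gaussian mixture modeling, with mixture means serving as prototype waveforms) can be sketched with scikit-learn. The pulse shapes, dimensions, and noise level below are invented stand-ins, not the paper's inverse-filtered recordings; the 75% feature reduction mirrors the compression experiment described above.

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.mixture import GaussianMixture

rng = np.random.default_rng(0)

# Synthetic stand-in for pitch-synchronous voice source frames:
# two pulse shapes plus noise (real data would come from inverse filtering).
t = np.linspace(0.0, 1.0, 64)
shape_a = np.sin(np.pi * t) ** 2
shape_b = np.sin(np.pi * t) ** 4
frames = np.vstack(
    [shape_a + 0.02 * rng.standard_normal(64) for _ in range(100)]
    + [shape_b + 0.02 * rng.standard_normal(64) for _ in range(100)]
)

# Discard 75% of the features, as in the paper's compression experiment
pca = PCA(n_components=16).fit(frames)          # 64 -> 16 coefficients
codes = pca.transform(frames)

# Cluster waveform types; each mixture mean is a prototype in PCA space
gmm = GaussianMixture(n_components=2, random_state=0).fit(codes)
prototypes = pca.inverse_transform(gmm.means_)  # back to waveform domain

# Relative reconstruction error after compression
recon = pca.inverse_transform(codes)
err = np.mean((frames - recon) ** 2) / np.mean(frames ** 2)
```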
4. Acoustic Modeling Using the Digital Waveguide Mesh.
- Author
-
Murphy, D., Kelloniemi, A., Mullen, J., and Shelley, S.
- Abstract
The digital waveguide mesh has been an active area of music acoustics research for over ten years. Although founded in 1-D digital waveguide modeling, the principles on which it is based are not new to researchers grounded in numerical simulation, FDTD methods, electromagnetic simulation, etc. This article has attempted to provide a considerable review of how the DWM has been applied to acoustic modeling and sound synthesis problems, including new 2-D object synthesis and an overview of recent research activities in articulatory vocal tract modeling, RIR synthesis, and reverberation simulation. The extensive, although not by any means exhaustive, list of references indicates that though the DWM may have parallels in other disciplines, it still offers something new in the field of acoustic simulation and sound synthesis. [ABSTRACT FROM PUBLISHER]
- Published
- 2007
- Full Text
- View/download PDF
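The rectilinear 2-D digital waveguide mesh reviewed above can be stepped in its equivalent finite-difference form, where each junction's next pressure is half the sum of its four neighbours at the current step minus its own value two steps ago. A toy sketch under simplifying assumptions (fixed zero boundaries, no boundary impedance filters, impulse excitation); grid size and step count are arbitrary:

```python
import numpy as np

def dwm_step(p_now, p_prev):
    """One update of the 2-D rectilinear digital waveguide mesh in its
    equivalent finite-difference form:
        p[n+1] = 0.5 * (sum of the 4 neighbours at step n) - p[n-1]
    Boundaries here are simply held at zero; a real vocal tract or room
    model would apply boundary impedance filters instead."""
    p_next = np.zeros_like(p_now)
    p_next[1:-1, 1:-1] = 0.5 * (
        p_now[2:, 1:-1] + p_now[:-2, 1:-1]
        + p_now[1:-1, 2:] + p_now[1:-1, :-2]
    ) - p_prev[1:-1, 1:-1]
    return p_next

# Excite the mesh with an impulse at the centre and run a few steps
N = 21
p_prev = np.zeros((N, N))
p_now = np.zeros((N, N))
p_now[N // 2, N // 2] = 1.0
for _ in range(10):
    p_now, p_prev = dwm_step(p_now, p_prev), p_now
```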
5. Vocal-Tract Modeling: Fractional Elongation of Segment Lengths in a Waveguide Model With Half-Sample Delays.
- Author
-
Mathur, Siddharth, Story, Brad H., and Rodríguez, Jeffrey J.
- Subjects
WAVEGUIDES ,SPEECH perception ,ELECTROMAGNETIC waves ,VOCAL tract ,LARYNX - Abstract
Digital waveguide models are commonly used for simulating vocal-tract acoustics based on physiological data. In particular, waveguide models with half-sample delays are known to be well suited for speech production research. This paper presents enhancements to such a model aimed at improved accuracy in mapping physiological vocal-tract data (shape and length of the airway) to waveguide parameters. The enhancements allow the length of the vocal tract to be continuously varied, thus enabling more realistic synthesis. This is achieved by smoothly varying the individual segment lengths of a piecewise-cylindrical representation of the airway, without altering the system sampling frequency. Fractional-delay filters are used for spatial interpolation of the digital waveguide model. The algorithms are validated by modeling the protrusion of the lips, lowering of the larynx and lengthening of intermediate segments for a static vowel shape. [ABSTRACT FROM AUTHOR]
- Published
- 2006
- Full Text
- View/download PDF
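The fractional-delay idea above can be illustrated with a Lagrange-interpolation FIR filter, a standard way to approximate a non-integer delay in digital waveguide models. The sketch below delays a test sinusoid by 1.5 samples and compares it with the exact shift; it is illustrative only and does not reproduce the paper's half-sample-delay waveguide.

```python
import numpy as np

def lagrange_fd_coeffs(delay, order=3):
    """FIR Lagrange-interpolation coefficients approximating a fractional
    delay of `delay` samples (illustrative of the filters used to
    elongate waveguide segments by non-integer amounts)."""
    h = np.ones(order + 1)
    for k in range(order + 1):
        for m in range(order + 1):
            if m != k:
                h[k] *= (delay - m) / (k - m)
    return h

# Delay a slow sinusoid by 1.5 samples and compare with the exact shift
fs = 100.0
t = np.arange(200) / fs
x = np.sin(2 * np.pi * 2.0 * t)
h = lagrange_fd_coeffs(1.5, order=3)
y = np.convolve(x, h)[: len(x)]
exact = np.sin(2 * np.pi * 2.0 * (t - 1.5 / fs))
# Skip the filter's startup transient before measuring the error
err = np.max(np.abs(y[10:] - exact[10:]))
```

At low frequencies (relative to the sampling rate) the approximation error is very small, which is why such filters work well for smoothly elongating tube segments.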
6. Effect of the Signal Measured from the Glottis on Determination of the Vocal Tract Shape.
- Author
-
Gülmezoğlu, M. and Barkana, Atalay
- Abstract
All-pole and pole-zero models for the vocal tract are developed. First an impulse train, then the pressure signal measured from the glottis, is used as the input in the models. The models for eight Turkish vowels produced by one male subject are studied to determine the effects of the presumed impulse train and the pressure signal measured from the glottis on the estimation of the vocal tract shape. The motion of the tongue is also examined for a whole word. © 1998 Biomedical Engineering Society. PACS: 43.72.-p, 87.10.+e [ABSTRACT FROM AUTHOR]
- Published
- 1998
- Full Text
- View/download PDF
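The link between an all-pole model and vocal tract shape rests on the concatenated lossless-tube relation A_{i+1} = A_i (1 - k_i)/(1 + k_i) between reflection coefficients and cross-sectional areas. A minimal sketch with made-up coefficients (the paper's comparison of impulse-train vs. measured glottal input is not modeled here, and the function name is illustrative):

```python
import numpy as np

def tube_areas(k, a_glottis=1.0):
    """Map reflection coefficients of a lossless concatenated-tube model
    to relative cross-sectional areas via
        A[i+1] = A[i] * (1 - k[i]) / (1 + k[i]).
    A simplified view of how an all-pole vocal tract model constrains
    the estimated tract shape."""
    areas = [a_glottis]
    for ki in k:
        areas.append(areas[-1] * (1.0 - ki) / (1.0 + ki))
    return np.array(areas)

# Illustrative coefficients: alternating constriction and expansion
k = np.array([0.3, -0.2, 0.1, -0.4])
areas = tube_areas(k)
```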
7. An Articulatory-Based Singing Voice Synthesis Using Tongue and Lips Imaging
- Author
-
Kele Xu, Aurore Jaumard-Hakoun, P. Roussel-Ragot, Bruce Denby, and Clémence Leboullenger (Institut Langevin - Ondes et Images (UMR 7587), ESPCI Paris, Université Paris sciences et lettres (PSL), Sorbonne Université, CNRS; Université Pierre et Marie Curie - Paris 6 (UPMC); Tianjin University (TJU))
- Subjects
[SPI.ACOU] Engineering Sciences [physics]/Acoustics , [SHS.INFO] Humanities and Social Sciences/Library and information sciences , [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing , Deep Neural Networks , Computer science , Speech recognition , vocal tract modeling , Autoencoder , Singing voice synthesis , ultrasound imaging , Tongue , Multilayer perceptron , rare singing , Singing , Articulation (phonetics) , Vocal tract - Abstract
Ultrasound imaging of the tongue and videos of lip movements can be used to investigate specific articulation in speech or singing voice. In this study, tongue and lip image sequences recorded during singing performance are used to predict vocal tract properties via Line Spectral Frequencies (LSF). We focused our work on the traditional Corsican singing "Cantu in paghjella". A multimodal Deep Autoencoder (DAE) extracts salient descriptors directly from tongue and lip images. Afterwards, LSF values are predicted from the most relevant of these features using a multilayer perceptron. A vocal tract model is derived from the predicted LSF, while a glottal flow model is computed from a synchronized electroglottographic recording. Articulatory-based singing voice synthesis is developed using both models. The quality of the prediction and singing voice synthesis using this method outperforms the state-of-the-art method.
- Published
- 2016
- Full Text
- View/download PDF
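The Line Spectral Frequencies predicted in the paper are the unit-circle root angles of the symmetric and antisymmetric polynomials built from the LPC polynomial, from which the all-pole vocal tract filter is then rebuilt. A sketch of the LPC-to-LSF conversion (the DAE/MLP prediction stage is not reproduced here; the example filter is invented):

```python
import numpy as np

def lpc_to_lsf(a):
    """Convert LPC coefficients a = [1, a1, ..., ap] to Line Spectral
    Frequencies in radians. P(z) = A(z) + z^-(p+1) A(1/z) is symmetric,
    Q(z) = A(z) - z^-(p+1) A(1/z) is antisymmetric; their non-trivial
    roots lie on the unit circle, and the LSFs are their angles in
    (0, pi), excluding the fixed roots at z = 1 and z = -1."""
    a_ext = np.concatenate([np.asarray(a, float), [0.0]])
    P = a_ext + a_ext[::-1]
    Q = a_ext - a_ext[::-1]
    roots = np.concatenate([np.roots(P), np.roots(Q)])
    ang = np.angle(roots)
    # Keep positive angles strictly inside (0, pi); the small margin
    # discards the trivial roots at angles 0 and pi.
    return np.sort(ang[(ang > 1e-4) & (ang < np.pi - 1e-4)])

# A stable 4th-order all-pole filter with two resonances
poles = [0.9 * np.exp(1j * 0.3), 0.9 * np.exp(-1j * 0.3),
         0.8 * np.exp(1j * 1.2), 0.8 * np.exp(-1j * 1.2)]
a = np.real(np.poly(poles))
lsf = lpc_to_lsf(a)
```

For a stable predictor the LSFs are strictly increasing in (0, π), which makes them well behaved under interpolation and a common target for learned prediction.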
8. Capturing, Analyzing, and Transmitting Intangible Cultural Heritage with the i-Treasures Project
- Author
-
Jaumard-Hakoun, Aurore, Al Kork, Samer K., Adda-Decker, Martine, Amelot, Angelique, Crevier Buchman, Lise, Dreyfus, Gérard, Fux, Thibaut, Roussel, Pierre, Pillot-Loiseau, Claire, Stone, Maureen, and Denby, Bruce (Laboratoire Signaux, Modèles et Apprentissage Statistique (SIGMA), ESPCI Paris, Université Paris sciences et lettres (PSL), CNRS; LPP - Laboratoire de Phonétique et Phonologie (UMR 7018), Université Sorbonne Nouvelle - Paris 3, CNRS; Hôpital Européen Georges Pompidou (HEGP), AP-HP; Vocal Tract Visualization Lab (VTVL), University of Maryland School of Dentistry, Baltimore; Edinburgh University and Queen Margaret University; funded under ANR-11-IDEX-0005 (USPC) and ANR-10-LABX-0083 (Labex EFL))
- Subjects
tongue contour extraction method , multi-sensor acquisition system , vocal tract modeling , [SHS.LANGUE] Humanities and Social Sciences/Linguistics , preliminary data processing , Rare traditional songs , COM - Abstract
Extended abstract available at: http://www.qmu.ac.uk/casl/conf/ultrafest%5F2013/docs/AJaumard-Hakoun_1_Ultrafest.pdf; The i-Treasures project, which officially began on 1 February 2013, is a 12-partner FP7 project that proposes to use multi-sensor technology to capture, preserve, and transmit four types of intangible cultural heritage, referred to as 'use cases': rare traditional songs, rare dance interactions, traditional craftsmanship and contemporary music composition. Methodologies used will include body and gesture recognition, vocal tract modeling, speech processing and electroencephalography (EEG). The "Rare traditional songs" use case, which will be the focus of our work, targets Corsican "cantu in Paghjella", Sardinian "Canto a tenore", Mt. Athos Greek byzantine hymns and the recent "Human beat box" styles. The final objective of the "Rare traditional songs" use case is to capture vocal tract movements with sufficient accuracy to drive a real-time 2D or 3D avatar of the vocal tract, which will in turn play a crucial role in transmitting the captured intangible cultural heritage to future generations. The acquisition sensors are: a multi-sensor helmet, a nose-mounted accelerometer, a camera, an ultrasound probe, an electroglottograph, a microphone and a breathing belt. This study describes the acquisition system, as well as the vocal tract modeling, the preliminary data processing, and the tongue contour extraction method.
- Published
- 2013
9. Vocal tract modeling techniques: from human voice to non-human primates vocalizations
- Author
-
Gamba, Marco, Torti, Valeria, Colombo, CAMILLA MARTA PEDINA, and Giacoma, Cristina
- Subjects
Vocal tract resonance , Human phonation , lcsh:Biology (General) , Formants , Biochemistry (medical) , Source-Filter Theory , Vocal tract modeling , Plant Science , acoustics, phonation, evolution, resonance, formants, comparative approach , lcsh:QH301-705.5 , General Biochemistry, Genetics and Molecular Biology
- Published
- 2012
10. Design, Realization and Experiments with a new RF Head Probe Coil for Human Vocal Tract Imaging in an NMR device
- Author
-
Ivan Frollo, Jiří Přibil, D. Gogola, and T. Dermek
- Subjects
Engineering ,rf probe coil ,medicine.diagnostic_test ,Computer simulation ,business.industry ,vocal tract modeling ,Acoustics ,Biomedical Engineering ,Magnetic resonance imaging ,Finite element method ,Magnetic field ,Control and Systems Engineering ,Electromagnetic coil ,nmr imaging ,QA1-939 ,medicine ,field calculation ,Phonation ,Tomography ,business ,Instrumentation ,Mathematics ,Vocal tract - Abstract
The non-invasive magnetic resonance (MR) scanning of the human vocal tract volume enables the development of three-dimensional (3D) computer models of the vocal tract. These 3D models are necessary for understanding the basic physical principles of the creation of human speech and voice as close to reality as possible. The primary volume models of the human acoustic supraglottal spaces created from the MR images can then be transformed into 3D finite element (FE) models [1]. Such models are helpful for modeling real clinical situations, such as the influence of various inborn defects in the human supraglottal spaces on speech and voice, or simulations of various post-surgical states in patients [2]. The quality of the developed FE models has to be checked by a sufficiently accurate numerical simulation of the subject's phonation during the NMR scanning, and therefore simultaneous acoustic recording of the subject's voice during the scan procedure is very important [3], [4]. Head probe coils are commonly produced for magnetic resonance imaging (MRI) equipment working with a strong magnetic field, but these systems produce a lot of unwanted acoustic noise. A solution to the acoustic noise problem is low-field MR scanners, but these are not usually provided with head/neck coils. Therefore, this study focused on the development of a new MR receiving head coil, for a tomograph with a low magnetic field, to be used primarily in human vocal tract imaging.
- Published
- 2012
- Full Text
- View/download PDF
11. Analysis, Vocal-tract modeling, and Automatic Detection of Vowel Nasalization
- Author
-
Pruthi, Tarun and Pruthi, Tarun
- Abstract
The aim of this work is to clearly understand the salient features of nasalization and the sources of acoustic variability in nasalized vowels, and to suggest Acoustic Parameters (APs) for the automatic detection of vowel nasalization based on this knowledge. Possible applications in automatic speech recognition, speech enhancement, speaker recognition and clinical assessment of nasal speech quality have made the detection of vowel nasalization an important problem to study. Although several researchers in the past have found a number of acoustical and perceptual correlates of nasality, automatically extractable APs that work well in a speaker-independent manner are yet to be found. In this study, vocal tract area functions for one American English speaker, recorded using Magnetic Resonance Imaging, were used to simulate and analyze the acoustics of vowel nasalization, and to understand the variability due to velar coupling area, asymmetry of nasal passages, and the paranasal sinuses. Based on this understanding and an extensive survey of past literature, several automatically extractable APs were proposed to distinguish between oral and nasalized vowels. Nine APs with the best discrimination capability were selected from this set through Analysis of Variance. The performance of these APs was tested on several databases with different sampling rates, recording conditions and languages. Accuracies of 96.28%, 77.90% and 69.58% were obtained by using these APs on StoryDB, TIMIT and WS96/97 databases, respectively, in a Support Vector Machine classifier framework. To my knowledge, these results are the best anyone has achieved on this task. These APs were also tested in a cross-language task to distinguish between oral and nasalized vowels in Hindi. An overall accuracy of 63.72% was obtained on this task. Further, the accuracy for phonemically nasalized vowels, 73.40%, was found to be much higher than the accuracy of 53.48% for coarticulatorily nasalized vowels. This re
- Published
- 2007
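The classification stage described above (acoustic parameters fed to a Support Vector Machine) can be sketched with scikit-learn. The nine synthetic features below are invented stand-ins for the thesis's APs, so the resulting accuracy has no relation to the reported numbers; only the pipeline shape (scaling plus an RBF-kernel SVM) mirrors the framework named in the abstract.

```python
import numpy as np
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.svm import SVC

rng = np.random.default_rng(0)

# Synthetic stand-in for 9 acoustic parameters per vowel frame; real APs
# would capture e.g. extra pole-zero pairs introduced by nasal coupling.
# The "nasalized" class is shifted in a few dimensions.
n = 300
shift = np.array([1.5, 0, 1.0, 0, 0, 0.8, 0, 0, 0])
X_oral = rng.standard_normal((n, 9))
X_nasal = rng.standard_normal((n, 9)) + shift
X = np.vstack([X_oral, X_nasal])
y = np.array([0] * n + [1] * n)  # 0 = oral, 1 = nasalized

X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=0.25,
                                          random_state=0)
clf = make_pipeline(StandardScaler(), SVC(kernel="rbf"))
clf.fit(X_tr, y_tr)
acc = clf.score(X_te, y_te)
```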