A new approach to speaker adaptation by modelling pronunciation in automatic speech recognition
- Source :
- Speech Communication. 13:281-286
- Publication Year :
- 1993
- Publisher :
- Elsevier BV, 1993.
Abstract
- To handle large lexica (more than 2000 words), automatic speech recognition (ASR) systems use an internal phonetic representation of the speech signal together with phonemic models of pronunciation from the lexicon to search for the spoken word sequence or sentence. This makes it possible to model different pronunciations of a word in the lexicon. In German, we observed that individual speakers pronounce words in a characteristic way that depends on several factors such as sex, age, place of residence, place of birth, etc. Our goal is to improve speech recognition by automatically adapting the pronunciation models in the lexicon to the unknown speaker. The obvious problem is that one cannot wait until the current speaker has uttered each of roughly 2000 different words at least once. We solved this problem by generalizing observed rules of differing pronunciation to words not yet observed. A second method presented in this paper is speaker adaptation by re-estimating the a posteriori probabilities of the phonetic units used in a “bottom-up” ASR system. A word hypothesis is evaluated by the product of the a posteriori probabilities that relate the phonetic units produced by the classification to the phonetic units belonging to the word hypothesis. Normally these probabilities are estimated during training of the ASR system and stay fixed during testing. We propose an algorithm that observes the typical phonetic-unit confusions of the unknown speaker and continuously adapts the a posteriori probabilities.
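The second method in the abstract can be illustrated with a minimal sketch. It is not the authors' algorithm, which the abstract does not specify; it only shows one plausible way to score a word hypothesis as a product of a posteriori probabilities and to adapt those probabilities as speaker-specific confusions are observed. The class name `AdaptivePosteriors`, the pseudo-count update, the `prior_weight` parameter, the assumption of a one-to-one alignment between classified and hypothesized units, and the toy phonetic symbols are all assumptions made for illustration.

```python
import math


class AdaptivePosteriors:
    """Hypothetical sketch: a posteriori probabilities P(intended | classified)
    stored as pseudo-counts so they can be re-estimated during recognition."""

    def __init__(self, trained, prior_weight=10.0):
        # trained[c][u] = P(u | c) as estimated during training.
        # prior_weight (an assumption) sets how many observations the
        # training estimate is worth; larger values adapt more slowly.
        self.counts = {
            c: {u: p * prior_weight for u, p in row.items()}
            for c, row in trained.items()
        }

    def prob(self, classified, intended):
        # Current estimate of P(intended | classified).
        row = self.counts[classified]
        return row.get(intended, 0.0) / sum(row.values())

    def score(self, classified_seq, hypothesis_seq):
        # Product of the a posteriori probabilities over aligned units.
        # Assumes both sequences are already aligned one-to-one.
        if len(classified_seq) != len(hypothesis_seq):
            raise ValueError("sequences must be aligned to equal length")
        return math.prod(
            self.prob(c, u) for c, u in zip(classified_seq, hypothesis_seq)
        )

    def observe(self, classified_seq, confirmed_seq):
        # Record the confusions seen for the current speaker once a word
        # has been accepted; the probabilities shift on the next lookup.
        for c, u in zip(classified_seq, confirmed_seq):
            row = self.counts[c]
            row[u] = row.get(u, 0.0) + 1.0


# Toy usage: a speaker whose "E" is regularly classified as "e".
trained = {
    "e": {"e": 0.9, "E": 0.1},
    "E": {"E": 0.8, "e": 0.2},
}
model = AdaptivePosteriors(trained, prior_weight=10.0)
before = model.score(["e", "E"], ["E", "E"])
for _ in range(5):
    model.observe(["e"], ["E"])
after = model.score(["e", "E"], ["E", "E"])
```

In this toy run, the hypothesis containing the speaker's habitual confusion scores higher after adaptation than before, while rows for units that were not observed keep their trained values.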
- Subjects :
- Linguistics and Language
Computer science
Communication
Speech recognition
Phonetics
Pronunciation
Lexicon
Speaker recognition
Speech processing
Language and Linguistics
Computer Science Applications
Phonetic representation
Speaker diarisation
Modeling and Simulation
Computer Vision and Pattern Recognition
Artificial intelligence
Software
Sentence
Natural language processing
Details
- ISSN :
- 01676393
- Volume :
- 13
- Database :
- OpenAIRE
- Journal :
- Speech Communication
- Accession number :
- edsair.doi...........cb4e1bc07f5b1fd152fb743841139d86
- Full Text :
- https://doi.org/10.1016/0167-6393(93)90026-h