Pronunciation Dictionary-Free Multilingual Speech Synthesis Using Learned Phonetic Representations
- Source :
- IEEE-ACM Transactions on Audio, Speech, and Language Processing; 2023, Vol. 31 Issue: 1 p3706-3716, 11p
- Publication Year :
- 2023
Abstract
- This article presents a multilingual speech synthesis approach that leverages learned phonetic representations to eliminate the need for pronunciation dictionaries in target languages. The learned phonetic representations consist of unsupervised phonetic representations (UPRs) and supervised phonetic representations (SPRs). UPRs are extracted with a pre-trained wav2vec 2.0 model, while a language-independent automatic speech recognition (LI-ASR) model trained with a connectionist temporal classification (CTC) loss derives segment-level SPRs from the speech data of target languages. An acoustic model using UPRs and SPRs as intermediate representations is then designed, comprising a UPR predictor, an SPR predictor, and a representation-to-mel-spectrogram (RTM) converter. The two predictors generate UPRs and SPRs, respectively, from text. The RTM converter first combines UPRs with SPRs using a Transformer-based encoder, and then feeds the merged representations into a decoder to produce mel-spectrograms. Because large training corpora are difficult to collect for every language in multilingual speech synthesis, the parameters of both predictors and the RTM converter can be pre-trained on non-target languages to further improve model performance. Experimental results on six target languages demonstrate that our method outperforms approaches that directly predict mel-spectrograms from character or phoneme sequences, and that pre-training the acoustic model on a multilingual corpus further improves the performance of the synthetic speech.
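The data flow described in the abstract (text to UPRs and SPRs, then merged representations to mel-spectrogram frames) can be illustrated with a toy sketch. This is not the authors' implementation: every function name, the dimension `DIM`, the `n_mels` value, and the arithmetic inside each stage are hypothetical stand-ins for trained neural networks. It also assumes, for simplicity, that both predictors emit one vector per input character, which the abstract does not state.

```python
from typing import List

Vector = List[float]

DIM = 4  # hypothetical size of each phonetic representation vector


def predict_upr(text: str) -> List[Vector]:
    """Stand-in for the UPR predictor: one toy vector per character."""
    return [[(ord(ch) % 7) / 7.0] * DIM for ch in text]


def predict_spr(text: str) -> List[Vector]:
    """Stand-in for the SPR predictor: one toy vector per character."""
    return [[(ord(ch) % 5) / 5.0] * DIM for ch in text]


def rtm_convert(upr: List[Vector], spr: List[Vector], n_mels: int = 8) -> List[Vector]:
    """Stand-in for the RTM converter.

    Merging is done here by concatenation and "decoding" by averaging;
    the article uses a Transformer-based encoder and a decoder instead.
    """
    if len(upr) != len(spr):
        raise ValueError("UPR and SPR sequences must have the same length")
    merged = [u + s for u, s in zip(upr, spr)]  # each vector has 2 * DIM values
    return [[sum(m) / len(m)] * n_mels for m in merged]


def synthesize(text: str) -> List[Vector]:
    """Text -> (UPRs, SPRs) -> mel-spectrogram-like frames."""
    return rtm_convert(predict_upr(text), predict_spr(text))


if __name__ == "__main__":
    frames = synthesize("hola")
    print(len(frames), "frames of", len(frames[0]), "values each")
```

The point of the sketch is only the staging: neither stand-in predictor consults a pronunciation dictionary, and the converter sees only the two intermediate representations, never the text itself.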
Details
- Language :
- English
- ISSN :
- 23299290
- Volume :
- 31
- Issue :
- 1
- Database :
- Supplemental Index
- Journal :
- IEEE-ACM Transactions on Audio, Speech, and Language Processing
- Publication Type :
- Periodical
- Accession number :
- ejs64350264
- Full Text :
- https://doi.org/10.1109/TASLP.2023.3313424