Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis
- Source :
- Takaki, S, Kim, S & Yamagishi, J 2016, Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis. In 9th ISCA Speech Synthesis Workshop (SSW), pp. 167-173, Sunnyvale, United States, 13/09/16. https://doi.org/10.21437/SSW.2016-25
- Publication Year :
- 2016
Abstract
- In this paper, we investigate the effectiveness of speaker adaptation for various essential components in deep neural network (DNN) based speech synthesis, including acoustic models, acoustic feature extraction, and post-filters. In general, a speaker adaptation technique, e.g., maximum likelihood linear regression (MLLR) for HMMs or learning hidden unit contributions (LHUC) for DNNs, is applied to the acoustic modeling part to change voice characteristics or speaking styles. However, since we have proposed a multiple DNN-based speech synthesis system, in which several components are represented by feed-forward DNNs, a speaker adaptation technique can be applied not only to the acoustic modeling part but also to the other components represented by DNNs. In experiments using a small amount of adaptation data, we performed adaptation based on LHUC and simple additional fine-tuning for DNN-based acoustic models, deep auto-encoder based feature extraction, and DNN-based post-filter models, and compared them with HMM-based speech synthesis systems using MLLR.
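The abstract names LHUC without giving its form. As a rough illustration only, the sketch below follows the commonly described LHUC formulation, in which each hidden unit's activation is rescaled by a speaker-dependent amplitude 2·sigmoid(r) while the layer weights stay fixed. The layer sizes, weights, and the function names `hidden_layer` and `lhuc_scale` are hypothetical and are not taken from the paper, whose exact configuration may differ.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def hidden_layer(x, weights, biases):
    """Speaker-independent feed-forward layer: tanh(Wx + b)."""
    return [
        math.tanh(sum(w * xi for w, xi in zip(row, x)) + b)
        for row, b in zip(weights, biases)
    ]


def lhuc_scale(hidden, r):
    """Rescale each hidden unit by a speaker-dependent amplitude.

    The amplitude 2 * sigmoid(r_k) lies in (0, 2); r_k = 0 gives 1.0,
    i.e. the unadapted network. Only r would be trained on adaptation data.
    """
    return [2.0 * sigmoid(r_k) * h_k for h_k, r_k in zip(hidden, r)]


# Hypothetical toy values, for illustration only.
W = [[0.5, -0.2], [0.1, 0.3], [-0.4, 0.8]]
b = [0.0, 0.1, -0.1]
x = [1.0, 2.0]

h = hidden_layer(x, W, b)

r_init = [0.0, 0.0, 0.0]       # before adaptation: amplitudes are all 1.0
h_unadapted = lhuc_scale(h, r_init)

r_adapted = [1.0, -1.0, 0.0]   # assumed values standing in for learned ones
h_adapted = lhuc_scale(h, r_adapted)
```

Because only one scalar per hidden unit is speaker-dependent, the number of adapted parameters is small, which is why this kind of scheme is usually considered when adaptation data is limited.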
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- 9th ISCA Speech Synthesis Workshop (SSW), pp. 167-173
- Accession number :
- edsair.doi.dedup.....6feb74aced42530e873dd12ae2621357
- Full Text :
- https://doi.org/10.21437/SSW.2016-25