Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis

Authors :: Shinji Takaki
Junichi Yamagishi
SangJin Kim
Source :: Takaki, S, Kim, S & Yamagishi, J 2016, Speaker Adaptation of Various Components in Deep Neural Network based Speech Synthesis. in 9th ISCA Speech Synthesis Workshop. pp. 167-173, 9th ISCA Speech Synthesis Workshop, Sunnyvale, United States, 13/09/16. https://doi.org/10.21437/SSW.2016-25, SSW
Publication Year :: 2016
Abstract :: In this paper, we investigate the effectiveness of speaker adaptation for various essential components in deep neural network based speech synthesis, including acoustic models, acoustic feature extraction, and post-filters. In general, a speaker adaptation technique, e.g., maximum likelihood linear regression (MLLR) for HMMs or learning hidden unit contributions (LHUC) for DNNs, is applied to the acoustic modeling part to change voice characteristics or speaking styles. However, since we have proposed a multiple DNN-based speech synthesis system, in which several components are represented by feed-forward DNNs, a speaker adaptation technique can be applied not only to the acoustic modeling part but also to the other components represented by DNNs. In experiments using a small amount of adaptation data, we performed adaptation based on LHUC and simple additional fine-tuning for DNN-based acoustic models, deep auto-encoder based feature extraction, and DNN-based post-filter models, and compared them with HMM-based speech synthesis systems using MLLR.

Subjects :: Artificial neural network
Computer science
Time delay neural network
Speech recognition
Speech synthesis
Speaker adaptation

Language :: English
Database :: OpenAIRE
Accession number :: edsair.doi.dedup.....6feb74aced42530e873dd12ae2621357
Full Text :: https://doi.org/10.21437/SSW.2016-25
