Thousands of voices for HMM-based speech synthesis

Authors :: Yamagishi, J.
Usabaev, B.
King, S.
Watts, O.
Dines, J.
Tian, J.
Hu, R.
Guan, Y.
Oura, K.
Tokuda, K.
Karhila, R.
Kurimo, M.
Source :: Yamagishi, J, Usabaev, B, King, S, Watts, O, Dines, J, Tian, J, Hu, R, Guan, Y, Oura, K, Tokuda, K, Karhila, R & Kurimo, M 2009, Thousands of voices for HMM-based speech synthesis. In Proceedings of the Annual Conference of the International Speech Communication Association, INTERSPEECH 2009: 10th Annual Conference of the International Speech Communication Association, Brighton, United Kingdom, pp. 420-423. Scopus-Elsevier
Publication Year :: 2009
Publisher :: ISCA, 2009.
Abstract: Our recent experiments with HMM-based speech synthesis systems have demonstrated that speaker-adaptive HMM-based speech synthesis (which uses an 'average voice model' plus model adaptation) is robust to non-ideal speech data that are recorded under various conditions and with varying microphones, that are not perfectly clean, and/or that lack phonetic balance. This enables us to consider building high-quality voices on 'non-TTS' corpora such as ASR corpora. Since ASR corpora generally include a large number of speakers, this leads to the possibility of producing an enormous number of voices automatically. In this paper, we present thousands of voices for HMM-based speech synthesis that we have built from several popular ASR corpora such as the Wall Street Journal databases (WSJ0/WSJ1/WSJCAM0), Resource Management, GlobalPhone and Speecon. We report some perceptual evaluation results and outline the outstanding issues.