Multi-Speaker End-to-End Speech Synthesis

Authors :
Park, Jihyun
Zhao, Kexin
Peng, Kainan
Ping, Wei
Publication Year :
2019

Abstract

In this work, we extend ClariNet (Ping et al., 2019), a fully end-to-end speech synthesis model (i.e., text-to-wave), to generate high-fidelity speech from multiple speakers. To model the unique characteristics of different voices, low-dimensional trainable speaker embeddings are shared across each component of ClariNet and trained together with the rest of the model. We demonstrate that the multi-speaker ClariNet outperforms state-of-the-art systems in terms of naturalness, because the whole model is jointly optimized in an end-to-end manner.
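
The idea of one speaker embedding table shared across several components can be illustrated with a minimal sketch. This is not the authors' implementation: the class names, the embedding dimension, the two toy components, and the way the embedding is injected (a linear projection added as a bias) are all assumptions made for illustration, and no training step is shown.

```python
import random


class SpeakerEmbeddingTable:
    """Hypothetical lookup table: one low-dimensional vector per speaker."""

    def __init__(self, num_speakers, dim, seed=0):
        rng = random.Random(seed)
        # In a real model these values would be trainable parameters.
        self.vectors = [
            [rng.uniform(-0.1, 0.1) for _ in range(dim)]
            for _ in range(num_speakers)
        ]

    def lookup(self, speaker_id):
        return self.vectors[speaker_id]


class ConditionedComponent:
    """Toy stand-in for one model component (placeholder, not a ClariNet module).

    Each component owns its own projection matrix but reads the speaker
    vector from the same shared table.
    """

    def __init__(self, table, feature_dim, seed):
        rng = random.Random(seed)
        self.table = table  # shared reference, not a copy
        emb_dim = len(table.vectors[0])
        self.proj = [
            [rng.uniform(-1.0, 1.0) for _ in range(emb_dim)]
            for _ in range(feature_dim)
        ]

    def forward(self, features, speaker_id):
        emb = self.table.lookup(speaker_id)
        # Assumed conditioning scheme: project the embedding, add it as a bias.
        bias = [sum(w * e for w, e in zip(row, emb)) for row in self.proj]
        return [f + b for f, b in zip(features, bias)]


# One table, two components that both condition on it.
table = SpeakerEmbeddingTable(num_speakers=4, dim=8)
comp_a = ConditionedComponent(table, feature_dim=3, seed=1)
comp_b = ConditionedComponent(table, feature_dim=3, seed=2)

features = [0.0, 0.0, 0.0]
out_speaker0 = comp_a.forward(features, speaker_id=0)
out_speaker1 = comp_a.forward(features, speaker_id=1)
```

Because both components hold a reference to the same table, any update to a speaker's vector affects every component at once, which is the property that joint training of a shared embedding relies on.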

Details

Database :
arXiv
Publication Type :
Report
Accession number :
edsarx.1907.04462
Document Type :
Working Paper