Start Over

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

Authors :: Zhu, Xinfa
Lei, Yi
Li, Tao
Zhang, Yongmao
Zhou, Hongbin
Lu, Heng
Xie, Lei
Publication Year :: 2023
Abstract: Previous multilingual text-to-speech (TTS) approaches have considered leveraging monolingual speaker data to enable cross-lingual speech synthesis. However, such data-efficient approaches have ignored synthesizing emotional aspects of speech due to the challenges of cross-speaker cross-lingual emotion transfer - the heavy entanglement of speaker timbre, emotion, and language factors in the speech signal will make a system produce cross-lingual synthetic speech with an undesired foreign accent and weak emotion expressiveness. This paper proposes the Multilingual Emotional TTS (METTS) model to mitigate these problems, realizing both cross-speaker and cross-lingual emotion transfer. Specifically, METTS takes DelightfulTTS as the backbone model and proposes the following designs. First, to alleviate the foreign accent problem, METTS introduces multi-scale emotion modeling to disentangle speech prosody into coarse-grained and fine-grained scales, producing language-agnostic and language-specific emotion representations, respectively. Second, as a pre-processing step, formant shift-based information perturbation is applied to the reference signal for better disentanglement of speaker timbre in the speech. Third, a vector quantization-based emotion matcher is designed for reference selection, leading to decent naturalness and emotion diversity in cross-lingual synthetic speech. Experiments demonstrate the good design of METTS.<br />Comment: 10 pages, 3 figures

Subjects :: Electrical Engineering and Systems Science - Audio and Speech Processing

Details

Database :: arXiv
Publication Type :: Report
Accession number :: edsarx.2307.15951
Document Type :: Working Paper

Tools

Email
Cite

Printer

Authors Abstract Subjects Details

Searchworks

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

Abstract

Subjects

Details

Tools

Searchworks

Select search scope, currently: Articles Catalog books, media & more in Jio Institute collections Articles journal articles & other e-resources

METTS: Multilingual Emotional Text-to-Speech by Cross-speaker and Cross-lingual Emotion Transfer

Abstract

Subjects

Details

Tools

Select search scope, currently: Articles

Catalog

books, media & more in Jio Institute collections

Articles

journal articles & other e-resources