
Towards cross-lingual voice cloning in higher education

Authors :
Manuel Jimenez
Joan Albert Silvestre-Cerdà
Alfons Juan
Gonçal V. Garcés Díaz-Munío
Alejandro Pérez
Carlos Turro
Jorge Civera
Alberto Sanchis
Adrià Giménez
Source :
RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Publication Year :
2021
Publisher :
Elsevier, 2021.

Abstract

[EN] The rapid progress of modern AI tools for automatic speech recognition and machine translation is steadily reducing the cost of producing publishable subtitles for educational videos in multiple languages. Similarly, text-to-speech technology is experiencing large improvements in terms of quality, flexibility and capabilities. In particular, state-of-the-art systems are now capable of seamlessly dealing with multiple languages and speakers in an integrated manner, thus enabling the cloning of a lecturer's voice in languages she or he might not even speak. This work reports the experience gained in using such systems at the Universitat Politècnica de València (UPV), mainly as guidance for other educational organizations willing to conduct similar studies. It builds on previous work on the UPV's main repository of educational videos, MediaUPV, to produce multilingual subtitles at scale and low cost. Here, a detailed account is given of how this work has been extended to also allow for massive machine dubbing of MediaUPV. This includes collecting 59 h of clean speech data from UPV's academic staff, and extending our subtitle production pipeline with a state-of-the-art multilingual and multi-speaker text-to-speech system trained on the collected data. Our main result comes from an extensive, subjective evaluation of this system by the lecturers who contributed to data collection. In brief, it is shown that text-to-speech technology is not only mature enough for its application to MediaUPV, but also needed as soon as possible by students to improve its accessibility and bridge language barriers.

We wish first to thank all UPV lecturers who made this study possible. We are also very grateful for the funding support received from the European Union's Horizon 2020 research and innovation programme under grant agreement no. 761758 (X5gon), the Spanish government under grant RTI2018-094879-B-I00 (Multisub, MCIU/AEI/FEDER), and the Universitat Politècnica de València (Spain) PAID-01-17 R&D support programme. Funding for open access charge: CRUE-Universitat Politècnica de València.

Details

Language :
English
Database :
OpenAIRE
Journal :
RiuNet. Repositorio Institucional de la Universitat Politècnica de València
Accession number :
edsair.doi.dedup.....48b96e00877a154dd7580c0f745470ad
Full Text :
https://doi.org/10.1016/j.engappai.2021.104413