Voice conversion using Deep Learning

Authors :: Universitat Politècnica de Catalunya. Departament de Teoria del Senyal i Comunicacions
Bonafonte Cávez, Antonio
Pascual de la Puente, Santiago
Aparicio Isarn, Albert
Publication Year :: 2017
Abstract: In this project we present a first attempt at a Voice Conversion system based on Deep Learning in which the alignment of the training data is intrinsic to the model. The system is structured in three main blocks. The first block encodes the speech signal into parameters (vocoding, for which we used Ahocoder) and normalizes them. The second and main block is a Sequence-to-Sequence model: an RNN-based encoder-decoder structure with an Attention Mechanism. Its main strengths are the ability to process variable-length sequences and to align them internally. The third block denormalizes the parameters and reconstructs the speech signal from them. To develop the system we used the Voice Conversion Challenge 2016 dataset, as well as part of the TC-STAR dataset. Unfortunately, we did not obtain the results we expected. At the end of this thesis we present them and discuss some hypotheses that may explain them.
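
The three-block pipeline in the abstract can be illustrated with a minimal NumPy sketch. This is not the thesis code: the record does not say which normalization or attention scoring was used, so per-feature z-scoring and dot-product attention are assumptions here, the random array is only a stand-in for vocoder parameters, and the function names are hypothetical.

```python
import numpy as np

def normalize(x, mean, std):
    """Z-score each feature dimension (assumed normalization scheme)."""
    return (x - mean) / std

def denormalize(z, mean, std):
    """Invert normalize() to recover the original parameter scale."""
    return z * std + mean

def attention_weights(decoder_state, encoder_states):
    """Dot-product attention (assumed scoring): one weight per source frame."""
    scores = encoder_states @ decoder_state   # shape (T_src,)
    scores = scores - scores.max()            # subtract max for numerical stability
    w = np.exp(scores)
    return w / w.sum()                        # softmax, weights sum to 1

rng = np.random.default_rng(0)
# Stand-in for vocoder output: 7 frames of 4 parameters each (made-up sizes).
source = rng.normal(loc=5.0, scale=2.0, size=(7, 4))
mean, std = source.mean(axis=0), source.std(axis=0)

z = normalize(source, mean, std)              # block 1: normalization
query = z[2]                                  # hypothetical decoder state
weights = attention_weights(query, z)         # block 2: soft alignment over frames
context = weights @ z                         # weighted summary of the source frames
restored = denormalize(z, mean, std)          # block 3: denormalization
```

The attention weights form a soft alignment over the source frames, so no separate alignment step is needed and the source length can vary from utterance to utterance.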

Details

Database :: OAIster
Notes :: application/pdf, English
Publication Type :: Electronic Resource
Accession number :: edsoai.ocn994293431
Document Type :: Electronic Resource
