Sequential voice conversion using grid-based approximation

Authors :: Hadas Benisty
David Malah
Koby Crammer
Source :: 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI).
Publication Year :: 2014
Publisher :: IEEE, 2014.
Abstract :: Common voice conversion methods are based on Gaussian Mixture Modeling (GMM), which requires exhaustive training (typically lasting hours) and often leads to ill-conditioning if the dataset used is too small. We propose a new conversion method that is trained in seconds, using either small- or large-scale datasets. The proposed Grid-Based (GB) method is based on sequential Bayesian tracking, by which the conversion process is expressed as a sequential estimation problem of tracking the target spectrum based on the observed source spectrum. The converted MFCC vectors are sequentially evaluated using a weighted sum of the target training set used as grid points. To improve the perceived quality of the synthesized signals, we use a post-processing block for enhancing the global variance. Objective and subjective evaluations show that the enhanced-GB method is comparable to classic GMM-based methods in terms of quality, and comparable to their enhanced versions in terms of individuality.
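
Illustrative sketch (not taken from the paper): the abstract describes tracking the target spectrum over a grid made of the target training vectors and emitting each converted frame as a weighted sum of those grid points. The Python below is one minimal way such a grid-based Bayesian filter could look. The function name grid_based_convert, the Gaussian-kernel likelihood and transition models, and the parameters sigma_obs and sigma_trans are all assumptions made for illustration; the abstract does not specify the models the authors use, and the global-variance post-processing step is not shown.

```python
import numpy as np


def grid_based_convert(source_seq, src_train, tgt_train,
                       sigma_obs=1.0, sigma_trans=1.0):
    """Hypothetical grid-based sequential Bayesian conversion.

    src_train[i] and tgt_train[i] are assumed to be time-aligned
    training pairs; the rows of tgt_train serve as the grid points.
    """
    n = tgt_train.shape[0]

    # Assumed transition model: Gaussian kernel on the squared distance
    # between target grid points, normalised so each row sums to one.
    d_tt = ((tgt_train[:, None, :] - tgt_train[None, :, :]) ** 2).sum(-1)
    trans = np.exp(-0.5 * d_tt / sigma_trans ** 2)
    trans /= trans.sum(axis=1, keepdims=True)

    weights = np.full(n, 1.0 / n)  # uniform belief before any frame is seen
    converted = np.empty((source_seq.shape[0], tgt_train.shape[1]))

    for t, x in enumerate(source_seq):
        # Predict: propagate the previous posterior through the transitions.
        prior = weights @ trans
        # Update: assumed Gaussian likelihood of the observed source frame
        # given each grid point's paired source vector (the minimum distance
        # is subtracted for numerical stability; it cancels on normalising).
        d_obs = ((src_train - x) ** 2).sum(-1)
        lik = np.exp(-0.5 * (d_obs - d_obs.min()) / sigma_obs ** 2)
        post = prior * lik
        weights = post / post.sum()
        # Estimate: posterior-weighted sum of the target grid points.
        converted[t] = weights @ tgt_train

    return converted
```

Because the grid is the training set itself, this sketch has no iterative fitting stage, only a pairwise-distance computation, which is in the spirit of the short training time the abstract reports. The pairwise transition matrix grows quadratically with the number of grid points, so a large training set would need a sparser or approximate transition model.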

Subjects :: Sequential estimation
Computer science
Gaussian
Bayesian probability
Process (computing)
Pattern recognition
Scale (descriptive set theory)
Grid
Mel-frequency cepstrum
Artificial intelligence
Block (data storage)

Database :: OpenAIRE
Journal :: 2014 IEEE 28th Convention of Electrical & Electronics Engineers in Israel (IEEEI)
Accession number :: edsair.doi...........ff4c04f8731be4f386fa449230246a2d
Full Text :: https://doi.org/10.1109/eeei.2014.7005872