Voice Conversion Using a Perceptual Criterion

Authors :
Ki-Seung Lee
Source :
Applied Sciences, Vol 10, Iss 8, p 2884 (2020)
Publication Year :
2020
Publisher :
MDPI AG, 2020.

Abstract

In voice conversion (VC), it is highly desirable to obtain transformed speech signals that are perceptually close to a target speaker’s voice. To this end, the proposed VC scheme adopts a perceptually meaningful criterion that takes the human auditory system into account when measuring the distance between the converted and target voices. The conversion rules for the spectral envelope features and the pitch modification factor were constructed jointly so that this perceptual distance was minimized. The minimization problem was solved in a deep neural network (DNN) framework, where the input features were derived from the source speech signals and the target features from a time-aligned version of the target speech signals. Validation tests were carried out on the CMU ARCTIC database to evaluate the effectiveness of the proposed method, especially in terms of perceptual quality. The experimental results showed that the proposed method yielded perceptually preferred results compared with independent conversion using the conventional mean-square error (MSE) criterion. The maximum improvement in the perceptual evaluation of speech quality (PESQ) score over the conventional VC method was 0.312.
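
The contrast the abstract draws between a plain MSE criterion and a perceptually motivated one can be illustrated with a small sketch. The abstract does not specify the perceptual distance that was used, so the mel-scale bin weighting below is only an illustrative stand-in, not the paper's criterion. The function names, the 8-bin spectrum, and the 16 kHz sample rate are all assumptions made for this example.

```python
import math


def hz_to_mel(f_hz):
    """Common mel-scale formula (an assumed choice, not taken from the paper)."""
    return 2595.0 * math.log10(1.0 + f_hz / 700.0)


def mel_band_weights(n_bins, sample_rate):
    """Weight each linear-frequency bin by the mel-scale width it covers.

    Low-frequency bins span more mels than high-frequency bins, so they
    receive larger weights. Weights are normalized to sum to 1.
    """
    nyquist = sample_rate / 2.0
    edges = [hz_to_mel(nyquist * k / n_bins) for k in range(n_bins + 1)]
    widths = [edges[k + 1] - edges[k] for k in range(n_bins)]
    total = sum(widths)
    return [w / total for w in widths]


def mse(converted, target):
    """Conventional mean-square error: every bin counts equally."""
    return sum((c - t) ** 2 for c, t in zip(converted, target)) / len(target)


def weighted_distance(converted, target, weights):
    """Hypothetical perceptually weighted squared error."""
    return sum(w * (c - t) ** 2 for w, c, t in zip(weights, converted, target))


# Two toy "converted" log-spectra with the same error magnitude,
# placed in the lowest and the highest frequency bin respectively.
n_bins = 8
weights = mel_band_weights(n_bins, sample_rate=16000)  # assumed sample rate
target = [0.0] * n_bins
error_low = [1.0] + [0.0] * (n_bins - 1)
error_high = [0.0] * (n_bins - 1) + [1.0]

print("MSE      low/high:", mse(error_low, target), mse(error_high, target))
print("Weighted low/high:",
      weighted_distance(error_low, target, weights),
      weighted_distance(error_high, target, weights))
```

In this toy setup the MSE cannot tell the two errors apart, while the weighted distance penalizes the low-frequency error more heavily. A training objective built on such a distance would steer a DNN toward errors that matter less to a listener, which is the general idea behind replacing MSE with a perceptual criterion.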

Details

Language :
English
ISSN :
2076-3417
Volume :
10
Issue :
8
Database :
Directory of Open Access Journals
Journal :
Applied Sciences
Publication Type :
Academic Journal
Accession number :
edsdoj.fb1f585af195451bbe205f19aedbe8d6
Document Type :
article
Full Text :
https://doi.org/10.3390/app10082884