Enhancing the Intelligibility of Statistically Generated Synthetic Speech by Means of Noise-Independent Modifications
- Source :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing. 22:2101-2111
- Publication Year :
- 2014
- Publisher :
- Institute of Electrical and Electronics Engineers (IEEE), 2014.
Abstract
- When speaking devices such as smartphones, tablet PCs, or GPS systems are used in noisy outdoor environments, speech intelligibility drops significantly, and the drop is even more pronounced for synthetic speech. This article describes how a statistical parametric speech synthesis system trained on an ordinary synthesis database can be designed to generate highly intelligible speech, even at very low signal-to-noise ratios. Using a simple and flexible vocoder based on a full-band harmonic model, the proposed system applies deterministic, noise-independent modifications at several levels: speaking rate, average fundamental frequency level and range, energy contour over time, formant sharpness, and intensity of specific spectral bands. The intelligibility achieved by the system was evaluated in a large-scale subjective test. The results show that the suggested approach clearly outperforms a reference state-of-the-art TTS system and, in some conditions, also unmodified natural speech. Compared with alternative systems evaluated in the same framework, the proposed one performs best in the scenarios with the lowest signal-to-noise ratio. Finally, the impact of the suggested modifications on naturalness, quality, and similarity to the original natural voice is quantified by means of a subjective test.
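One of the modifications named in the abstract, changing the average fundamental frequency level and range, can be illustrated with a minimal sketch. This is not the authors' implementation: the function name `modify_f0`, the log-domain formulation, and the default factors are assumptions chosen for illustration, and the record does not state the settings used in the article.

```python
import math

def modify_f0(f0_hz, level_factor=1.2, range_factor=1.5):
    """Raise the average F0 and widen its range in the log-frequency domain.

    f0_hz: per-frame F0 values in Hz, with 0 marking unvoiced frames.
    level_factor and range_factor are hypothetical values, not the paper's.
    """
    voiced = [f for f in f0_hz if f > 0]
    if not voiced:
        # Nothing to modify when every frame is unvoiced.
        return [0.0 for _ in f0_hz]

    # Mean of log F0 over voiced frames (log of the geometric mean).
    mean_log = sum(math.log(f) for f in voiced) / len(voiced)

    modified = []
    for f in f0_hz:
        if f <= 0:
            modified.append(0.0)  # keep unvoiced frames unvoiced
            continue
        # Scale the deviation from the mean (range), then shift the mean (level).
        deviation = math.log(f) - mean_log
        new_log = mean_log + math.log(level_factor) + range_factor * deviation
        modified.append(math.exp(new_log))
    return modified
```

Because the factors are fixed in advance and do not depend on the surrounding noise, a transformation of this kind is deterministic and noise-independent in the sense the abstract describes.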
- Subjects :
- Voice activity detection
Acoustics and Ultrasonics
Computer science
Speech recognition
Speech coding
Speech synthesis
PSQM
Intelligibility (communication)
Linear predictive coding
computer.software_genre
Speech processing
Speech enhancement
Computational Mathematics
Computer Science (miscellaneous)
Electrical and Electronic Engineering
Details
- ISSN :
- 2329-9304 and 2329-9290
- Volume :
- 22
- Database :
- OpenAIRE
- Journal :
- IEEE/ACM Transactions on Audio, Speech, and Language Processing
- Accession number :
- edsair.doi...........d24dae59535bd999fae4b0265fc254c2
- Full Text :
- https://doi.org/10.1109/taslp.2014.2361022