Investigating automatic & human filled pause insertion for speech synthesis

Authors :
William Byrne
Rasmus Dall
Marcus Tomalin
Simon King
Mirjam Wester
Source :
INTERSPEECH, Dall, R, Tomalin, M, Wester, M, Byrne, W & King, S 2014, Investigating Automatic & Human Filled Pause Insertion for Speech Synthesis. in Proc. Interspeech.
Publication Year :
2014
Publisher :
ISCA, 2014.

Abstract

Filled pauses are pervasive in conversational speech and have been shown to serve several psychological and structural purposes. Despite this, they are seldom modelled overtly by state-of-the-art speech synthesis systems. This paper seeks to motivate the incorporation of filled pauses into speech synthesis systems by exploring their use in conversational speech, and by comparing the performance of several automatic systems inserting filled pauses into fluent text. Two initial experiments are described which seek to determine whether people’s predicted insertion points are consistent with actual practice and/or with each other. The experiments also investigate whether there are ‘right’ and ‘wrong’ places to insert filled pauses. The results show good consistency between people’s predictions of usage and their actual practice, as well as a perceptual preference for the ‘right’ placement. The third experiment contrasts the performance of several automatic systems that insert filled pauses into fluent sentences. The best performance (determined by F-score) was achieved through the by-word interpolation of probabilities predicted by Recurrent Neural Network and 4-gram Language Models. The results offer insights into the use and perception of filled pauses by humans, and how automatic systems can be used to predict their locations.

Index Terms: filled pause, HMM TTS, SVM, RNN
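As a rough illustration of the by-word interpolation and F-score evaluation mentioned in the abstract, the sketch below combines two per-word probability streams and scores the predicted insertion points. It is only a minimal sketch under stated assumptions: the abstract does not give the interpolation formula, so linear interpolation with a weight `lam` is assumed, as are the decision `threshold`, the balanced F1 form of the F-score, and all function names and numbers, which are hypothetical and not taken from the paper.

```python
from typing import List, Sequence, Set


def interpolate(p_rnn: Sequence[float], p_ngram: Sequence[float],
                lam: float = 0.5) -> List[float]:
    """Linearly interpolate two per-word probabilities of a filled pause.

    Assumption: linear interpolation with weight `lam` on the RNN stream;
    the abstract only says the probabilities are interpolated by word.
    """
    if len(p_rnn) != len(p_ngram):
        raise ValueError("both models must score the same word positions")
    return [lam * a + (1.0 - lam) * b for a, b in zip(p_rnn, p_ngram)]


def insertion_points(probs: Sequence[float], threshold: float = 0.5) -> Set[int]:
    """Return word positions whose combined probability reaches the threshold.

    Assumption: a fixed threshold decides insertion; this is illustrative only.
    """
    return {i for i, p in enumerate(probs) if p >= threshold}


def f_score(predicted: Set[int], reference: Set[int]) -> float:
    """Balanced F-score (F1) of predicted against reference insertion points."""
    true_positives = len(predicted & reference)
    if true_positives == 0:
        return 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(reference)
    return 2.0 * precision * recall / (precision + recall)


# Hypothetical per-word probabilities for a four-word sentence.
p_rnn = [0.1, 0.8, 0.3, 0.6]
p_ngram = [0.2, 0.6, 0.1, 0.2]

combined = interpolate(p_rnn, p_ngram, lam=0.5)
predicted = insertion_points(combined, threshold=0.5)
reference = {1, 3}  # hypothetical positions where a speaker used a filled pause
score = f_score(predicted, reference)
```

In this toy input only position 1 clears the threshold, so precision is perfect while recall is halved; changing `lam` or `threshold` trades one against the other.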

Details

Database :
OpenAIRE
Journal :
Interspeech 2014
Accession number :
edsair.doi.dedup.....2b83da553a5dfce6b44d7b5895a5abb2
Full Text :
https://doi.org/10.21437/interspeech.2014-11