Speech emotion recognition based on formant characteristics feature extraction and phoneme type convergence.

Authors :: Liu, Zhen-Tao
Rehman, Abdul
Wu, Min
Cao, Wei-Hua
Hao, Man
Source :: Information Sciences. Jul2021, Vol. 563, p309-325. 17p.
Publication Year :: 2021
Abstract: Highlights:
• Emotions are recognized using the occurrences of auto-detected phonological units.
• Phonemes are clustered together based on the similarity of formant characteristics.
• Experiment results indicate reduced computational cost and increased robustness.
Speech Emotion Recognition (SER) has numerous applications, including human-robot interaction, online gaming, and health care assistance. While deep learning-based approaches achieve considerable precision, they often come with high computational and time costs, because feature learning strategies must search for important features in a large amount of speech data. To reduce these costs, we propose a pre-processing step in which speech segments with similar formant characteristics are clustered together and labeled as the same phoneme. The phoneme occurrence rates in emotional utterances are then used as the input features for classifiers. In an evaluation on six databases (EmoDB, RAVDESS, IEMOCAP, ShEMO, DEMoS and MSP-Improv), accuracy was comparable to that of current state-of-the-art methods, and the required training time was reduced from hours to minutes. [ABSTRACT FROM AUTHOR]
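The pipeline described in the abstract (cluster segments by formant similarity, label each cluster as a phoneme type, then use per-utterance occurrence rates as classifier features) can be illustrated with a minimal sketch. Everything specific here is an assumption rather than the authors' method: the abstract does not name the clustering algorithm or the exact formant features, so plain k-means over toy (F1, F2) values in Hz stands in for them, and the names `kmeans`, `assign`, and `occurrence_rates` are hypothetical.

```python
from collections import Counter
from math import dist


def kmeans(points, k, iters=20):
    """Plain k-means. Assumption: a stand-in for the paper's clustering step.
    Centroids start at the first k points so the result is deterministic."""
    centroids = [tuple(p) for p in points[:k]]
    for _ in range(iters):
        groups = [[] for _ in range(k)]
        for p in points:
            nearest = min(range(k), key=lambda j: dist(p, centroids[j]))
            groups[nearest].append(p)
        # Recompute each centroid as the mean of its group (keep it if empty).
        centroids = [
            tuple(sum(c) / len(g) for c in zip(*g)) if g else centroids[j]
            for j, g in enumerate(groups)
        ]
    return centroids


def assign(frame, centroids):
    """Label a segment with the index of its nearest cluster ("phoneme type")."""
    return min(range(len(centroids)), key=lambda j: dist(frame, centroids[j]))


def occurrence_rates(frames, centroids):
    """Fraction of an utterance's segments that fall into each cluster."""
    counts = Counter(assign(f, centroids) for f in frames)
    return [counts[j] / len(frames) for j in range(len(centroids))]


# Toy (F1, F2) formant pairs in Hz; illustrative values, not data from the paper.
training_segments = [
    (300, 2300), (750, 1200), (320, 800),
    (310, 2250), (740, 1250), (330, 850),
    (290, 2350), (760, 1150),
]
centroids = kmeans(training_segments, k=3)

# One utterance becomes a fixed-length feature vector of occurrence rates.
utterance = [(305, 2280), (745, 1210), (750, 1190), (325, 820)]
rates = occurrence_rates(utterance, centroids)
print(rates)
```

Each utterance is reduced to a vector with one entry per cluster, regardless of its duration, which is the kind of compact input that a lightweight classifier can train on quickly.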

Subjects :: *EMOTION recognition
*PHONEME (Linguistics)
*FEATURE extraction
*HUMAN-robot interaction
*VIDEO games
*EMOTIONS
*SPEECH perception
