Speech emotion recognition using MFCC-based entropy feature.
- Source :
- Signal, Image & Video Processing; Feb 2024, Vol. 18, Issue 1, p153-161, 9p
- Publication Year :
- 2024
Abstract
- The prime objective of speech emotion recognition is to accurately recognize the emotion conveyed by a speech signal, which is a challenging task to accomplish. Speech emotion recognition (SER) has many applications, including medicine, online marketing, strengthening human–computer interaction (HCI), and online education. Hence, it has been a topic of interest for many researchers over the last three decades, and different methodologies have been used to improve the classification accuracy of emotions. In this study, we aim to improve emotion classification accuracy using mel-frequency cepstral coefficient (MFCC)-based entropy features. First, we extracted the MFCC coefficient matrix from every utterance of the EMO-DB, RAVDESS, and SAVEE datasets; we then calculated the proposed features from each utterance's MFCC coefficient matrix: the statistical mean (MFCC_mean), MFCC-based approximate entropy (MFCC_AE), and MFCC-based spectral entropy (MFCC_SE). The performance of the proposed features is assessed using a DNN classifier. We achieved classification accuracies of 87.48%, 75.9%, and 79.64% using the combination of MFCC_mean and MFCC_SE features, and 85.61%, 77.54%, and 76.26% using the combination of MFCC_mean, MFCC_AE, and MFCC_SE features, for the EMO-DB, RAVDESS, and SAVEE datasets, respectively. [ABSTRACT FROM AUTHOR]
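A minimal sketch of the per-utterance feature extraction described in the abstract, assuming librosa for MFCC computation. The number of coefficients (13), the approximate-entropy parameters (m = 2, r = 0.2·std), and all function names are illustrative assumptions, not values or code taken from the paper.

```python
# Hedged sketch: MFCC-based mean and entropy features for one utterance.
# Parameter choices below are assumptions, not the paper's settings.
import numpy as np
import librosa


def approximate_entropy(x, m=2, r=None):
    """Approximate entropy (ApEn) of a 1-D sequence; m and r are assumed defaults."""
    x = np.asarray(x, dtype=float)
    n = len(x)
    if r is None:
        r = 0.2 * np.std(x)

    def phi(m):
        # All overlapping windows of length m; count near matches under Chebyshev distance.
        windows = np.array([x[i:i + m] for i in range(n - m + 1)])
        dists = np.max(np.abs(windows[:, None, :] - windows[None, :, :]), axis=2)
        counts = np.mean(dists <= r, axis=1)  # includes self-match, so counts > 0
        return np.mean(np.log(counts))

    return phi(m) - phi(m + 1)


def spectral_entropy(x):
    """Shannon entropy of the normalized power spectrum of a 1-D sequence."""
    psd = np.abs(np.fft.rfft(x)) ** 2
    psd = psd / (np.sum(psd) + 1e-12)
    return -np.sum(psd * np.log2(psd + 1e-12))


def mfcc_entropy_features(wav_path, n_mfcc=13):
    """Concatenate mean, ApEn, and spectral entropy of each MFCC coefficient trajectory."""
    y, sr = librosa.load(wav_path, sr=None)
    mfcc = librosa.feature.mfcc(y=y, sr=sr, n_mfcc=n_mfcc)  # shape: (n_mfcc, n_frames)
    mfcc_mean = mfcc.mean(axis=1)                                # MFCC_mean
    mfcc_ae = np.array([approximate_entropy(c) for c in mfcc])   # MFCC_AE
    mfcc_se = np.array([spectral_entropy(c) for c in mfcc])      # MFCC_SE
    return np.concatenate([mfcc_mean, mfcc_ae, mfcc_se])
```

Feature vectors built this way could then be combined per utterance (e.g., MFCC_mean with MFCC_SE, or all three feature sets) and passed to a DNN classifier, mirroring the feature combinations evaluated in the abstract.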
Details
- Language :
- English
- ISSN :
- 1863-1703
- Volume :
- 18
- Issue :
- 1
- Database :
- Complementary Index
- Journal :
- Signal, Image & Video Processing
- Publication Type :
- Academic Journal
- Accession number :
- 175023504
- Full Text :
- https://doi.org/10.1007/s11760-023-02716-7