1. Effective MLP and CNN based ensemble learning for speech emotion recognition.
- Author
Middya, Asif Iqbal, Nag, Baibhav, and Roy, Sarbani
- Subjects
CONVOLUTIONAL neural networks, EMOTION recognition, SUPPORT vector machines, DEEP learning, SPEECH
- Abstract
Speech emotion recognition (SER) is one of the most important and active areas of research in speech processing. Numerous approaches have been proposed to address various limitations in this field, but the sheer diversity of speech emotions, as well as their complexity, continues to make SER a tough nut to crack. This paper attempts to conduct a thorough investigation into speech emotion recognition in order to determine the most appropriate feature set and model for SER. A multi-layer perceptron (MLP) and convolutional neural network (CNN) based ensemble model for SER is proposed, which is a simple yet very powerful model that can greatly improve classification accuracy. The model's performance is evaluated on four benchmark datasets, namely RAVDESS (Ryerson Audio-Visual Database of Emotional Speech and Song), EmoDB (Emotional Database), SAVEE (Surrey Audio-Visual Expressed Emotion), and TESS (Toronto Emotional Speech Set). The proposed model outperforms several baseline methods (decision tree (DT), random forest (RF), support vector machine (SVM), k-nearest neighbour (KNN), and the base learners, i.e., MLP and CNN) in terms of various performance metrics on all the datasets. Furthermore, the proposed model outperforms all previous works on the RAVDESS (Acc=73.1%), SAVEE (Acc=83.8%), and TESS (Acc=99.9%) datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024