Improving the Learning Power of Artificial Intelligence Using Multimodal Deep Learning.

Authors :
Nadykto, A.
Aleksic, N.
Lima, P.
Pivkin, P.
Uvarova, L.
Jiang, X.
Zelensky, A.
Shchetinin, Eugene Yu.
Sevastianov, Leonid
Source :
EPJ Web of Conferences. 4/26/2021, Vol. 248, p1-4. 4p.
Publication Year :
2021

Abstract

Computer paralinguistic analysis is widely used in security systems, biometric research, call centers and banks. Paralinguistic models estimate physical properties of the voice, such as pitch, intensity, formants and harmonics, to classify emotions. The main goal is to find features that are robust to outliers while retaining the variety of human voice properties. Moreover, the model must be able to estimate features over time in order to analyze voice variability effectively. This paper describes a paralinguistic model based on a Bidirectional Long Short-Term Memory (BLSTM) neural network, trained for voice-based emotion recognition. The main advantage of this architecture is that each module of the network consists of several interconnected layers, which allows it to recognize flexible long-term dependencies in the data, an important property for vocal analysis. We explain the architecture of the bidirectional neural network model and its main advantages over regular neural networks, and compare the experimental results of the BLSTM network with those of other models. [ABSTRACT FROM AUTHOR]
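
The bidirectional idea mentioned in the abstract can be illustrated with a minimal sketch. This is not the authors' model: the layer sizes, the random weights, the random "frames" standing in for acoustic features, and all function names (make_cell, lstm_step, run_lstm, run_blstm) are assumptions chosen only for illustration. It runs one standard LSTM cell forward in time and a second one backward, then concatenates the two hidden states for each frame.

```python
import math
import random


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def make_cell(input_size, hidden_size, rng):
    """Random weights for one LSTM cell (illustrative values, not trained)."""
    def matrix():
        return [[rng.uniform(-0.1, 0.1) for _ in range(input_size + hidden_size)]
                for _ in range(hidden_size)]
    # One (weights, bias) pair per gate: input, forget, output, candidate.
    return {gate: (matrix(), [0.0] * hidden_size) for gate in ("i", "f", "o", "g")}


def affine(weights, bias, vec):
    return [sum(w * v for w, v in zip(row, vec)) + b
            for row, b in zip(weights, bias)]


def lstm_step(cell, x, h, c):
    """One standard LSTM update for a single frame x."""
    z = x + h  # list concatenation: current input followed by previous hidden state
    i = [sigmoid(a) for a in affine(*cell["i"], z)]
    f = [sigmoid(a) for a in affine(*cell["f"], z)]
    o = [sigmoid(a) for a in affine(*cell["o"], z)]
    g = [math.tanh(a) for a in affine(*cell["g"], z)]
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def run_lstm(cell, frames, hidden_size):
    """Run a cell over a sequence and return the hidden state at each frame."""
    h = [0.0] * hidden_size
    c = [0.0] * hidden_size
    outputs = []
    for x in frames:
        h, c = lstm_step(cell, x, h, c)
        outputs.append(h)
    return outputs


def run_blstm(fwd_cell, bwd_cell, frames, hidden_size):
    """Concatenate a forward pass with a backward pass, frame by frame."""
    forward = run_lstm(fwd_cell, frames, hidden_size)
    # Process the reversed sequence, then flip the result back into time order.
    backward = run_lstm(bwd_cell, frames[::-1], hidden_size)[::-1]
    return [f + b for f, b in zip(forward, backward)]


# Hypothetical sizes: 5 frames of 4 features each, 3 hidden units per direction.
rng = random.Random(0)
INPUT_SIZE, HIDDEN_SIZE, NUM_FRAMES = 4, 3, 5
frames = [[rng.uniform(-1, 1) for _ in range(INPUT_SIZE)] for _ in range(NUM_FRAMES)]
fwd_cell = make_cell(INPUT_SIZE, HIDDEN_SIZE, rng)
bwd_cell = make_cell(INPUT_SIZE, HIDDEN_SIZE, rng)

outputs = run_blstm(fwd_cell, bwd_cell, frames, HIDDEN_SIZE)
print(len(outputs), len(outputs[0]))  # prints "5 6"
```

Each output frame has twice the hidden size because it carries context from both directions: the first half depends only on earlier frames, the second half only on later ones. A practical emotion recognizer would train the weights and add a classification layer on top, which this sketch leaves out.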

Details

Language :
English
ISSN :
21016275
Volume :
248
Database :
Academic Search Index
Journal :
EPJ Web of Conferences
Publication Type :
Conference
Accession number :
150848352
Full Text :
https://doi.org/10.1051/epjconf/202124801017