
Spectral and cepstral audio noise reduction techniques in speech emotion recognition

Authors :
Björn Schuller
Jouni Pohjalainen
Zixing Zhang
Fabien Ringeval
Chair of Complex and Intelligent Systems (CIS)
Universität Passau [Passau]
Groupe d’Étude en Traduction Automatique/Traitement Automatisé des Langues et de la Parole (GETALP)
Laboratoire d'Informatique de Grenoble (LIG)
Institut polytechnique de Grenoble - Grenoble Institute of Technology (Grenoble INP)-Centre National de la Recherche Scientifique (CNRS)-Université Grenoble Alpes [2016-2019] (UGA [2016-2019])
Department of Computing [London]
Biomedical Image Analysis Group [London] (BioMedIA)
Imperial College London
This work has been supported by the European Community’s Seventh Framework Programme through the ERC Starting Grant No. 338164 (iHEARu).
European Project: 338164,EC:FP7:ERC,ERC-2013-StG,IHEARU(2014)
Source :
ACM Multimedia, Proceedings of the 24th ACM International Conference on Multimedia (ACM MM), 2016, Amsterdam, Netherlands. pp. 670-674, ⟨10.1145/2964284.2967306⟩
Publication Year :
2020

Abstract

Signal noise reduction can improve the performance of machine learning systems dealing with time signals such as audio. Real-life applicability of these recognition technologies requires the system to uphold its performance level in variable, challenging conditions such as noisy environments. In this contribution, we investigate audio signal denoising methods in cepstral and log-spectral domains and compare them with common implementations of standard techniques. The different approaches are first compared generally using averaged acoustic distance metrics. They are then applied to automatic recognition of spontaneous and natural emotions under simulated smartphone-recorded noisy conditions. Emotion recognition is implemented as support vector regression for continuous-valued prediction of arousal and valence on a realistic multimodal database. In the experiments, the proposed methods are found to generally outperform standard noise reduction algorithms.

Details

Language :
English
Database :
OpenAIRE
Journal :
ACM Multimedia, Proceedings of the 24th ACM International Conference on Multimedia (ACM MM), 2016, Amsterdam, Netherlands. pp. 670-674, ⟨10.1145/2964284.2967306⟩
Accession number :
edsair.doi.dedup.....b2d955764d20bbbf3b37c56b58f0e5ae
Full Text :
https://doi.org/10.1145/2964284.2967306