
Sound Event Detection and Separation: a Benchmark on Desed Synthetic Soundscapes

Authors :
Romain Serizel
Hakan Erdogan
Justin Salamon
Nicolas Turpault
John R. Hershey
Scott Wisdom
Eduardo Fonseca
Prem Seetharaman
Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH)
Inria Nancy - Grand Est
Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA)
Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)
Google Inc
Research at Google
Universitat Pompeu Fabra [Barcelona] (UPF)
Descript, Inc.
Adobe Research
Part of this work was carried out with the support of the French National Research Agency, within the framework of the project LEAUDS 'Learning to understand audio scenes' (ANR-18-CE23-0020), and of the French region Grand-Est. High-performance computing resources were partially provided by the EXPLOR centre hosted by the Université de Lorraine.
Grid'5000
ANR-18-CE23-0020,LEAUDS,Apprentissage statistique pour la compréhension de scènes audio(2018)
Source :
ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto/Virtual, Canada. ⟨10.1109/ICASSP39728.2021.9414789⟩
Publication Year :
2020
Publisher :
HAL CCSD, 2020.

Abstract

We propose a benchmark of state-of-the-art sound event detection (SED) systems. We designed synthetic evaluation sets to focus on specific sound event detection challenges. We analyze the performance of the submissions to DCASE 2021 task 4 depending on time-related modifications (time position of an event and length of clips), and we study the impact of non-target sound events and reverberation. We show that localizing sound events in time is still a problem for SED systems. We also show that reverberation and non-target sound events severely degrade the performance of SED systems. In the latter case, sound separation seems like a promising solution.

Details

Language :
English
Database :
OpenAIRE
Journal :
ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto/Virtual, Canada. ⟨10.1109/ICASSP39728.2021.9414789⟩
Accession number :
edsair.doi.dedup.....fd9b83c1e445912a1a4566dae921c839
Full Text :
https://doi.org/10.1109/ICASSP39728.2021.9414789