1. The impact of non-target events in synthetic soundscapes for sound event detection
- Author
-
Ronchini, Francesca, Serizel, Romain, Turpault, Nicolas, Cornell, Samuele, Ronchini, Francesca, Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est, Institut National de Recherche en Informatique et en Automatique (Inria)-Institut National de Recherche en Informatique et en Automatique (Inria)-Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD), Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Institut National de Recherche en Informatique et en Automatique (Inria)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL), Departement of Electrical Engineering-SCD [Leuven] (ESAT-SCD), Catholic University of Leuven - Katholieke Universiteit Leuven (KU Leuven), Università Politecnica delle Marche [Ancona] (UNIVPM), Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), and Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)-Université de Lorraine (UL)-Centre National de la Recherche Scientifique (CNRS)
- Subjects
FOS: Computer and information sciences ,Computer Science - Machine Learning ,Sound (cs.SD) ,open-source datasets ,deep learning ,[INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG] ,Computer Science - Sound ,[INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD] ,Machine Learning (cs.LG) ,Sound event detection ,[INFO.INFO-LG]Computer Science [cs]/Machine Learning [cs.LG] ,Audio and Speech Processing (eess.AS) ,[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD] ,FOS: Electrical engineering, electronic engineering, information engineering ,Electrical Engineering and Systems Science - Audio and Speech Processing ,synthetic soundscapes - Abstract
International audience; Detection and Classification Acoustic Scene and Events Challenge 2021 Task 4 uses a heterogeneous dataset that includes both recorded and synthetic soundscapes. Until recently only target sound events were considered when synthesizing the soundscapes. However, recorded soundscapes often contain a substantial amount of non-target events that may affect the performance. In this paper, we focus on the impact of these non-target events in the synthetic soundscapes. Firstly, we investigate to what extent using non-target events alternatively during the training or validation phase (or none of them) helps the system to correctly detect target events. Secondly, we analyze to what extend adjusting the signal-to-noise ratio between target and non-target events at training improves the sound event detection performance. The results show that using both target and non-target events for only one of the phases (validation or training) helps the system to properly detect sound events, outperforming the baseline (which uses non-target events in both phases). The paper also reports the results of a preliminary study on evaluating the system on clips that contain only non-target events. This opens questions for future work on non-target subset and acoustic similarity between target and non-target events which might confuse the system.
- Published
- 2021