1. Improving Sound Event Detection In Domestic Environments Using Sound Separation
- Author
Turpault, Nicolas; Wisdom, Scott; Erdogan, Hakan; Hershey, John; Serizel, Romain; Fonseca, Eduardo; Seetharaman, Prem; Salamon, Justin
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Google Inc.; Music Technology Group (MTG), Universitat Pompeu Fabra [Barcelona] (UPF); Northwestern University [Evanston]; Adobe Research
Acknowledgements: We would like to thank the other organizers of DCASE 2020 task 4: Daniel P. W. Ellis and Ankit Parag Shah.
Funding and support: Grid'5000; ANR-18-CE23-0020, LEAUDS, Learning to Understand Audio Scenes (2018)
- Subjects
Signal Processing (eess.SP); FOS: Computer and information sciences; Sound (cs.SD); Computer Science - Sound; [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]; Synthetic soundscapes; Sound event detection; Audio and Speech Processing (eess.AS); TheoryofComputation_LOGICSANDMEANINGSOFPROGRAMS; [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]; FOS: Electrical engineering, electronic engineering, information engineering; Electrical Engineering and Systems Science - Signal Processing; Sound separation; [SPI.SIGNAL]Engineering Sciences [physics]/Signal and Image processing; Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
Performing sound event detection on real-world recordings often means dealing with overlapping target sound events and with non-target sounds, also referred to as interference or noise. Until now, these problems have mainly been tackled at the classifier level. We propose to use sound separation as a pre-processing step for sound event detection. In this paper, we start from a sound separation model trained on the Free Universal Sound Separation dataset and from the DCASE 2020 task 4 sound event detection baseline. We explore different methods of combining the separated sound sources and the original mixture within the sound event detection system. Furthermore, we investigate how adapting the sound separation model to the sound event detection data affects both sound separation and sound event detection performance.
- Published
- 2020