
Foreground-Background Ambient Sound Scene Separation

Authors:
Michel Olvera
Romain Serizel
Emmanuel Vincent
Gilles Gasso
Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH)
Inria Nancy - Grand Est
Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Institut National de Recherche en Informatique et en Automatique (Inria) - Université de Lorraine (UL) - Centre National de la Recherche Scientifique (CNRS)
Laboratoire d'Informatique, de Traitement de l'Information et des Systèmes (LITIS), Normandie Université (NU) - Université de Rouen Normandie (UNIROUEN) - Université Le Havre Normandie (ULH) - Institut national des sciences appliquées Rouen Normandie (INSA Rouen Normandie)
This work was made with the support of the French National Research Agency, in the framework of the project LEAUDS "Learning to understand audio scenes" (ANR-18-CE23-0020). Experiments presented in this paper were carried out using the Grid'5000 testbed, supported by a scientific interest group hosted by Inria and including CNRS, RENATER and several Universities as well as other organizations (see https://www.grid5000.fr).
GRID5000
ANR-18-CE23-0020, LEAUDS, Learning to Understand Audio Scenes (2018)
Source:
EUSIPCO 2020 - 28th European Signal Processing Conference, Jan 2021, Amsterdam / Virtual, Netherlands. ⟨10.23919/Eusipco47968.2020.9287436⟩
Publication Year:
2020

Abstract

Ambient sound scenes typically comprise multiple short events occurring on top of a somewhat stationary background. We consider the task of separating these events from the background, which we call foreground-background ambient sound scene separation. We propose a deep learning-based separation framework with a suitable feature normalization scheme and an optional auxiliary network capturing the background statistics, and we investigate its ability to handle the great variety of sound classes encountered in ambient sound scenes, many of which are not seen in training. To do so, we create single-channel foreground-background mixtures using isolated sounds from the DESED and AudioSet datasets, and we conduct extensive experiments with mixtures of seen or unseen sound classes at various signal-to-noise ratios. Our experimental findings demonstrate the generalization ability of the proposed approach.
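The abstract describes creating single-channel mixtures of an isolated foreground event and a background at controlled signal-to-noise ratios. A minimal sketch of such a mixing step is shown below; the `mix_at_snr` helper and the toy signals are illustrative assumptions, not the paper's exact data-generation pipeline.

```python
import numpy as np

def mix_at_snr(foreground, background, snr_db, eps=1e-12):
    """Scale the foreground so its power relative to the background
    matches the target SNR (in dB), then sum the two signals.

    Hypothetical helper; the paper's actual procedure may differ."""
    fg_power = np.mean(foreground ** 2)
    bg_power = np.mean(background ** 2)
    # Gain bringing the foreground/background power ratio to snr_db.
    gain = np.sqrt(bg_power * 10 ** (snr_db / 10) / (fg_power + eps))
    return gain * foreground + background

# Toy example: a 1-second tone "event" over a white-noise "background".
rng = np.random.default_rng(0)
sr = 16000
t = np.arange(sr) / sr
event = np.sin(2 * np.pi * 440 * t)           # short foreground event
background = 0.1 * rng.standard_normal(sr)    # stationary background
mixture = mix_at_snr(event, background, snr_db=0.0)
```

At 0 dB the scaled event and the background contribute equal power to the mixture; sweeping `snr_db` yields the range of mixing conditions the experiments evaluate.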

Details

Language:
English
Database:
OpenAIRE
Journal:
EUSIPCO 2020 - 28th European Signal Processing Conference, Jan 2021, Amsterdam / Virtual, Netherlands. ⟨10.23919/Eusipco47968.2020.9287436⟩
Accession number:
edsair.doi.dedup.....983935e6623da811ba9e937e331b90cf