
What's All the FUSS About Free Universal Sound Separation Data?

Authors :
Romain Serizel
Prem Seetharaman
Justin Salamon
Daniel P. W. Ellis
John R. Hershey
Scott Wisdom
Eduardo Fonseca
Nicolas Turpault
Hakan Erdogan
Google Research
Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est
Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA), Université de Lorraine (UL), CNRS, Inria
Universitat Pompeu Fabra [Barcelona] (UPF)
Adobe Research
Descript, Inc.
ANR-18-CE23-0020, LEAUDS, Statistical learning for audio scene understanding (2018)
Source :
ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto/Virtual, Canada. ⟨10.1109/ICASSP39728.2021.9414774⟩
Publication Year :
2020
Publisher :
arXiv, 2020.

Abstract

We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio data drawn from 357 classes, which are used to create mixtures of one to four sources. To simulate reverberation, an acoustic room simulator is used to generate impulse responses of box-shaped rooms with frequency-dependent reflective walls. Additional open-source data augmentation tools are also provided to produce new mixtures with different combinations of sources and room simulations. Finally, we introduce an open-source baseline separation model, based on an improved time-domain convolutional network (TDCN++), that can separate a variable number of sources in a mixture. This model achieves 9.8 dB of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources, while reconstructing single-source inputs with 35.5 dB absolute SI-SNR. We hope this dataset will lower the barrier to new research and allow for fast iteration and application of novel techniques from other machine learning domains to the sound separation challenge.
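For context, the SI-SNR and SI-SNRi figures quoted in the abstract can be computed as follows. This is a minimal NumPy sketch of the standard scale-invariant SNR definition, not code from the FUSS repository; the function names are illustrative.

```python
import numpy as np

def si_snr(estimate, reference, eps=1e-8):
    """Scale-invariant signal-to-noise ratio (SI-SNR) in dB."""
    # Zero-mean both signals so the scaling projection is well defined.
    estimate = estimate - estimate.mean()
    reference = reference - reference.mean()
    # Project the estimate onto the reference to get the scaled target;
    # SI-SNR is invariant to any rescaling of the estimate.
    scale = np.dot(estimate, reference) / (np.dot(reference, reference) + eps)
    target = scale * reference
    noise = estimate - target
    return 10.0 * np.log10(np.dot(target, target) / (np.dot(noise, noise) + eps))

def si_snri(estimate, reference, mixture):
    """SI-SNR improvement: the metric relative to the unprocessed mixture."""
    return si_snr(estimate, reference) - si_snr(mixture, reference)
```

Because of the projection step, a perfectly separated source scores equally well at any gain, which is why the paper reports both the improvement on 2-4 source mixtures (9.8 dB SI-SNRi) and the absolute SI-SNR on single-source inputs (35.5 dB).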

Details

Database :
OpenAIRE
Journal :
ICASSP 2021 - 46th International Conference on Acoustics, Speech, and Signal Processing, Jun 2021, Toronto/Virtual, Canada. ⟨10.1109/ICASSP39728.2021.9414774⟩
Accession number :
edsair.doi.dedup.....6c5c6ecdfe75c9b15c8478ab28ba2999
Full Text :
https://doi.org/10.48550/arxiv.2011.00803