What's All the FUSS About Free Universal Sound Separation Data?
- Author
Romain Serizel, Prem Seetharaman, Justin Salamon, Daniel P. W. Ellis, John R. Hershey, Scott Wisdom, Eduardo Fonseca, Nicolas Turpault, Hakan Erdogan
Affiliations: Google Inc; Research at Google; Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH), Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Universitat Pompeu Fabra [Barcelona] (UPF); Adobe Research; Descript, Inc.
Funding: ANR-18-CE23-0020, LEAUDS, Apprentissage statistique pour la compréhension de scènes audio (2018)
- Subjects
FOS: Computer and information sciences, Reverberation, Sound (cs.SD), open-source datasets, Computer science, Sound separation, Separation (aeronautics), 02 engineering and technology, Impulse (physics), Computer Science - Sound, Data modeling, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Audio and Speech Processing (eess.AS), 0202 electrical engineering, electronic engineering, information engineering, Open domain, FOS: Electrical engineering, electronic engineering, information engineering, business.industry, Deep learning, 020206 networking & telecommunications, Universal sound separation, variable source separation, [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD], 020201 artificial intelligence & image processing, Artificial intelligence, Variable number, business, Algorithm, Electrical Engineering and Systems Science - Audio and Speech Processing
- Abstract
We introduce the Free Universal Sound Separation (FUSS) dataset, a new corpus for experiments in separating mixtures of an unknown number of sounds from an open domain of sound types. The dataset consists of 23 hours of single-source audio data drawn from 357 classes, which are used to create mixtures of one to four sources. To simulate reverberation, an acoustic room simulator is used to generate impulse responses of box-shaped rooms with frequency-dependent reflective walls. Additional open-source data augmentation tools are also provided to produce new mixtures with different combinations of sources and room simulations. Finally, we introduce an open-source baseline separation model, based on an improved time-domain convolutional network (TDCN++), that can separate a variable number of sources in a mixture. This model achieves 9.8 dB of scale-invariant signal-to-noise ratio improvement (SI-SNRi) on mixtures with two to four sources, while reconstructing single-source inputs with 35.5 dB absolute SI-SNR. We hope this dataset will lower the barrier to new research and allow for fast iteration and application of novel techniques from other machine learning domains to the sound separation challenge.
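The abstract reports results as SI-SNR and SI-SNRi. As a rough illustration of what those figures measure, the sketch below computes a common form of scale-invariant SNR in plain Python. It is not the authors' evaluation code: the function names `si_snr` and `si_snr_improvement`, the `eps` stabilizer, the omission of mean removal, and the synthetic sine-wave signals are all assumptions made for this example.

```python
import math


def si_snr(estimate, target, eps=1e-8):
    """Scale-invariant SNR in dB (illustrative definition, no mean removal)."""
    # Project the estimate onto the target so that rescaling the estimate
    # does not change the score.
    dot = sum(e * t for e, t in zip(estimate, target))
    target_energy = sum(t * t for t in target)
    scale = dot / (target_energy + eps)
    projection = [scale * t for t in target]
    residual = [e - p for e, p in zip(estimate, projection)]
    signal_energy = sum(p * p for p in projection)
    noise_energy = sum(r * r for r in residual)
    return 10.0 * math.log10((signal_energy + eps) / (noise_energy + eps))


def si_snr_improvement(estimate, target, mixture):
    """SI-SNRi: gain of the estimate over using the unprocessed mixture."""
    return si_snr(estimate, target) - si_snr(mixture, target)


# Hypothetical toy signals: two orthogonal sinusoids of equal energy.
n = 200
target = [math.sin(2 * math.pi * 5 * i / n) for i in range(n)]
interference = [math.sin(2 * math.pi * 13 * i / n) for i in range(n)]
mixture = [t + v for t, v in zip(target, interference)]
# A pretend separator output that leaves 10% of the interference amplitude.
estimate = [t + 0.1 * v for t, v in zip(target, interference)]

print(round(si_snr_improvement(estimate, target, mixture), 2))
```

In this toy case the mixture scores about 0 dB and the estimate about 20 dB, so the improvement is about 20 dB. Published definitions differ in details such as whether signals are zero-meaned first, so numbers from this sketch should not be compared directly with the figures in the abstract.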
- Published
- 2020