6 results on '"Romain Serizel"'
Search Results
2. Performance Above All? Energy Consumption vs. Performance, a Study on Sound Event Detection with Heterogeneous Data
- Author
Romain Serizel, Samuele Cornell, and Nicolas Turpault
- Published
- 2023
3. Multichannel Speech Separation with Recurrent Neural Networks from High-Order Ambisonics Recordings
- Author
Emmanuel Vincent, Alexandre Guerin, Romain Serizel, and Lauréline Perotin
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH); Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Orange Labs [Cesson-Sévigné]
- Subjects
Artificial neural network; Spatial filter; [INFO.INFO-TS] Computer Science [cs]/Signal and Image Processing; Computer science; Ambisonics; Speech recognition; Word error rate; 020206 networking & telecommunications; 02 engineering and technology; Harmonic analysis; 030507 speech-language pathology & audiology; 03 medical and health sciences; Recurrent neural network; high-order ambisonics (HOA); 0202 electrical engineering, electronic engineering, information engineering; Source separation; multichannel filtering; LSTM; 0305 other medical science; Speech separation; Communication channel
- Abstract
We present a source separation system for high-order ambisonics (HOA) contents. We derive a multichannel spatial filter from a mask estimated by a long short-term memory (LSTM) recurrent neural network. We combine one channel of the mixture with the outputs of basic HOA beamformers as inputs to the LSTM, assuming that we know the directions of arrival of the directional sources. In our experiments, the speech of interest can be corrupted either by diffuse noise or by an equally loud competing speaker. We show that adding as input the output of the beamformer steered toward the competing speech, in addition to that of the beamformer steered toward the target speech, brings significant improvements in terms of word error rate.
- Published
- 2018
4. Leveraging deep neural networks with nonnegative representations for improved environmental sound classification
- Author
Gael Richard, Slim Essid, Romain Serizel, and Victor Bisot
Affiliations: Département Traitement du Signal et des Images (TSI); Télécom ParisTech; Télécom Paris; Institut Mines-Télécom [Paris] (IMT); Laboratoire Traitement et Communication de l'Information (LTCI); Centre National de la Recherche Scientifique (CNRS); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Institut National de Recherche en Informatique et en Automatique (Inria); Université de Lorraine (UL); Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH); Inria Nancy - Grand Est; Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD)
- Subjects
Computer Science::Machine Learning; Computer science; Complex system; 02 engineering and technology; Machine learning; Non-negative matrix factorization; 030507 speech-language pathology & audiology; 03 medical and health sciences; [INFO.INFO-LG] Computer Science [cs]/Machine Learning [cs.LG]; 0202 electrical engineering, electronic engineering, information engineering; Deep neural networks; Artificial neural network; Deep learning; 020206 networking & telecommunications; Pattern recognition; Time–frequency analysis; ComputingMethodologies_PATTERNRECOGNITION; Computer Science::Sound; [INFO.INFO-SD] Computer Science [cs]/Sound [cs.SD]; Sound classification; Spectrogram; Artificial intelligence; 0305 other medical science; Feature learning
- Abstract
This paper introduces the use of representations based on non-negative matrix factorization (NMF) to train deep neural networks with applications to environmental sound classification. Deep learning systems for sound classification usually rely on the network to learn meaningful representations from spectrograms or hand-crafted features. Instead, we introduce an NMF-based feature learning stage before training deep networks, whose usefulness is highlighted in this paper, especially for multi-source acoustic environments such as sound scenes. We rely on two established unsupervised and supervised NMF techniques to learn better input representations for deep neural networks. This allows us, with simple architectures, to reach performance competitive with more complex systems such as convolutional networks for acoustic scene classification. The proposed systems outperform neural networks trained on time-frequency representations on two acoustic scene classification datasets as well as the best systems from the 2016 DCASE challenge.
- Published
- 2017
5. Supervised group nonnegative matrix factorisation with similarity constraints and applications to speaker identification
- Author
Romain Serizel, Gael Richard, Slim Essid, and Victor Bisot
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH); Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Université de Lorraine (UL); Centre National de la Recherche Scientifique (CNRS); Laboratoire Traitement et Communication de l'Information (LTCI); Télécom ParisTech; Institut Mines-Télécom [Paris] (IMT)
- Subjects
feature learning; Computer Science::Machine Learning; Similarity (geometry); online learning; Feature extraction; 02 engineering and technology; Semi-supervised learning; Machine learning; 030507 speech-language pathology & audiology; 03 medical and health sciences; 0202 electrical engineering, electronic engineering, information engineering; speaker identification; Nonnegative matrix factorisation; Mathematics; Group (mathematics); 020206 networking & telecommunications; Pattern recognition; Euclidean distance; Speaker diarisation; Artificial intelligence; Mel-frequency cepstrum; dictionary learning; 0305 other medical science; [SPI.SIGNAL] Engineering Sciences [physics]/Signal and Image processing
- Abstract
This paper presents supervised feature learning approaches for speaker identification that rely on nonnegative matrix factorisation. Recent studies have shown that group nonnegative matrix factorisation and task-driven supervised dictionary learning can help perform effective feature learning for audio classification problems. This paper proposes to integrate a recent method that relies on group nonnegative matrix factorisation into a task-driven supervised framework for speaker identification. The goal is to capture both the speaker variability and the session variability while exploiting the discriminative learning aspect of the task-driven approach. Results on a subset of the ESTER corpus prove that the proposed approach can be competitive with I-vectors. Index Terms: nonnegative matrix factorisation, feature learning, dictionary learning, online learning, speaker identification
- Published
- 2017
6. Mini-batch stochastic approaches for accelerated multiplicative updates in nonnegative matrix factorisation with beta-divergence
- Author
Gael Richard, Romain Serizel, and Slim Essid
Affiliations: Speech Modeling for Facilitating Oral-Based Communication (MULTISPEECH); Inria Nancy - Grand Est; Institut National de Recherche en Informatique et en Automatique (Inria); Department of Natural Language Processing & Knowledge Discovery (LORIA - NLPKD); Laboratoire Lorrain de Recherche en Informatique et ses Applications (LORIA); Centre National de la Recherche Scientifique (CNRS); Université de Lorraine (UL); Laboratoire Traitement et Communication de l'Information (LTCI); Télécom ParisTech; Institut Mines-Télécom [Paris] (IMT); Département Traitement du Signal et des Images (TSI)
- Subjects
Mathematical optimization; Multiplicative function; MathematicsofComputing_NUMERICALANALYSIS; 020206 networking & telecommunications; 02 engineering and technology; Matrix decomposition; Non-negative matrix factorization; Euclidean distance; 030507 speech-language pathology & audiology; 03 medical and health sciences; [STAT.ML] Statistics [stat]/Machine Learning [stat.ML]; Convergence (routing); 0202 electrical engineering, electronic engineering, information engineering; Time series; 0305 other medical science; Coordinate descent; Divergence (statistics); Mathematics
- Abstract
Nonnegative matrix factorisation (NMF) with β-divergence is a popular method to decompose real-world data. In this paper we propose mini-batch stochastic algorithms to perform NMF efficiently on large data matrices. Besides the stochastic aspect, the mini-batch approach allows exploiting intensive computing devices such as general purpose graphical processing units to decrease the processing time and, in some cases, outperform coordinate descent approaches.
- Published
- 2016