Author: "Lebourdais, Martin" - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"Lebourdais, Martin"' showing total 9 results


1. TalTech-IRIT-LIS Speaker and Language Diarization Systems for DISPLACE 2024

Author: Kalda, Joonas, Alumäe, Tanel, Lebourdais, Martin, Bredin, Hervé, Baroudi, Séverin, and Marxer, Ricard
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This paper describes the submissions of team TalTech-IRIT-LIS to the DISPLACE 2024 challenge. Our team participated in the speaker diarization and language diarization tracks of the challenge. In the speaker diarization track, our best submission was an ensemble of systems based on the pyannote.audio speaker diarization pipeline utilizing powerset training and our recently proposed PixIT method that performs joint diarization and speech separation. We improve upon PixIT by using the separation outputs for speaker embedding extraction. Our ensemble achieved a diarization error rate of 27.1% on the evaluation dataset. In the language diarization track, we fine-tuned a pre-trained Wav2Vec2-BERT language embedding model on in-domain data, and clustered short segments using AHC and VBx, based on similarity scores from LDA/PLDA. This led to a language diarization error rate of 27.6% on the evaluation data. Both results were ranked first in their respective challenge tracks.
Comment: Accepted at Interspeech 2024
Published: 2024

2. Explainable by-design Audio Segmentation through Non-Negative Matrix Factorization and Probing

Author: Lebourdais, Martin, Mariotte, Théo, Almudévar, Antonio, Tahon, Marie, and Ortega, Alfonso
Subjects: Electrical Engineering and Systems Science - Audio and Speech Processing, Computer Science - Artificial Intelligence, Computer Science - Sound
Abstract: Audio segmentation is a key task for many speech technologies, most of which are based on neural networks, usually considered as black boxes, with high-level performances. However, in many domains, among which health or forensics, there is not only a need for good performance but also for explanations about the output decision. Explanations derived directly from latent representations need to satisfy "good" properties, such as informativeness, compactness, or modularity, to be interpretable. In this article, we propose an explainable-by-design audio segmentation model based on non-negative matrix factorization (NMF) which is a good candidate for the design of interpretable representations. This paper shows that our model reaches good segmentation performances, and presents deep analyses of the latent representation extracted from the non-negative matrix. The proposed approach opens new perspectives toward the evaluation of interpretable representations according to "good" properties.
Comment: Accepted at Interspeech 2024, 5 pages, 2 figures, 3 tables
Published: 2024

3. Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

Author: Lebourdais, Martin, Mariotte, Théo, Tahon, Marie, Larcher, Anthony, Laurent, Antoine, Montresor, Silvio, Meignier, Sylvain, and Thomas, Jean-Hugh
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Computer Science - Neural and Evolutionary Computing, Electrical Engineering and Systems Science - Audio and Speech Processing, Electrical Engineering and Systems Science - Signal Processing
Abstract: Voice activity and overlapped speech detection (respectively VAD and OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance highly relies on the robustness of these sub-tasks. Recent studies have shown VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, lacking information about the generalization capacities of the systems. This paper proposes a complete and new benchmark of different VAD and OSD models, on multiple audio setups (single/multi-channel) and speech domains (e.g. media, meeting...). Our 2/3-class systems, which combine a Temporal Convolutional Network with speech representations adapted to the setup, outperform state-of-the-art results. We show that the joint training of these two tasks offers similar performances in terms of F1-score to two dedicated VAD and OSD systems while reducing the training cost. This unique architecture can also be used for single and multichannel speech processing.
Published: 2023

4. Overlapped speech and gender detection with WavLM pre-trained features

Author: Lebourdais, Martin, Tahon, Marie, Laurent, Antoine, and Meignier, Sylvain
Subjects: Computer Science - Sound, Computer Science - Artificial Intelligence, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This article focuses on overlapped speech and gender detection in order to study interactions between women and men in French audiovisual media (Gender Equality Monitoring project). In this application context, we need to automatically segment the speech signal according to speakers gender, and to identify when at least two speakers speak at the same time. We propose to use WavLM model which has the advantage of being pre-trained on a huge amount of speech data, to build an overlapped speech detection (OSD) and a gender detection (GD) systems. In this study, we use two different corpora. The DIHARD III corpus which is well adapted for the OSD task but lack gender information. The ALLIES corpus fits with the project application context. Our best OSD system is a Temporal Convolutional Network (TCN) with WavLM pre-trained features as input, which reaches a new state-of-the-art F1-score performance on DIHARD. A neural GD is trained with WavLM inputs on a gender balanced subset of the French broadcast news ALLIES data, and obtains an accuracy of 97.9%. This work opens new perspectives for human science researchers regarding the differences of representation between women and men in French media.
Comment: Submitted and accepted to Interspeech 2022
Published: 2022

5. Multi-corpus Experiment on Continuous Speech Emotion Recognition: Convolution or Recurrence?

Author: Macary, Manon, Lebourdais, Martin, Tahon, Marie, Estève, Yannick, and Rousseau, Anthony
Editor: Karpov, Alexey, and Potapova, Rodmonga
Founding Editor: Goos, Gerhard, and Hartmanis, Juris
Editorial Board Member: Bertino, Elisa, Gao, Wen, Steffen, Bernhard, Woeginger, Gerhard, and Yung, Moti
Published: 2020

6. Multi-corpus Experiment on Continuous Speech Emotion Recognition: Convolution or Recurrence?

Author: Macary, Manon, primary, Lebourdais, Martin, additional, Tahon, Marie, additional, Estève, Yannick, additional, and Rousseau, Anthony, additional
Published: 2020

7. Overlapped speech and gender detection with WavLM pre-trained features

Author: Lebourdais, Martin, primary, Tahon, Marie, additional, Laurent, Antoine, additional, and Meignier, Sylvain, additional
Published: 2022

8. Parole superposée et genre, étude des annotations pour les médias audiovisuels.

Author: Lebourdais, Martin, primary, Tahon, Marie, additional, Laurent, Antoine, additional, Larcher, Anthony, additional, and Meignier, Sylvain, additional
Published: 2022

9. Overlapped speech and gender detection with WavLM pre-trained features

Author: Lebourdais, Martin, Tahon, Marie, Laurent, Antoine, and Meignier, Sylvain
Project: Mesure de l'égalité entre les sexes dans les médias - GEM2019 - ANR-19-CE38-0012 - AAPG2019 - VALID
Subjects: FOS: Computer and information sciences, [INFO.INFO-AI] Computer Science [cs]/Artificial Intelligence [cs.AI], Sound (cs.SD), Artificial Intelligence (cs.AI), Audio and Speech Processing (eess.AS), Computer Science - Artificial Intelligence, speech, FOS: Electrical engineering, electronic engineering, information engineering, gender, overlapped speech detection, Computer Science - Sound, Electrical Engineering and Systems Science - Audio and Speech Processing
Abstract: This article focuses on overlapped speech and gender detection in order to study interactions between women and men in French audiovisual media (Gender Equality Monitoring project). In this application context, we need to automatically segment the speech signal according to speakers gender, and to identify when at least two speakers speak at the same time. We propose to use WavLM model which has the advantage of being pre-trained on a huge amount of speech data, to build an overlapped speech detection (OSD) and a gender detection (GD) systems. In this study, we use two different corpora. The DIHARD III corpus which is well adapted for the OSD task but lack gender information. The ALLIES corpus fits with the project application context. Our best OSD system is a Temporal Convolutional Network (TCN) with WavLM pre-trained features as input, which reaches a new state-of-the-art F1-score performance on DIHARD. A neural GD is trained with WavLM inputs on a gender balanced subset of the French broadcast news ALLIES data, and obtains an accuracy of 97.9%. This work opens new perspectives for human science researchers regarding the differences of representation between women and men in French media.
Comment: Submitted and accepted to Interspeech 2022
Published: 2022
