Author: "Mariotte, Théo" / Topic: [info.info-ai]computer science [cs]/artificial intelligence [cs.ai] - Searchworks@Jio Institute Digital Library Search Results

1. Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

Author: Lebourdais, Martin; Mariotte, Théo; Tahon, Marie; Larcher, Anthony; Laurent, Antoine; Montresor, Silvio; Meignier, Sylvain; Thomas, Jean-Hugh
Affiliations: Laboratoire d'Informatique de l'Université du Mans (LIUM), Le Mans Université (UM); Laboratoire d'Acoustique de l'Université du Mans (LAUM), Le Mans Université (UM)-Centre National de la Recherche Scientifique (CNRS); Le Mans Université
Funding: This work was performed using HPC resources from GENCI–IDRIS (Grant 2022-AD011012565), French ANR GEM (ANR-19-CE38-0012), LMAC grant from Région Pays de la Loire; ANR-19-CE38-0012, GEM, Mesure de l'égalité entre les sexes dans les médias (2019); European Project: 101007666, Exchanges for SPEech ReseArch aNd TechnOlogies
Subjects: overlap speech detection, [INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing, Speech segmentation, [INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD], speech activity detection, multi-channel, [INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE], [INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Abstract: Voice activity detection (VAD) and overlapped speech detection (OSD) are key pre-processing tasks for speaker diarization. The final segmentation performance depends heavily on the robustness of these sub-tasks. Recent studies have shown that VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain, so little is known about how well the systems generalize. This paper proposes a new and complete benchmark of different VAD and OSD models on multiple audio setups (single/multi-channel) and speech domains (e.g. media, meetings). Our 2/3-class systems, which combine a Temporal Convolutional Network with speech representations adapted to the setup, outperform state-of-the-art results. We show that jointly training these two tasks yields F1-scores similar to those of two dedicated VAD and OSD systems while reducing the training cost. This single architecture can also be used for both single-channel and multichannel speech processing.
Published: 2023
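
Note: The abstract describes casting VAD and OSD as a single multi-class classification problem. The minimal Python sketch below illustrates one common way to set that up; it is not taken from the paper. The class encoding (0 = non-speech, 1 = one speaker, 2 = overlap) and the function names to_three_class and split_decisions are assumptions made here for illustration only.

```python
from typing import List, Tuple

# Assumed 3-class encoding (hypothetical, not confirmed by the record above).
NON_SPEECH, SINGLE_SPEAKER, OVERLAP = 0, 1, 2


def to_three_class(speaker_counts: List[int]) -> List[int]:
    """Map per-frame counts of active speakers to 3-class frame labels.

    Any frame with two or more active speakers is labelled OVERLAP.
    """
    return [min(max(count, 0), OVERLAP) for count in speaker_counts]


def split_decisions(labels: List[int]) -> Tuple[List[int], List[int]]:
    """Derive binary VAD and OSD decisions from 3-class frame labels.

    VAD is 1 wherever any speech is present (one speaker or overlap).
    OSD is 1 only where the frame is labelled OVERLAP.
    """
    vad = [int(label >= SINGLE_SPEAKER) for label in labels]
    osd = [int(label == OVERLAP) for label in labels]
    return vad, osd


if __name__ == "__main__":
    counts = [0, 1, 2, 3, 1]  # active speakers per frame (toy data)
    labels = to_three_class(counts)
    vad, osd = split_decisions(labels)
    print(labels, vad, osd)
```

Under this encoding, one set of frame-level predictions yields both decisions, which is the sense in which a single jointly trained model can stand in for two dedicated systems.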
