Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains

Authors :
Lebourdais, Martin
Mariotte, Théo
Tahon, Marie
Larcher, Anthony
Laurent, Antoine
Montresor, Silvio
Meignier, Sylvain
Thomas, Jean-Hugh
Laboratoire d'Informatique de l'Université du Mans (LIUM)
Le Mans Université (UM)
Laboratoire d'Acoustique de l'Université du Mans (LAUM)
Le Mans Université (UM)-Centre National de la Recherche Scientifique (CNRS)
This work was performed using HPC resources from GENCI–IDRIS (Grant 2022-AD011012565)
French ANR GEM (ANR-19-CE38-0012)
LMAC grant from Région Pays de la Loire.
Le Mans Université
ANR-19-CE38-0012, GEM, Mesure de l'égalité entre les sexes dans les médias (2019)
European Project: 101007666, Exchanges for SPEech ReseArch aNd TechnOlogies
Source :
Le Mans Université. 2023
Publication Year :
2023
Publisher :
HAL CCSD, 2023.

Abstract

Voice activity detection (VAD) and overlapped speech detection (OSD) are key pre-processing tasks for speaker diarization, and final segmentation performance relies heavily on the robustness of these sub-tasks. Recent studies have shown that VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain and therefore say little about how well the systems generalize. This paper proposes a new and complete benchmark of different VAD and OSD models on multiple audio setups (single- and multi-channel) and speech domains (e.g., media, meetings). Our 2/3-class systems, which combine a Temporal Convolutional Network with speech representations adapted to the setup, outperform state-of-the-art results. We show that jointly training the two tasks yields F1-scores similar to those of two dedicated VAD and OSD systems while reducing the training cost. This single architecture can also be used for both single-channel and multichannel speech processing.

Details

Language :
English
Database :
OpenAIRE
Journal :
Le Mans Université. 2023
Accession number :
edsair.od......3379..dd70d942e29f3d5c3fe89d1d62934779