Joint speech and overlap detection: a benchmark over multiple audio setup and speech domains
- Source :
- Le Mans Université. 2023
- Publication Year :
- 2023
- Publisher :
- HAL CCSD, 2023.
Abstract
- Voice activity detection (VAD) and overlapped speech detection (OSD) are key pre-processing tasks for speaker diarization, and the final segmentation performance relies heavily on their robustness. Recent studies have shown that VAD and OSD can be trained jointly using a multi-class classification model. However, these works are often restricted to a specific speech domain and therefore say little about how well the systems generalize. This paper proposes a new and complete benchmark of different VAD and OSD models on multiple audio setups (single- and multi-channel) and speech domains (e.g., media, meetings). Our 2/3-class systems, which combine a Temporal Convolutional Network with speech representations adapted to the setup, outperform state-of-the-art results. We show that jointly training the two tasks yields F1-scores similar to those of two dedicated VAD and OSD systems while reducing the training cost. The same architecture can also be used for both single-channel and multichannel speech processing.
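The joint formulation mentioned in the abstract can be illustrated with a minimal sketch. It assumes the three classes are non-speech, single speaker, and overlap, and that frame-level active-speaker counts are available as reference annotation; the record itself does not spell out either point. The function names, the class encoding, and the toy data are hypothetical and are not taken from the authors' implementation. The sketch only shows how one 3-class frame sequence gives both a VAD and an OSD decision, and how a frame-level F1-score can be computed for either task.

```python
from typing import List, Sequence, Tuple

# Assumed class encoding for a 3-class joint VAD+OSD system (illustrative only).
NON_SPEECH, SINGLE_SPEAKER, OVERLAP = 0, 1, 2


def counts_to_classes(speaker_counts: Sequence[int]) -> List[int]:
    """Map per-frame active-speaker counts (>= 0) to 3-class labels.

    0 speakers -> NON_SPEECH, 1 speaker -> SINGLE_SPEAKER, 2 or more -> OVERLAP.
    """
    return [min(count, OVERLAP) for count in speaker_counts]


def classes_to_vad_osd(classes: Sequence[int]) -> Tuple[List[bool], List[bool]]:
    """Derive binary VAD and OSD decisions from 3-class frame labels."""
    vad = [c != NON_SPEECH for c in classes]  # speech = single speaker or overlap
    osd = [c == OVERLAP for c in classes]     # overlap = two or more speakers
    return vad, osd


def f1_score(reference: Sequence[bool], hypothesis: Sequence[bool]) -> float:
    """Frame-level F1-score of the positive class; 0.0 if no positives at all."""
    tp = sum(r and h for r, h in zip(reference, hypothesis))
    fp = sum((not r) and h for r, h in zip(reference, hypothesis))
    fn = sum(r and (not h) for r, h in zip(reference, hypothesis))
    denominator = 2 * tp + fp + fn
    return 2 * tp / denominator if denominator else 0.0


# Toy reference annotation: number of active speakers in each frame.
reference_classes = counts_to_classes([0, 1, 2, 3, 1, 0])
ref_vad, ref_osd = classes_to_vad_osd(reference_classes)

# Toy system output that misses one overlapped frame.
hyp_vad, hyp_osd = classes_to_vad_osd([0, 1, 2, 1, 1, 0])

vad_f1 = f1_score(ref_vad, hyp_vad)
osd_f1 = f1_score(ref_osd, hyp_osd)
```

In this toy case the missed overlap frame lowers only the OSD score, since that frame is still labelled as speech. It is a small illustration of why the two tasks are scored separately even when a single model produces both.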
- Subjects :
- overlap speech detection
[INFO.INFO-TS]Computer Science [cs]/Signal and Image Processing
Speech segmentation
[INFO.INFO-SD]Computer Science [cs]/Sound [cs.SD]
speech activity detection
multi-channel
[INFO.INFO-NE]Computer Science [cs]/Neural and Evolutionary Computing [cs.NE]
[INFO.INFO-AI]Computer Science [cs]/Artificial Intelligence [cs.AI]
Details
- Language :
- English
- Database :
- OpenAIRE
- Journal :
- Le Mans Université. 2023
- Accession number :
- edsair.od......3379..dd70d942e29f3d5c3fe89d1d62934779