Blueprint Separable Subsampling and Aggregate Feature Conformer-Based End-to-End Neural Diarization.

Authors :
Jiao, Xiaolin
Chen, Yaqi
Qu, Dan
Yang, Xukui
Source :
Electronics (2079-9292); Oct2023, Vol. 12 Issue 19, p4118, 14p
Publication Year :
2023

Abstract

At present, a prevalent approach to speaker diarization is clustering based on speaker embeddings. However, this method has two primary issues. First, it cannot directly minimize the diarization error during training; second, most clustering-based methods struggle to handle speaker overlap in audio. A viable way to address these issues is to adopt end-to-end speaker diarization (EEND). Nevertheless, training an EEND system generally requires lengthy audio inputs, which must be downsampled to allow efficient model processing. In this study, we develop a novel downsampling layer that uses blueprint separable convolution (BSConv) instead of depthwise separable convolution (DSC) as the foundational convolutional unit, which effectively preserves information from the original audio. Furthermore, we incorporate multi-scale feature aggregation (MFA) into the encoder structure to combine the features extracted by each conformer block and pass them to the output layer, thereby enhancing the expressiveness of the model's feature extraction. Lastly, we employ the conformer as the backbone network to incorporate the proposed enhancements, resulting in an EEND system named BSAC-EEND. We evaluate the proposed method on both simulated and real datasets. The experiments indicate that the proposed EEND system reduces the diarization error rate (DER) by an average of 17.3% on two-speaker datasets and 12.8% on three-speaker datasets compared to the baseline. [ABSTRACT FROM AUTHOR]
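
The two building blocks named in the abstract can be illustrated with a minimal pure-Python sketch. It is not the authors' implementation: it assumes a 1-D (time-only) simplification without padding, bias, or nonlinearities, and all function names, shapes, and weights are hypothetical. It shows only the ordering difference between BSConv (pointwise, then depthwise) and DSC (depthwise, then pointwise), with the stride on the depthwise step doing the subsampling, and MFA as a frame-wise concatenation of per-block outputs.

```python
from typing import List

Matrix = List[List[float]]  # assumed layout: [time][channels]


def pointwise(x: Matrix, w: Matrix) -> Matrix:
    """1x1 convolution: mixes channels at each frame. w is [c_out][c_in]."""
    return [[sum(wi * xi for wi, xi in zip(row, frame)) for row in w]
            for frame in x]


def depthwise(x: Matrix, k: Matrix, stride: int = 1) -> Matrix:
    """Per-channel temporal convolution, no padding. k is [channels][kernel]."""
    size = len(k[0])
    out = []
    for t in range(0, len(x) - size + 1, stride):
        out.append([sum(k[c][j] * x[t + j][c] for j in range(size))
                    for c in range(len(k))])
    return out


def bsconv_subsample(x: Matrix, w_pw: Matrix, k_dw: Matrix, stride: int) -> Matrix:
    """BSConv ordering: pointwise first, then strided depthwise."""
    return depthwise(pointwise(x, w_pw), k_dw, stride)


def dsc_subsample(x: Matrix, k_dw: Matrix, w_pw: Matrix, stride: int) -> Matrix:
    """DSC ordering: strided depthwise first, then pointwise."""
    return pointwise(depthwise(x, k_dw, stride), w_pw)


def aggregate(block_outputs: List[Matrix]) -> Matrix:
    """MFA-style aggregation: concatenate each block's features per frame."""
    return [sum(frames, []) for frames in zip(*block_outputs)]
```

With 10 input frames, a kernel size of 3, and a stride of 2, both subsampling functions return 4 frames. In the BSConv ordering the depthwise kernels operate on the already channel-expanded features, whereas in the DSC ordering they operate on the raw input channels.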

Details

Language :
English
ISSN :
20799292
Volume :
12
Issue :
19
Database :
Complementary Index
Journal :
Electronics (2079-9292)
Publication Type :
Academic Journal
Accession number :
172985963
Full Text :
https://doi.org/10.3390/electronics12194118