Speaker diarization method based on temporal segmentation and regroup clustering.

Authors :: 朱必松
 毛启容
 高利剑
 沈雅馨
Source :: Application Research of Computers / Jisuanji Yingyong Yanjiu. Sep2024, Vol. 41 Issue 9, p2649-2654. 6p.
Publication Year :: 2024
Abstract: Current speaker diarization methods commonly apply standard global clustering to distinguish the speech segments of different speakers. They do not consider that the voice of the same individual may exhibit different feature distributions under varying background noise conditions, which enlarges intra-class distances and severely degrades clustering. Motivated by the observation that adjacent speech segments often share the same background noise, this paper proposes a novel temporal-segment-and-regroup clustering (TSARC) pipeline for speaker diarization to address these issues. First, TSARC partitions all speech segments into multiple independent intervals according to their temporal continuity and performs local clustering within each interval. It then re-associates segments attributed to the same speaker across different intervals. Moreover, during clustering, the method uses neighborhood information from the speech segments to calibrate their similarities. In this way, TSARC reduces the likelihood of directly clustering segments recorded under disparate noise conditions, which improves clustering accuracy. Experimental results on the public datasets AMI SDM and VoxConverse show that, compared with the baseline method, the proposed method achieves relative reductions in diarization error rate (DER) of 34% and 16%, respectively, demonstrating its effectiveness. [ABSTRACT FROM AUTHOR]
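
The two-stage idea described in the abstract (local clustering inside temporal intervals, then regrouping across intervals) can be illustrated with a minimal sketch. This is not the authors' implementation: the fixed-size interval split, the single-linkage cosine threshold clustering, the centroid-based regrouping, and all names (`tsarc_sketch`, `threshold_cluster`, `interval_size`, `local_thr`, `global_thr`) are assumptions made for illustration, and the neighborhood-based similarity calibration is omitted.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(y * y for y in b))
    return dot / (na * nb) if na and nb else 0.0


def threshold_cluster(vectors, threshold):
    """Single-linkage clustering: link any pair with cosine >= threshold.

    Returns one label per vector, numbered in order of first appearance.
    (Assumed stand-in for whatever clustering the paper actually uses.)
    """
    parent = list(range(len(vectors)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving
            i = parent[i]
        return i

    for i in range(len(vectors)):
        for j in range(i + 1, len(vectors)):
            if cosine(vectors[i], vectors[j]) >= threshold:
                parent[find(j)] = find(i)

    roots = {}
    return [roots.setdefault(find(i), len(roots)) for i in range(len(vectors))]


def mean(vectors):
    """Element-wise mean of a non-empty list of vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]


def tsarc_sketch(embeddings, interval_size, local_thr, global_thr):
    """Hypothetical two-stage clustering over time-ordered segment embeddings."""
    centroids = []  # one centroid per local cluster, across all intervals
    owner = []      # owner[i] = index into `centroids` for segment i

    # Stage 1: split along time and cluster locally within each interval.
    for start in range(0, len(embeddings), interval_size):
        chunk = embeddings[start:start + interval_size]
        labels = threshold_cluster(chunk, local_thr)
        base = len(centroids)
        for k in range(max(labels) + 1):
            members = [v for v, lab in zip(chunk, labels) if lab == k]
            centroids.append(mean(members))
        owner.extend(base + lab for lab in labels)

    # Stage 2: regroup local clusters across intervals via their centroids.
    global_labels = threshold_cluster(centroids, global_thr)
    return [global_labels[o] for o in owner]


# Toy data: two speakers, with a small shift in the second interval
# standing in for a change in background noise.
toy = [
    [1.0, 0.1], [0.9, 0.0], [0.1, 1.0], [0.0, 0.9],  # interval 1
    [1.0, 0.3], [0.2, 1.0], [0.9, 0.2], [0.1, 0.8],  # interval 2
]
speaker_ids = tsarc_sketch(toy, interval_size=4, local_thr=0.8, global_thr=0.8)
```

On this toy input, segments are first grouped within each four-segment interval, and the resulting local clusters are then merged across intervals by comparing their centroids, so each segment ends up with one global speaker label.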