Harmonic Structure Features for Robust Speaker Diarization.

Authors :: Yu Zhou; Hongbin Suo; Junfeng Li; Yonghong Yan
Source :: ETRI Journal; Aug 2012, Vol. 34 Issue 4, p583-590, 8p
Publication Year :: 2012
Abstract: In this paper, we present a new approach to speaker diarization. First, we use prosodic information calculated from the original speech to resynthesize new speech data using a spectrum modeling technique. The resynthesized data is modeled with sinusoids based on pitch, vibration amplitude, and phase bias. Next, we extract cepstral features from the resynthesized speech and integrate them with the cepstral features from the original speech. Finally, we show how the two streams of cepstral features can be combined to improve the robustness of speaker diarization. Experiments carried out on standardized datasets (the US National Institute of Standards and Technology Rich Transcription 04-S multiple distant microphone conditions) show a significant improvement in diarization error rate compared with a system based only on the feature stream from the original speech. [ABSTRACT FROM AUTHOR]
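
Illustrative sketch (not from the paper): the abstract describes resynthesizing speech as a sum of sinusoids driven by pitch, amplitude, and phase, and then combining two feature streams. The Python code below shows one plausible reading of those two steps. The function names, the 16 kHz sample rate, the 10 ms hop, the per-frame hold interpolation, and the weighted log-likelihood fusion with weight `w` are all assumptions made for illustration; the abstract does not specify any of them, and the authors' actual method may differ.

```python
import numpy as np

def resynthesize_harmonic(f0, amps, phases, sr=16000, hop=160):
    """Hypothetical harmonic resynthesis (an assumption, not the paper's code).

    f0     : (T,) per-frame pitch in Hz, 0 for unvoiced frames
    amps   : (T, K) per-frame amplitude of each of K harmonics
    phases : (K,) constant phase offset of each harmonic, in radians
    """
    f0 = np.asarray(f0, dtype=float)
    amps = np.asarray(amps, dtype=float)
    phases = np.asarray(phases, dtype=float)
    n_harm = amps.shape[1]

    # Hold each frame's parameters constant for `hop` samples.
    f0_s = np.repeat(f0, hop)
    amps_s = np.repeat(amps, hop, axis=0)

    # Integrate frequency to get the fundamental's instantaneous phase.
    phi = 2.0 * np.pi * np.cumsum(f0_s) / sr

    k = np.arange(1, n_harm + 1)
    # Keep only voiced samples and harmonics below the Nyquist frequency.
    mask = (f0_s[:, None] > 0) & (k[None, :] * f0_s[:, None] < sr / 2)

    partials = amps_s * np.cos(k[None, :] * phi[:, None] + phases[None, :])
    return np.sum(mask * partials, axis=1)

def combine_streams(ll_orig, ll_resyn, w=0.9):
    """Assumed fusion rule: weighted sum of per-stream log-likelihoods."""
    return w * np.asarray(ll_orig) + (1.0 - w) * np.asarray(ll_resyn)

# Usage: 10 voiced frames at 100 Hz with 3 equal-amplitude harmonics.
signal = resynthesize_harmonic(np.full(10, 100.0), np.ones((10, 3)), np.zeros(3))
score = combine_streams(-2.0, -4.0, w=0.5)
```

In a full system, cepstral features would be extracted from both the original and the resynthesized signal, each stream would be scored by its own model, and the fusion weight would be tuned on development data.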