Multistream speaker diarization of meetings recordings beyond MFCC and TDOA features
- Source :
- Speech Communication. 54:55-67
- Publication Year :
- 2012
- Publisher :
- Elsevier BV, 2012.
Abstract
- Many state-of-the-art diarization systems for meeting recordings are based on the HMM/GMM framework and on the combination of spectral (MFCC) and time delay of arrival (TDOA) features. This paper presents an extensive study of how multistream diarization can be improved beyond these two sets of features. While several other features have proven effective for speaker diarization, little effort has been devoted to integrating them into the state-of-the-art MFCC + TDOA baseline and, to the best of the authors’ knowledge, no positive results have been reported so far. The first contribution of this paper is an analysis of the reasons for this: a set of oracle experiments investigates the robustness of HMM/GMM diarization when other features (the modulation spectrum features and the frequency domain linear prediction features) are also integrated. The second contribution is a non-parametric multistream diarization method based on the information bottleneck (IB) approach. In contrast to the HMM/GMM system, which relies on log-likelihood combination, it combines the feature streams in a normalized space of relevance variables. Repeating the previous analysis reveals that the proposed approach is more robust and can actually benefit from sources of information beyond the conventional MFCC and TDOA features. Experiments on the rich transcription data (heterogeneous meeting data recorded in several different rooms) show that it achieves a very competitive error of only 6.3% when four feature streams are used, compared to 14.9% for the HMM/GMM system. These results are analyzed in terms of error sensitivity to the stream weightings. To the best of the authors’ knowledge, this is the first successful attempt to reduce the speaker error by combining other features with the MFCC and TDOA features, and the first study to show the shortcomings of the HMM/GMM in going beyond this baseline.
As a last contribution, the paper also addresses issues related to the computational complexity of multistream approaches.
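The contrast the abstract draws between the two combination schemes can be illustrated with a minimal sketch. It is not the authors' implementation: the function names, the stream weights, and the toy numbers are all assumptions made for illustration, and the IB side is simplified to a weighted average of per-stream relevance-variable distributions p(y|x).

```python
# Illustrative sketch only: names, weights, and values are hypothetical.

def combine_log_likelihoods(stream_logliks, weights):
    """Weighted sum of per-stream log-likelihoods (HMM/GMM-style combination).

    stream_logliks[s][k] is the log-likelihood of one segment under
    speaker model k in feature stream s. The scores are unnormalized, so
    their dynamic range can differ widely from one stream to another.
    """
    n_models = len(stream_logliks[0])
    return [
        sum(w * ll[k] for w, ll in zip(weights, stream_logliks))
        for k in range(n_models)
    ]


def combine_relevance_posteriors(stream_posteriors, weights):
    """Weighted average of per-stream distributions p(y|x) (IB-style combination).

    stream_posteriors[s][k] is the probability of relevance variable k
    given the segment in feature stream s. Every stream lives in the same
    normalized space, since each distribution sums to one.
    """
    n_vars = len(stream_posteriors[0])
    mixed = [
        sum(w * p[k] for w, p in zip(weights, stream_posteriors))
        for k in range(n_vars)
    ]
    total = sum(mixed)  # renormalize in case the weights do not sum to one
    return [m / total for m in mixed]


# Hypothetical two-stream example (e.g. MFCC and TDOA) for a single segment.
weights = [0.7, 0.3]
loglik_scores = combine_log_likelihoods(
    [[-120.0, -118.0], [-4.0, -9.0]], weights
)
posterior = combine_relevance_posteriors(
    [[0.4, 0.6], [0.9, 0.1]], weights
)
```

In this toy setting the log-likelihood scores of the two streams sit on very different scales, whereas the combined posterior remains a proper distribution whatever the number of streams. This is only meant to give an intuition for why stream weighting might behave differently in the two spaces.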
- Subjects :
- Linguistics and Language
Computational complexity theory
Computer science
Speech recognition
NIST rich transcription
Language and Linguistics
Oracle
Meeting recordings
Robustness (computer science)
Speaker diarization
Hidden Markov model
Multi-stream modeling
Communication
Information bottleneck diarization
Pattern recognition
Information bottleneck method
Multilateration
Computer Science Applications
Speaker diarisation
Modeling and Simulation
Computer Vision and Pattern Recognition
Mel-frequency cepstrum
Artificial intelligence
Software
Details
- ISSN :
- 01676393
- Volume :
- 54
- Database :
- OpenAIRE
- Journal :
- Speech Communication
- Accession number :
- edsair.doi.dedup.....2f3cab2aab943c7bc025a59acb39aa6c
- Full Text :
- https://doi.org/10.1016/j.specom.2011.07.001