Text-independent speaker recognition using non-linear frame likelihood transformation
- Source :
- Speech Communication. 24:193-209
- Publication Year :
- 1998
- Publisher :
- Elsevier BV, 1998.
Abstract
- When the reference speakers are represented by Gaussian mixture models (GMMs), the conventional approach is to accumulate the frame likelihoods over the whole test utterance and then either compare the results, as in speaker identification, or apply a threshold, as in speaker verification. In this paper we describe a method in which frame likelihoods are transformed into new scores by a non-linear function before they are accumulated. We have studied two families of such functions. The first performs likelihood normalization, a technique widely used in speaker verification but applied here at the frame level. The second transforms the likelihoods into weights according to some criterion. We call this transformation weighting models rank (WMR). Both kinds of transformation require frame likelihoods from all (or a subset of) the reference models to be available. To this end, every frame of the test utterance is input to the required reference models in parallel, and the likelihood transformation is then applied. The new scores are accumulated over the whole test utterance to obtain an utterance-level score for a given speaker model. We have found that normalizing these utterance scores is also effective for speaker verification. Experiments on two databases, the TIMIT corpus and the NTT database for speaker recognition, showed better speaker identification rates and a significant reduction in speaker verification equal error rates (EER) when the frame likelihood transformation was used.
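The scoring pipeline described in the abstract can be illustrated with a minimal sketch. It is not the paper's implementation: the abstract gives neither the exact normalization formula nor the WMR weighting function, so the normalizer (log of the summed likelihoods over all models) and the exponential rank weight with its `decay` parameter are assumptions chosen for illustration, and all function names are hypothetical. The input is assumed to be a matrix of per-frame log-likelihoods, one column per reference model.

```python
import numpy as np

def accumulate_conventional(log_lik):
    """Baseline: sum frame log-likelihoods per model.
    log_lik has shape (T frames, S reference models)."""
    return log_lik.sum(axis=0)

def accumulate_frame_normalized(log_lik):
    """Frame-level normalization (assumed form): subtract, in every frame,
    the log of the summed likelihoods of all models, then accumulate."""
    m = log_lik.max(axis=1, keepdims=True)  # for numerical stability
    log_norm = m + np.log(np.exp(log_lik - m).sum(axis=1, keepdims=True))
    return (log_lik - log_norm).sum(axis=0)

def accumulate_rank_weighted(log_lik, decay=1.0):
    """Rank-based weighting (hypothetical weight function): in every frame,
    rank the models by likelihood (0 = best) and replace each likelihood
    by exp(-decay * rank) before accumulating."""
    ranks = np.argsort(np.argsort(-log_lik, axis=1), axis=1)
    return np.exp(-decay * ranks).sum(axis=0)

# Toy data: 3 frames scored against 3 reference models.
log_lik = np.array([
    [-1.0, -2.0, -3.0],
    [-1.5, -1.0, -4.0],
    [-0.5, -2.5, -2.0],
])

conventional = accumulate_conventional(log_lik)
normalized = accumulate_frame_normalized(log_lik)
rank_weighted = accumulate_rank_weighted(log_lik)

# Identification picks the model with the highest utterance-level score.
identified = int(np.argmax(rank_weighted))
```

With this particular normalizer, every model in a frame is shifted by the same amount, so the identification ranking matches the conventional one; the change only affects the score scale, which matters when a threshold is applied for verification. The rank-based weights, by contrast, can reorder models because a single badly matched frame costs a model at most one rank step rather than an unbounded log-likelihood penalty.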
- Subjects :
- Normalization (statistics)
Linguistics and Language
Communication
Speech recognition
Pattern recognition
TIMIT
Mixture model
Speaker recognition
Language and Linguistics
Computer Science Applications
Weighting
Speaker diarisation
Modeling and Simulation
Computer Vision and Pattern Recognition
Artificial intelligence
Reference model
Software
Utterance
Mathematics
Details
- ISSN :
- 0167-6393
- Volume :
- 24
- Database :
- OpenAIRE
- Journal :
- Speech Communication
- Accession number :
- edsair.doi...........7978f361b8550d6e873759780d8d3b40
- Full Text :
- https://doi.org/10.1016/s0167-6393(98)00010-7