Generalizing I-Vector Estimation for Rapid Speaker Recognition

Authors :: Xu, Longting
Lee, Kong Aik
Li, Haizhou
Yang, Zhen
Source :: IEEE-ACM Transactions on Audio, Speech, and Language Processing; 2018, Vol. 26 Issue: 4 p749-759, 11p
Publication Year :: 2018
Abstract: An i-vector is a compact representation that captures both the speaker and session variabilities rendered in a spoken utterance. Over the past years, it has prevailed over other techniques and is now the de facto representation for text-independent speaker recognition. Standard i-vector extraction requires intense computation at run-time. Reducing the computation will allow effective use of i-vectors in more applications. The intense computation arises from evaluating the posterior covariance matrix when estimating the i-vector. Previous studies on simplifying the computation of the posterior covariance matrix have had modest success. In this paper, we propose a novel approach to i-vector extraction that does not need to evaluate the full posterior covariance, thereby speeding up the run-time extraction process. This is achieved by generalizing the i-vector estimation in two ways. First, we introduce the use of occupancy reweighting in conjunction with whitening over the Baum-Welch statistics as part of the preprocessing step. Second, we introduce the so-called subspace-orthogonalizing prior (SOP) to replace the standard Gaussian prior in the i-vector formulation. Experiments conducted on the extended-core task of NIST SRE'10 show that the proposed rapid SOP approach achieves considerable speed-up over the standard i-vector with comparable equal error rates.