Long-term speech information based threshold for voice activity detection in massive microphone network.

Authors :
Zhu, Mengyao
Wu, Xiukun
Lu, Zhihua
Wang, Tao
Zhu, Xiaoqiang
Source :
Digital Signal Processing. Nov2019, Vol. 94, p156-164. 9p.
Publication Year :
2019

Abstract

Voice activity detection (VAD) is essential for processing with multiple microphone arrays, in which a massive number of potential devices, such as microphones for far-field voice-based interaction in smart home environments, are activated when sound sources appear. Because sound source activity is sparse, VAD can save substantial computing resources in massive microphone array processing. However, obtaining an accurate VAD may not be feasible in harsh environments, such as far-field, time-varying noise fields. In this paper, the long-term speech information (LTSI) and the log-energy are modeled to derive a more accurate VAD. First, the LTSI is obtained by measuring the differential entropy of the long-term smoothed noisy signal spectrum. Then, the LTSI is used to obtain labeled data for initializing a Gaussian mixture model (GMM), which is fitted to the log-energy distributions of noise and (noisy) speech. Finally, by combining the LTSI with the GMM parameters of the noise and speech distributions, this paper derives an adaptive threshold that represents a reasonable boundary between noise and speech. Experimental results show that the proposed VAD method yields a remarkable improvement for a massive microphone network. [ABSTRACT FROM AUTHOR]
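
The abstract gives no formulas, so the following Python sketch illustrates only the general idea of its second and third steps: a two-component Gaussian mixture is initialized from rough noise/speech labels, fitted to frame log-energies with EM, and used to place a decision boundary. Everything specific here is an assumption and not the authors' method: the function names are hypothetical, `rough_labels` is a crude stand-in for the LTSI-based labeling (the differential-entropy computation is not reproduced), the log-energy values are synthetic, and the boundary is the standard equal-posterior point between the two components rather than the paper's adaptive threshold.

```python
import math
import random


def log_weighted_pdf(x, weight, mean, var):
    """Log of weight * N(x; mean, var) for a 1-D Gaussian."""
    return (math.log(weight) - 0.5 * math.log(2.0 * math.pi * var)
            - 0.5 * (x - mean) ** 2 / var)


def rough_labels(values):
    """Hypothetical stand-in for LTSI labeling: 1 (speech) if above the mean."""
    cut = sum(values) / len(values)
    return [1 if v > cut else 0 for v in values]


def init_from_labels(values, labels):
    """Initial (weight, mean, variance) for noise (0) and speech (1)."""
    params = []
    for cls in (0, 1):
        xs = [v for v, lab in zip(values, labels) if lab == cls]
        mean = sum(xs) / len(xs)
        var = max(sum((x - mean) ** 2 for x in xs) / len(xs), 1e-6)
        params.append((len(xs) / len(values), mean, var))
    return params


def fit_gmm(values, params, iterations=50):
    """Refine the mixture parameters with plain EM iterations."""
    for _ in range(iterations):
        # E-step: responsibilities, computed in the log domain for stability.
        resp = []
        for x in values:
            logs = [log_weighted_pdf(x, w, m, v) for w, m, v in params]
            top = max(logs)
            probs = [math.exp(value - top) for value in logs]
            total = sum(probs)
            resp.append([p / total for p in probs])
        # M-step: re-estimate weight, mean, and variance of each component.
        new_params = []
        for k in range(len(params)):
            nk = max(sum(r[k] for r in resp), 1e-12)
            mean = sum(r[k] * x for r, x in zip(resp, values)) / nk
            var = sum(r[k] * (x - mean) ** 2 for r, x in zip(resp, values)) / nk
            new_params.append((nk / len(values), mean, max(var, 1e-6)))
        params = new_params
    return params


def equal_posterior_threshold(params, steps=60):
    """Bisect between the two means for the point where both components
    are equally likely (an assumed boundary, not the paper's threshold)."""
    (w0, m0, v0), (w1, m1, v1) = params

    def diff(x):
        return log_weighted_pdf(x, w1, m1, v1) - log_weighted_pdf(x, w0, m0, v0)

    lo, hi = m0, m1
    if not (diff(lo) < 0.0 < diff(hi)):
        return 0.5 * (m0 + m1)  # fall back to the midpoint of the means
    for _ in range(steps):
        mid = 0.5 * (lo + hi)
        if diff(mid) < 0.0:
            lo = mid
        else:
            hi = mid
    return 0.5 * (lo + hi)


# Synthetic frame log-energies: 700 noise frames and 300 speech frames.
rng = random.Random(0)
log_energy = ([rng.gauss(-6.0, 0.5) for _ in range(700)]
              + [rng.gauss(-2.0, 1.0) for _ in range(300)])

fitted = fit_gmm(log_energy, init_from_labels(log_energy, rough_labels(log_energy)))
threshold = equal_posterior_threshold(fitted)
print("noise/speech boundary on log-energy:", round(threshold, 3))
```

Frames whose log-energy exceeds `threshold` would be marked as speech under these assumptions. In the method the abstract describes, the threshold additionally depends on the LTSI, which this sketch does not model.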

Details

Language :
English
ISSN :
10512004
Volume :
94
Database :
Academic Search Index
Journal :
Digital Signal Processing
Publication Type :
Periodical
Accession number :
139193258
Full Text :
https://doi.org/10.1016/j.dsp.2019.05.012