Localization based stereo speech source separation using probabilistic time-frequency masking and deep neural networks.

Authors :
Yu, Yang
Wang, Wenwu
Han, Peng
Source :
EURASIP Journal on Audio Speech & Music Processing; 3/4/2016, Vol. 2016 Issue 1, p1-18, 18p
Publication Year :
2016

Abstract

Time-frequency (T-F) masking is an effective method for stereo speech source separation. However, reliable estimation of the T-F mask from sound mixtures is a challenging task, especially when room reverberation is present in the mixtures. In this paper, we propose a new stereo speech separation system in which deep neural networks are used to generate a soft T-F mask for separation. More specifically, the deep neural network, which is composed of two sparse autoencoders and a softmax regression layer, is used to estimate the orientation of the dominant source at each T-F unit, based on low-level features such as the mixing vector (MV) and the interaural level and phase differences (ILD/IPD). The training dataset was generated by convolving binaural room impulse responses (RIRs) with clean speech signals positioned at different angles with respect to the sensors. With the training dataset, we use unsupervised learning to extract high-level features from the low-level features and supervised learning to find the nonlinear mapping between the high-level features and the orientation of the dominant source. Using the trained networks, the probability that each T-F unit belongs to each source (target or interferer) can be estimated from the localization cues, and these probabilities are then used to generate the soft mask for source separation. Experiments based on real binaural RIRs and the TIMIT dataset are provided to show the performance of the proposed system for reverberant speech mixtures, as compared with a recently proposed model-based T-F masking technique. [ABSTRACT FROM AUTHOR]
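
The low-level cues and the soft-masking step named in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the function names (`ild_ipd`, `soft_mask`, `apply_mask`), the normalization used to turn direction probabilities into a mask value, and the toy numbers are all assumptions made for illustration. The abstract does not state how the network outputs are combined into the mask, and the trained network itself is not modeled here.

```python
import cmath
import math


def ild_ipd(left, right, eps=1e-12):
    """Interaural cues for one T-F unit of the left/right STFTs.

    Returns (ILD in dB, IPD in radians wrapped to (-pi, pi]).
    eps guards against division by zero and log of zero.
    """
    ild = 20.0 * math.log10((abs(left) + eps) / (abs(right) + eps))
    ipd = cmath.phase(left * right.conjugate())
    return ild, ipd


def soft_mask(direction_probs, target_dir, source_dirs):
    """Soft mask value for one T-F unit.

    direction_probs: per-direction probabilities, e.g. softmax outputs
    of a trained classifier (hypothetical input, not computed here).
    Assumption: the mask is the target direction's probability,
    normalized over the directions of all active sources.
    """
    total = sum(direction_probs[d] for d in source_dirs)
    return direction_probs[target_dir] / total if total > 0 else 0.0


def apply_mask(stft, mask):
    """Multiply each T-F unit of a mixture STFT by its mask value."""
    return [[m * x for m, x in zip(mask_row, stft_row)]
            for mask_row, stft_row in zip(mask, stft)]


# Toy usage: one T-F unit, target at direction index 1, interferer at 2.
ild, ipd = ild_ipd(2 + 0j, 1j)
m = soft_mask([0.1, 0.6, 0.3], target_dir=1, source_dirs=[1, 2])
separated = apply_mask([[2 + 2j]], [[m]])
```

In a full system the masked STFT would be inverted back to a time-domain signal. That step, and the mixing-vector feature, are omitted to keep the sketch short.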

Details

Language :
English
ISSN :
16874714
Volume :
2016
Issue :
1
Database :
Complementary Index
Journal :
EURASIP Journal on Audio Speech & Music Processing
Publication Type :
Academic Journal
Accession number :
113821239
Full Text :
https://doi.org/10.1186/s13636-016-0085-x