1. Mask-based blind source separation and MVDR beamforming in ASR
- Author
- Jiaen Liang, Yanhua Long, Yijie Li, and Renke He
- Subjects
Beamforming, Linguistics and Language, Computer science, Speech recognition, Cocktail party effect, Blind signal separation, Language and Linguistics, Human-Computer Interaction, Speech enhancement, Reduction (complexity), Background noise, Minimum-variance unbiased estimator, Source separation, Computer Vision and Pattern Recognition, Software
- Abstract
This paper presents a front-end enhancement system for automatic speech recognition (ASR) that addresses the cocktail party problem. The cocktail party problem concerns recognizing the target speech when multiple speakers talk in noisy real-world environments. Many conventional techniques have been proposed for it. In this work, we propose a new framework that integrates conventional blind source separation (BSS) with the minimum variance distortionless response (MVDR) beamformer for speech enhancement and source separation on the recent CHiME-5 challenge. In our experiments, we found that the time–frequency (T–F) mask estimation strategy based on the BSS algorithm should differ between speech enhancement and source separation. The main difference is whether background noise needs to be modeled as an additional class during T–F mask estimation. Experimental results showed that the proposed framework clearly improved speech recognition performance on the single-array track of CHiME-5. We obtained a 13.5% relative WER reduction over the official baseline system by improving only the front-end speech enhancement framework.
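To make the mask-plus-beamformer idea concrete, here is a minimal NumPy sketch of mask-based MVDR beamforming. It is an illustration under stated assumptions, not the authors' implementation: the abstract does not specify the MVDR variant, so the sketch uses one common reference-microphone formulation, w = (Φn⁻¹ Φs) u / tr(Φn⁻¹ Φs). The function names (`masked_covariance`, `mvdr_weights`, `apply_beamformer`), the array layout, and the loading constants are all hypothetical. The speech and noise masks are assumed to come from some upstream estimator such as a BSS algorithm.

```python
import numpy as np

def masked_covariance(Y, mask, eps=1e-8):
    """Mask-weighted spatial covariance per frequency bin.

    Y:    complex STFT, shape (F, T, C) = (freq bins, frames, channels)
    mask: T-F mask in [0, 1], shape (F, T)
    Returns an array of shape (F, C, C).
    """
    weighted = Y * mask[..., None]
    # Sum over frames of mask * y y^H for every frequency bin.
    cov = np.einsum("ftc,ftd->fcd", weighted, Y.conj())
    return cov / (mask.sum(axis=1)[:, None, None] + eps)

def mvdr_weights(phi_s, phi_n, ref=0, diag_load=1e-6, eps=1e-8):
    """MVDR weights in the reference-microphone form (assumed variant).

    phi_s, phi_n: speech and noise covariances, shape (F, C, C)
    ref:          index of the reference microphone
    Returns weights of shape (F, C).
    """
    C = phi_n.shape[-1]
    # Small diagonal loading keeps the noise covariance invertible.
    phi_n = phi_n + diag_load * np.eye(C)
    numerator = np.linalg.solve(phi_n, phi_s)       # Phi_n^{-1} Phi_s
    trace = np.trace(numerator, axis1=1, axis2=2)   # shape (F,)
    return numerator[:, :, ref] / (trace[:, None] + eps)

def apply_beamformer(w, Y):
    """Apply w^H y to every T-F bin; returns shape (F, T)."""
    return np.einsum("fc,ftc->ft", w.conj(), Y)
```

In a pipeline of this kind, `speech_mask` and `noise_mask` would each be passed to `masked_covariance`, and the two covariances to `mvdr_weights`. Whether the noise mask is estimated as its own class is exactly the design choice the abstract highlights as differing between enhancement and separation.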
- Published
- 2019