
Speech enhancement method based on two-branch attention U-Net (基于双分支注意力U-Net的语音增强方法)

Authors :
曹洁
王宸章
梁浩鹏
王乔
李晓旭
Source :
Application Research of Computers / Jisuanji Yingyong Yanjiu. Apr2024, Vol. 41 Issue 4, p1112-1116. 5p.
Publication Year :
2024

Abstract

Speech enhancement networks have difficulty extracting global speech-related features and are ineffective at capturing local contextual information in speech. To address these problems, this paper proposed a time-domain speech enhancement method based on two-branch attention and U-Net. The method used a U-Net encoder-decoder structure and took as input the high-dimensional time-domain features obtained by applying one-dimensional convolution to single-channel noisy speech. First, this paper designed a Conformer-based residual convolution that uses residual connections to strengthen the network's noise-reduction ability. Second, this paper designed a two-branch attention structure that combines global and local attention to obtain richer contextual information from the noisy speech, while effectively representing long-sequence features and extracting more diverse feature information. Finally, this paper constructed a weighted loss function that combines a time-domain loss and a frequency-domain loss to train the network and improve its speech enhancement performance. Several metrics were used to evaluate the quality and intelligibility of the enhanced speech. On the public Voice Bank+DEMAND dataset, the enhanced speech achieves a perceptual evaluation of speech quality (PESQ) score of 3.11, a short-time objective intelligibility (STOI) of 95%, a composite measure for predicting signal rating (CSIG) of 4.44, a composite measure for predicting background noise (CBAK) of 3.60, and a composite measure for predicting overall processed speech quality (COVL) of 3.81. Its PESQ is 7.6% higher than that of SE-Conformer and 5.1% higher than that of TSTNN. Experimental results show that the proposed method achieves better results across the speech-denoising metrics and meets the requirements of speech enhancement tasks. [ABSTRACT FROM AUTHOR]
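The abstract states only that training uses a weighted combination of a time-domain loss and a frequency-domain loss; it does not give the individual terms or the weight. The sketch below illustrates one common form of such a loss under assumptions that are not taken from the paper: an L1 distance between waveforms, an L1 distance between STFT magnitude spectrograms, and a hypothetical weight `alpha = 0.8`. The function names `stft_magnitude` and `weighted_tf_loss`, the FFT size and the hop length are likewise illustrative choices, not the authors' method.

```python
import numpy as np


def stft_magnitude(x: np.ndarray, n_fft: int = 512, hop: int = 128) -> np.ndarray:
    """Magnitude spectrogram (frames x frequency bins) from a Hann-windowed real FFT.

    Illustrative STFT settings; the paper's frequency-domain representation is not
    specified in the abstract.
    """
    window = np.hanning(n_fft)
    # Zero-pad very short signals so that at least one full frame exists.
    if len(x) < n_fft:
        x = np.pad(x, (0, n_fft - len(x)))
    n_frames = 1 + (len(x) - n_fft) // hop
    frames = np.stack([x[i * hop : i * hop + n_fft] * window for i in range(n_frames)])
    return np.abs(np.fft.rfft(frames, axis=-1))


def weighted_tf_loss(
    est: np.ndarray, ref: np.ndarray, alpha: float = 0.8, n_fft: int = 512, hop: int = 128
) -> float:
    """Hypothetical weighted loss: alpha * time-domain L1 + (1 - alpha) * spectral L1."""
    if est.shape != ref.shape:
        raise ValueError("estimate and reference must have the same shape")
    # Time-domain term: mean absolute error between waveforms.
    time_loss = np.mean(np.abs(est - ref))
    # Frequency-domain term: mean absolute error between magnitude spectrograms.
    freq_loss = np.mean(
        np.abs(stft_magnitude(est, n_fft, hop) - stft_magnitude(ref, n_fft, hop))
    )
    return float(alpha * time_loss + (1.0 - alpha) * freq_loss)


if __name__ == "__main__":
    # Toy example: a 1 s, 16 kHz sine tone with additive Gaussian noise.
    t = np.arange(16000) / 16000.0
    clean = np.sin(2 * np.pi * 440.0 * t)
    noisy = clean + 0.1 * np.random.default_rng(0).standard_normal(clean.shape)
    print("loss(noisy, clean):", weighted_tf_loss(noisy, clean))
```

With `alpha = 1` the loss reduces to the waveform L1 term; lowering `alpha` shifts emphasis toward spectral fidelity, which is the general motivation for mixing the two domains.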

Details

Language :
Chinese
ISSN :
1001-3695
Volume :
41
Issue :
4
Database :
Academic Search Index
Journal :
Application Research of Computers / Jisuanji Yingyong Yanjiu
Publication Type :
Academic Journal
Accession number :
176568903
Full Text :
https://doi.org/10.19734/j.issn.1001-3695.2023.09.0374