
Multi-stage Progressive Learning-Based Speech Enhancement Using Time–Frequency Attentive Squeezed Temporal Convolutional Networks.

Authors :: Jannu, Chaitanya
Vanambathina, Sunny Dayal
Source :: Circuits, Systems & Signal Processing. Dec 2023, Vol. 42 Issue 12, p7467-7493. 27p.
Publication Year :: 2023
Abstract: Speech enhancement is an important method for improving speech quality and intelligibility in noisy environments. An effective speech enhancement model depends on precise modelling of the long-range dependencies of noisy speech. Several recent studies have examined ways to enhance speech by capturing long-term contextual information. The time–frequency (T–F) distribution of speech spectral components is also important for speech enhancement, but is usually ignored in these studies. Multi-stage learning is an effective way to integrate various deep learning modules at the same time, and its benefit is that the optimization target can be updated iteratively, stage by stage. This paper investigates speech enhancement with a multi-stage structure in which time–frequency attention (TFA) blocks are followed by stacks of squeezed temporal convolutional networks (S-TCN) with exponentially increasing dilation rates. A feature fusion block (FB) is inserted at the input of later stages to reinject the original information and reduce the possibility of speech information being lost in the early stages. The S-TCN blocks are responsible for temporal sequence modelling. The TFA block is a simple but effective network module that explicitly exploits position information to generate a 2D attention map characterizing the salient T–F distribution of speech, using two parallel branches: time-frame attention and frequency attention. Extensive experiments demonstrate that the proposed model consistently improves on existing baselines across two widely used objective metrics, PESQ and STOI. The evaluation results also show that the TFA module significantly improves system robustness to noise. [ABSTRACT FROM AUTHOR]
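
Illustrative note: the two mechanisms named in the abstract can be sketched roughly as follows. This is a minimal NumPy sketch under stated assumptions, not the authors' implementation. The function names (`tfa_sketch`, `dilation_rates`, `receptive_field`) are hypothetical, the attention branches here are parameter-free averages passed through a sigmoid (the published TFA module presumably uses learned layers), and the base-2 dilation schedule and kernel size of 3 are assumptions, since the abstract only says the rates increase exponentially.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def tfa_sketch(spec):
    """Toy time-frequency attention on a (frames, bins) magnitude spectrogram.

    Assumption: each branch is a plain average followed by a sigmoid;
    a real module would learn these mappings.
    """
    time_att = sigmoid(spec.mean(axis=1, keepdims=True))  # (frames, 1): one weight per frame
    freq_att = sigmoid(spec.mean(axis=0, keepdims=True))  # (1, bins): one weight per frequency bin
    att_map = time_att * freq_att                         # broadcast to a 2D (frames, bins) map
    return spec * att_map, att_map

def dilation_rates(num_layers):
    """Exponentially increasing dilations; base 2 is an assumption."""
    return [2 ** i for i in range(num_layers)]

def receptive_field(kernel_size, dilations):
    """Receptive field (in frames) of a stack with one dilated 1-D convolution per layer."""
    return 1 + (kernel_size - 1) * sum(dilations)

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    spec = np.abs(rng.standard_normal((100, 161)))  # hypothetical 100 frames x 161 bins
    enhanced, att_map = tfa_sketch(spec)
    print(att_map.shape)
    # 1 + (3 - 1) * (1 + 2 + 4 + 8) = 31 frames
    print(receptive_field(3, dilation_rates(4)))
```

The sketch shows why the two ideas pair well: the outer product of the two branches weights every T–F cell, while the doubling dilations let a shallow stack cover a long temporal context.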

Subjects :: *CONVOLUTIONAL neural networks
*SPEECH enhancement
*INTELLIGIBILITY of speech
*SPEECH perception
*DEEP learning
*SPEECH

Details

Language :: English
ISSN :: 0278081X
Volume :: 42
Issue :: 12
Database :: Academic Search Index
Journal :: Circuits, Systems & Signal Processing
Publication Type :: Academic Journal
Accession number :: 173034435
Full Text :: https://doi.org/10.1007/s00034-023-02455-7
