Dual-branch attention module-based network with parameter sharing for joint sound event detection and localization

Authors :
Yuting Zhou
Hongjie Wan
Source :
EURASIP Journal on Audio, Speech, and Music Processing, Vol 2023, Iss 1, Pp 1-15 (2023)
Publication Year :
2023
Publisher :
SpringerOpen, 2023.

Abstract

The goal of sound event detection and localization (SELD) is to identify each individual sound event class and its activity time in a piece of audio, while estimating its spatial location during that activity. The Conformer combines the advantages of convolutional layers and the Transformer and is effective in tasks such as speech recognition. However, it achieves its high performance at the cost of a complex network structure and a large amount of computation. For the SELD task in this paper, we propose an encoder with a simpler network structure, called the dual-branch attention module (DBAM). The module is adapted from the Conformer and uses two parallel branches, one of attention and one of convolution, so that it can model both global and local contextual information. We also blend low-level and high-level features for the localization task. In addition, we add soft parameter sharing to the joint SELD network, which can efficiently exploit the potential relationship between the two subtasks, sound event detection (SED) and direction-of-arrival (DOA) estimation. The proposed method can effectively detect two sound events that overlap in the same time period. Experiments on the open DCASE 2020 Task 3 dataset show that the proposed method achieves better SELD performance than the baseline model. Furthermore, ablation experiments verify the effectiveness of the dual-branch attention module and of soft parameter sharing.
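
The idea of two parallel branches, attention for global context and convolution for local context, can be illustrated with a minimal NumPy sketch. This is not the authors' DBAM: the single attention head, the depthwise kernel, the summation of the two branches with a residual connection, and all names (make_params, attention_branch, conv_branch, dual_branch_block) are assumptions made for illustration only.

```python
import numpy as np


def softmax(x, axis=-1):
    # Numerically stable softmax.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)


def attention_branch(x, wq, wk, wv):
    """Single-head self-attention over time.

    x has shape (T, d). Every frame attends to every other frame,
    so this branch captures global context.
    """
    q, k, v = x @ wq, x @ wk, x @ wv
    scores = q @ k.T / np.sqrt(q.shape[-1])
    return softmax(scores, axis=-1) @ v


def conv_branch(x, kernel):
    """Depthwise 1-D convolution over time with zero 'same' padding.

    kernel has shape (k, d) with odd k. Each output frame depends only
    on its k nearest input frames, so this branch captures local context.
    """
    k = kernel.shape[0]
    pad = k // 2
    xp = np.pad(x, ((pad, pad), (0, 0)))
    return np.stack([(xp[t:t + k] * kernel).sum(axis=0)
                     for t in range(x.shape[0])])


def dual_branch_block(x, params):
    """Run both branches in parallel on the same input and fuse them.

    Fusion by summation plus a residual connection is an assumption,
    not a detail taken from the paper.
    """
    a = attention_branch(x, params["wq"], params["wk"], params["wv"])
    c = conv_branch(x, params["kernel"])
    return x + a + c


def make_params(d, k, seed=0):
    # Hypothetical helper: small random weights for the sketch.
    rng = np.random.default_rng(seed)
    return {
        "wq": 0.1 * rng.standard_normal((d, d)),
        "wk": 0.1 * rng.standard_normal((d, d)),
        "wv": 0.1 * rng.standard_normal((d, d)),
        "kernel": 0.1 * rng.standard_normal((k, d)),
    }


if __name__ == "__main__":
    frames = np.random.default_rng(1).standard_normal((10, 4))  # (T, d)
    out = dual_branch_block(frames, make_params(d=4, k=3))
    print(out.shape)
```

In this sketch, perturbing the first frame changes the attention output at every frame but changes the convolution output only within the kernel's reach, which is the global versus local distinction the abstract describes.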

Details

Language :
English
ISSN :
16874722
Volume :
2023
Issue :
1
Database :
Directory of Open Access Journals
Journal :
EURASIP Journal on Audio, Speech, and Music Processing
Publication Type :
Academic Journal
Accession number :
edsdoj.61bc4c192794481983ba08b224b34df
Document Type :
article
Full Text :
https://doi.org/10.1186/s13636-023-00292-9