
An improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images.

Authors :
Chen, Yan
Ni, Jianjun
Tang, Guangyi
Cao, Weidong
Yang, Simon X.
Source :
Multimedia Tools & Applications; Jan 2024, Vol. 83 Issue 4, p12159-12184, 26p
Publication Year :
2024

Abstract

3D object detection has received extensive attention from researchers. RGB-D sensors are often used to provide complementary information in 3D object detection tasks, as they readily acquire aligned point cloud and RGB image data, are reasonably priced, and perform reliably. However, how to effectively fuse the point cloud and RGB image data in RGB-D images, and exploit this cross-modal information to improve 3D object detection performance, remains an open research challenge. To deal with these problems, an improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images is proposed in this paper. First, a dense-to-sparse cross-modal learning module (DCLM) is designed, which reduces the information wasted in the interaction between dense 2D information and sparse 3D information. Then, an inter-modal attention fusion module (IAFM) is designed, which adaptively retains the more meaningful information when fusing the 2D and 3D features. In addition, an intra-modal attention context aggregation module (IACAM) is designed to aggregate contextual information within both the 2D and 3D modalities and to model the relationships between objects. Finally, detailed quantitative and qualitative experiments are carried out on the SUN RGB-D dataset, and the results show that the proposed model obtains state-of-the-art 3D object detection results.
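The abstract does not give implementation details, but the general idea behind an inter-modal attention fusion step (sparse 3D features querying dense 2D features through cross-attention, with a learned gate deciding how much image evidence to keep) can be sketched as below. This is a minimal illustrative sketch in PyTorch; the class name, gating design, and all dimensions are assumptions, not the authors' released code.

```python
# Hypothetical sketch of gated cross-modal attention fusion (not the paper's code).
import torch
import torch.nn as nn

class InterModalAttentionFusion(nn.Module):
    """Fuse 2D image features into 3D point features via cross-attention.

    Each sparse 3D feature attends over the dense 2D feature tokens, and a
    learned sigmoid gate weights the attended 2D information, so less
    informative image evidence is suppressed adaptively.
    """

    def __init__(self, d_model: int = 256, n_heads: int = 8):
        super().__init__()
        self.cross_attn = nn.MultiheadAttention(d_model, n_heads, batch_first=True)
        # Gate computed from the concatenated 3D feature and attended 2D feature.
        self.gate = nn.Sequential(nn.Linear(2 * d_model, d_model), nn.Sigmoid())
        self.norm = nn.LayerNorm(d_model)

    def forward(self, feat_3d: torch.Tensor, feat_2d: torch.Tensor) -> torch.Tensor:
        # feat_3d: (B, N_points, d_model) sparse 3D features (queries)
        # feat_2d: (B, N_pixels, d_model) dense 2D feature tokens (keys/values)
        attended_2d, _ = self.cross_attn(feat_3d, feat_2d, feat_2d)
        g = self.gate(torch.cat([feat_3d, attended_2d], dim=-1))
        # Gated residual fusion: keep the 3D features, add weighted 2D evidence.
        return self.norm(feat_3d + g * attended_2d)

if __name__ == "__main__":
    fusion = InterModalAttentionFusion()
    pts = torch.randn(2, 1024, 256)  # e.g. seed features from a point backbone
    img = torch.randn(2, 400, 256)   # e.g. flattened CNN feature-map tokens
    print(fusion(pts, img).shape)    # torch.Size([2, 1024, 256])
```

The gated residual keeps the 3D branch as the primary signal, which matches the dense-to-sparse direction described in the abstract, though the paper's actual DCLM/IAFM designs may differ.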

Details

Language :
English
ISSN :
1380-7501
Volume :
83
Issue :
4
Database :
Complementary Index
Journal :
Multimedia Tools & Applications
Publication Type :
Academic Journal
Accession number :
174712547
Full Text :
https://doi.org/10.1007/s11042-023-15845-5