An improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images.
- Source :
- Multimedia Tools & Applications; Jan2024, Vol. 83 Issue 4, p12159-12184, 26p
- Publication Year :
- 2024
Abstract
- 3D object detection has received extensive attention from researchers. RGB-D sensors are often used to provide complementary information in 3D object detection tasks because they easily acquire aligned point cloud and RGB image data, are relatively affordable, and perform reliably. However, how to effectively fuse point cloud and RGB image data in RGB-D images, and how to use this cross-modal information to improve 3D object detection performance, remain open research challenges. To address these problems, this paper proposes an improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images. First, a dense-to-sparse cross-modal learning module (DCLM) is designed to reduce information waste in the interaction between 2D dense information and 3D sparse information. Then, an inter-modal attention fusion module (IAFM) is designed to adaptively retain more meaningful information when fusing the 2D and 3D features. In addition, an intra-modal attention context aggregation module (IACAM) is designed to aggregate context information within both the 2D and 3D modalities and to model the relationships between objects. Finally, detailed quantitative and qualitative experiments on the SUN RGB-D dataset show that the proposed model achieves state-of-the-art 3D object detection results. [ABSTRACT FROM AUTHOR]
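The abstract does not give the internals of the IAFM, but the idea of adaptively weighting 2D and 3D features before fusion can be illustrated with a minimal numpy sketch. All names here (`f2d`, `f3d`, the projection weights, and the toy sizes) are hypothetical illustrations, not the paper's actual architecture:

```python
import numpy as np

rng = np.random.default_rng(0)

def inter_modal_attention_fusion(f2d, f3d, w2d, w3d):
    """Hypothetical sketch of attention-based fusion: each modality's
    features are gated by a per-channel sigmoid attention weight, then
    the gated features are summed. This is only an illustration of the
    general technique the abstract describes, not the paper's IAFM."""
    def sigmoid(x):
        return 1.0 / (1.0 + np.exp(-x))
    # attention scores from (hypothetical) learned linear projections
    a2d = sigmoid(f2d @ w2d)   # shape (N, C), values in (0, 1)
    a3d = sigmoid(f3d @ w3d)
    # adaptively weighted fusion of the two modalities
    return a2d * f2d + a3d * f3d

N, C = 4, 8                           # N points, C channels (toy sizes)
f2d = rng.standard_normal((N, C))     # stand-in for dense 2D image features
f3d = rng.standard_normal((N, C))     # stand-in for sparse 3D point features
w2d = rng.standard_normal((C, C))     # stand-in for learned projections
w3d = rng.standard_normal((C, C))
fused = inter_modal_attention_fusion(f2d, f3d, w2d, w3d)
print(fused.shape)
```

Because the sigmoid gates lie in (0, 1), each fused channel is bounded by the sum of the corresponding 2D and 3D feature magnitudes, which is one simple way a fusion module can "retain more meaningful information adaptively" rather than averaging the modalities uniformly.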
Details
- Language :
- English
- ISSN :
- 1380-7501
- Volume :
- 83
- Issue :
- 4
- Database :
- Complementary Index
- Journal :
- Multimedia Tools & Applications
- Publication Type :
- Academic Journal
- Accession number :
- 174712547
- Full Text :
- https://doi.org/10.1007/s11042-023-15845-5