
Learnable fusion mechanisms for multimodal object detection in autonomous vehicles.

Authors :
Massoud, Yahya
Laganière, Robert
Source :
IET Computer Vision (Wiley-Blackwell). Jun2024, Vol. 18 Issue 4, p499-511. 13p.
Publication Year :
2024

Abstract

Perception systems in autonomous vehicles need to accurately detect and classify objects within their surrounding environments. Numerous types of sensors are deployed on these vehicles, and the combination of such multimodal data streams can significantly boost performance. The authors introduce a novel sensor fusion framework using deep convolutional neural networks. The framework employs both camera and LiDAR sensors in a multimodal, multiview configuration. The authors leverage both data types by introducing two novel fusion mechanisms: element‐wise multiplication and multimodal factorised bilinear pooling. The methods improve the bird's eye view moderate average precision score by +4.97% and +8.35% on the KITTI dataset when compared to traditional fusion operators like element‐wise addition and feature map concatenation. An in‐depth analysis of key design choices impacting performance, such as data augmentation, multi‐task learning, and convolutional architecture design, is offered. The study aims to pave the way for the development of more robust multimodal machine vision systems. The authors conclude the paper with qualitative results, discussing both successful and problematic cases, along with potential ways to mitigate the latter. [ABSTRACT FROM AUTHOR]
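To illustrate the two fusion operators named in the abstract, here is a minimal NumPy sketch. It is not the authors' implementation: the projection matrices `U`, `V`, the factor size `k`, and the use of flat feature vectors (rather than full convolutional feature maps) are simplifying assumptions made purely for illustration. Element-wise multiplication fuses two same-shape features by their Hadamard product; multimodal factorised bilinear pooling (MFB) projects each modality into a shared space, multiplies element-wise, sum-pools over groups of `k` factors, then applies power and L2 normalisation.

```python
import numpy as np

def elementwise_mul_fusion(cam, lidar):
    """Fuse two same-shape feature arrays by element-wise (Hadamard) product."""
    return cam * lidar

def mfb_fusion(x, y, U, V, k):
    """Multimodal factorised bilinear pooling (hypothetical standalone sketch).

    x: camera feature vector, shape (m,)
    y: LiDAR feature vector, shape (n,)
    U: projection matrix, shape (m, d*k); V: projection matrix, shape (n, d*k)
    k: number of factors summed per output dimension
    """
    joint = (x @ U) * (y @ V)                           # (d*k,) joint representation
    pooled = joint.reshape(-1, k).sum(axis=1)           # sum-pool each group of k factors -> (d,)
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # signed square-root (power) normalisation
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled        # L2 normalisation

# Toy example with random features and random (untrained) projections.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)              # stand-in camera feature
y = rng.standard_normal(32)              # stand-in LiDAR feature
U = rng.standard_normal((64, 16 * 5))    # project to d=16 outputs, k=5 factors each
V = rng.standard_normal((32, 16 * 5))
z = mfb_fusion(x, y, U, V, k=5)
print(z.shape)  # (16,)
```

In practice these operators would be applied per spatial location on camera and LiDAR bird's-eye-view feature maps inside the detection network, with `U` and `V` learned end-to-end.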

Details

Language :
English
ISSN :
1751-9632
Volume :
18
Issue :
4
Database :
Academic Search Index
Journal :
IET Computer Vision (Wiley-Blackwell)
Publication Type :
Academic Journal
Accession number :
177627005
Full Text :
https://doi.org/10.1049/cvi2.12259