
Learnable fusion mechanisms for multimodal object detection in autonomous vehicles.

Authors :
Massoud, Yahya
Laganière, Robert
Source :
IET Computer Vision (Wiley-Blackwell). Jun2024, Vol. 18 Issue 4, p499-511. 13p.
Publication Year :
2024

Abstract

Perception systems in autonomous vehicles need to accurately detect and classify objects within their surrounding environments. Numerous types of sensors are deployed on these vehicles, and the combination of such multimodal data streams can significantly boost performance. The authors introduce a novel sensor fusion framework using deep convolutional neural networks. The framework employs both camera and LiDAR sensors in a multimodal, multiview configuration. The authors leverage both data types by introducing two novel fusion mechanisms: element‐wise multiplication and multimodal factorised bilinear pooling. The methods improve the bird's eye view moderate average precision score by +4.97% and +8.35% on the KITTI dataset when compared to traditional fusion operators like element‐wise addition and feature map concatenation. An in‐depth analysis of key design choices impacting performance, such as data augmentation, multi‐task learning, and convolutional architecture design, is offered. The study aims to pave the way for the development of more robust multimodal machine vision systems. The authors conclude the paper with qualitative results, discussing both successful and problematic cases, along with potential ways to mitigate the latter. [ABSTRACT FROM AUTHOR]
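To illustrate the two fusion operators named in the abstract, here is a minimal NumPy sketch. It is not the authors' implementation: the projection matrices `U`, `V`, the factor size `k`, and the use of flat feature vectors (rather than full convolutional feature maps) are simplifying assumptions made purely for illustration. Element-wise multiplication fuses two same-shape features by their Hadamard product; multimodal factorised bilinear pooling (MFB) projects each modality into a shared space, multiplies element-wise, sum-pools over groups of `k` factors, then applies power and L2 normalisation.

```python
import numpy as np

def elementwise_mul_fusion(cam, lidar):
    """Fuse two same-shape feature arrays by element-wise (Hadamard) product."""
    return cam * lidar

def mfb_fusion(x, y, U, V, k):
    """Multimodal factorised bilinear pooling (hypothetical standalone sketch).

    x: camera feature vector, shape (m,)
    y: LiDAR feature vector, shape (n,)
    U: projection matrix, shape (m, d*k); V: projection matrix, shape (n, d*k)
    k: number of factors summed per output dimension
    """
    joint = (x @ U) * (y @ V)                           # (d*k,) joint representation
    pooled = joint.reshape(-1, k).sum(axis=1)           # sum-pool each group of k factors -> (d,)
    pooled = np.sign(pooled) * np.sqrt(np.abs(pooled))  # signed square-root (power) normalisation
    norm = np.linalg.norm(pooled)
    return pooled / norm if norm > 0 else pooled        # L2 normalisation

# Toy example with random features and random (untrained) projections.
rng = np.random.default_rng(0)
x = rng.standard_normal(64)              # stand-in camera feature
y = rng.standard_normal(32)              # stand-in LiDAR feature
U = rng.standard_normal((64, 16 * 5))    # project to d=16 outputs, k=5 factors each
V = rng.standard_normal((32, 16 * 5))
z = mfb_fusion(x, y, U, V, k=5)
print(z.shape)  # (16,)
```

In practice these operators would be applied per spatial location on camera and LiDAR bird's-eye-view feature maps inside the detection network, with `U` and `V` learned end-to-end.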

Details

Language :
English
ISSN :
1751-9632
Volume :
18
Issue :
4
Database :
Academic Search Index
Journal :
IET Computer Vision (Wiley-Blackwell)
Publication Type :
Academic Journal
Accession number :
177627005
Full Text :
https://doi.org/10.1049/cvi2.12259