Descriptor: "3D object detection" - Searchworks@Jio Institute Digital Library Search Results

Search keyword: "3D object detection" (1,063 results in total)



151. PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles

Author: Husnain Mushtaq, Xiaoheng Deng, Fizza Azhar, Mubashir Ali, and Hafiz Husnain Raza Sherazi
Subjects: LiDAR-camera fusion, object perspective sampling, ViT feature fusion, 3D object detection, autonomous vehicles, Information technology, T58.5-58.64
Abstract: Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. We propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion) for 3D object detection to address this. This efficient, multi-modal 3D object detection framework integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision.
Published: 2024
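
The abstract's first step, projecting LiDAR points onto a 2D plane, is commonly done with a pinhole camera model. The sketch below is a generic illustration, not PLC-Fusion's implementation. It assumes the points have already been transformed from the LiDAR frame into the camera frame (that extrinsic step is omitted) and uses a hypothetical intrinsic matrix `K`.

```python
from typing import List, Optional, Sequence, Tuple

Point3D = Tuple[float, float, float]
Pixel = Tuple[float, float]

def project_to_image(points: List[Point3D],
                     K: Sequence[Sequence[float]]) -> List[Optional[Pixel]]:
    """Project camera-frame points (x right, y down, z forward) to pixel coordinates.

    Points at or behind the camera (z <= 0) cannot be projected and map to None.
    """
    pixels: List[Optional[Pixel]] = []
    for x, y, z in points:
        if z <= 0:
            pixels.append(None)
            continue
        # Homogeneous image coordinates: [u*w, v*w, w] = K @ [x, y, z]
        uw = K[0][0] * x + K[0][1] * y + K[0][2] * z
        vw = K[1][0] * x + K[1][1] * y + K[1][2] * z
        w = K[2][0] * x + K[2][1] * y + K[2][2] * z
        pixels.append((uw / w, vw / w))
    return pixels

# Hypothetical intrinsics: focal length 700 px, principal point (620, 190)
K = [[700.0, 0.0, 620.0],
     [0.0, 700.0, 190.0],
     [0.0, 0.0, 1.0]]

print(project_to_image([(1.0, 0.5, 10.0), (0.0, 0.0, -2.0)], K))
```

The first point lands at pixel (690, 225). The second, behind the camera, maps to None. Filtering z <= 0 before dividing avoids mirrored projections of points behind the sensor.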

152. DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices

Author: Fei Huang, Shengshu Liu, Guangqian Zhang, Bingsen Hao, Yangkai Xiang, and Kun Yuan
Subjects: multi-sensor information fusion, 3D object detection, BEV, feature fusion, model deployment, Chemical technology, TP1-1185
Abstract: To address the challenges of suboptimal long-range detection and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird’s-Eye View (BEV) is proposed. This method utilizes an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address network degradation caused by the excessive depth of STDA encoding blocks. Meanwhile, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features with point cloud features to supplement environmental information, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the NuScenes detection score and 5.5% in average precision for detected objects. Finally, the model is converted and accelerated using TensorRT tools for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection.
Published: 2024
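
DeployFusion reports part of its gain as a NuScenes detection score (NDS). In the nuScenes detection benchmark, NDS combines mAP with five true-positive error metrics (translation, scale, orientation, velocity and attribute errors), each clipped to at most 1: NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))]. The sketch below computes it. The input values are made up for illustration and are not the paper's results.

```python
from typing import Dict

# Translation, scale, orientation, velocity and attribute errors (lower is better)
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")

def nuscenes_detection_score(mAP: float, tp_errors: Dict[str, float]) -> float:
    """NDS = (1/10) * [5 * mAP + sum over the five TP metrics of (1 - min(1, mTP))]."""
    missing = [m for m in TP_METRICS if m not in tp_errors]
    if missing:
        raise ValueError(f"missing TP metrics: {missing}")
    tp_score = sum(1.0 - min(1.0, tp_errors[m]) for m in TP_METRICS)
    return (5.0 * mAP + tp_score) / 10.0

# Made-up inputs for illustration; these are not DeployFusion's results
errors = {"mATE": 0.30, "mASE": 0.25, "mAOE": 0.40, "mAVE": 1.20, "mAAE": 0.15}
print(round(nuscenes_detection_score(0.50, errors), 4))
```

With these inputs the score is 0.54. Because each error term is clipped at 1, even a very poor velocity estimate (mAVE = 1.2 here) can cost at most 0.1 of the score.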

153. Three-Dimensional Outdoor Object Detection in Quadrupedal Robots for Surveillance Navigations

Author: Muhammad Hassan Tanveer, Zainab Fatima, Hira Mariam, Tanazzah Rehman, and Razvan Cristian Voicu
Subjects: quadrupedal robots, autonomous vehicle, 3D object detection, cars, KITTI, Materials of engineering and construction. Mechanics of materials, TA401-492, Production of electric energy or power. Powerplants. Central stations, TK1001-1841
Abstract: Quadrupedal robots are confronted with the intricate challenge of navigating dynamic environments fraught with diverse and unpredictable scenarios. Effectively identifying and responding to obstacles is paramount for ensuring safe and reliable navigation. This paper introduces a pioneering method for 3D object detection, termed viewpoint feature histograms, which leverages the established paradigm of 2D detection in projection. By translating 2D bounding boxes into 3D object proposals, this approach not only enables the reuse of existing 2D detectors but also significantly increases the performance with less computation required, allowing for real-time detection. Our method is versatile, targeting both bird’s eye view objects (e.g., cars) and frontal view objects (e.g., pedestrians), accommodating various types of 2D object detectors. We showcase the efficacy of our approach through the integration of YOLO3D, utilizing LiDAR point clouds on the KITTI dataset, to achieve real-time efficiency aligned with the demands of autonomous vehicle navigation. Our model selection process, tailored to the specific needs of quadrupedal robots, emphasizes considerations such as model complexity, inference speed, and customization flexibility, achieving an accuracy of up to 99.93%. This research represents a significant advancement in enabling quadrupedal robots to navigate complex and dynamic environments with heightened precision and safety.
Published: 2024

154. 3D-Scene-Former: 3D scene generation from a single RGB image using Transformers

Author: Chatterjee, Jit and Torres Vega, Maria
Published: 2024

155. SPBA-Net point cloud object detection with sparse attention and box aligning

Author: Sha, Haojie, Gao, Qingrui, Zeng, Hao, Li, Kai, Li, Wang, Zhang, Xuande, and Wang, Xiaohui
Published: 2024

156. A survey on 3D object detection in real time for autonomous driving.

Author: Contreras, Marcelo, Jain, Aayush, Bhatt, Neel P., Banerjee, Arunava, Hashemi, Ehsan, Pan, Weiguo, and Alecsandru, Ciprian
Subjects: OBJECT recognition (Computer vision), MONOCULAR vision, WEATHER, AUTONOMOUS vehicles, DETECTORS
Abstract: This survey reviews advances in 3D object detection approaches for autonomous driving. A brief introduction to 2D object detection is given first, and the drawbacks of existing methodologies in highly dynamic environments are identified. Subsequently, this paper reviews state-of-the-art 3D object detection techniques that utilize monocular and stereo vision for reliable detection in urban settings. Based on depth inference basis, learning schemes, and internal representation, this work presents a method taxonomy of three classes: model-based and geometrically constrained approaches, end-to-end learning methodologies, and hybrid methods. A dedicated segment highlights the current trend toward multi-view detectors as end-to-end methods, owing to their improved robustness. Detectors from the last two classes were specifically selected to exploit the autonomous driving context in terms of geometry, scene content and instance distribution. To demonstrate the effectiveness of each method, 3D object detection datasets for autonomous vehicles are described with their unique features, e.g., varying weather conditions, multi-modality, and multi-camera perspectives, along with their respective metrics associated with different difficulty categories. In addition, multi-modal visual datasets such as V2X, which may tackle the problem of single-view occlusion, are included. Finally, the current research trends in object detection are summarized, followed by a discussion on possible scope for future research in this domain. [ABSTRACT FROM AUTHOR]
Published: 2024
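
Among the model-based and geometrically constrained monocular approaches the survey classifies, a classic constraint recovers depth from a known object height via similar triangles in the pinhole model: Z = f·H / h, where f is the focal length in pixels, H the real object height and h the object's image height in pixels. A minimal sketch with illustrative numbers (the 1.5 m car height is an assumed prior, not a value from the survey):

```python
def depth_from_height(focal_px: float, object_height_m: float, box_height_px: float) -> float:
    """Pinhole similar triangles: h = f * H / Z, hence Z = f * H / h.

    Assumes an upright object whose real height H is known in advance (a prior).
    """
    if box_height_px <= 0:
        raise ValueError("box height must be positive")
    return focal_px * object_height_m / box_height_px

# Illustrative values: 720 px focal length, assumed 1.5 m car height, 54 px tall box
z = depth_from_height(720.0, 1.5, 54.0)
z_off_by_one = depth_from_height(720.0, 1.5, 53.0)  # same object, 1 px box-height error
print(z, z_off_by_one)
```

A one-pixel error in the box height moves the estimate from 20 m to about 20.4 m, and the effect grows with distance as h shrinks, which illustrates why depth inference is central to monocular 3D detection.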

157. 3D Point Cloud Object Detection Method Based on Multi-Scale Dynamic Sparse Voxelization.

Author: Jiayu Wang, Ye Liu, Yongjian Zhu, Dong Wang, and Yu Zhang
Abstract: Perception plays a crucial role in ensuring the safety and reliability of autonomous driving systems. However, the recognition and localization of small objects in complex scenarios still pose challenges. In this paper, we propose a point cloud object detection method based on dynamic sparse voxelization to enhance the detection performance of small objects. This method employs a specialized point cloud encoding network to learn and generate pseudo-images from point cloud features. The feature extraction part uses sliding windows and transformer-based methods. Furthermore, multiscale feature fusion is performed to enhance the granularity of small object information. In this experiment, the term “small object” refers to objects such as cyclists and pedestrians, which occupy fewer pixels than vehicles, as well as objects of poorer quality in terms of detection. The experimental results demonstrate that, compared to the PointPillars algorithm and other related algorithms on the KITTI public dataset, the proposed algorithm exhibits improved detection accuracy for cyclist and pedestrian target objects. In particular, there is notable improvement in the detection accuracy of objects in the moderate and hard difficulty categories, with an overall average increase in accuracy of about 5%. [ABSTRACT FROM AUTHOR]
Published: 2024
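
Entry 157 learns pseudo-images from point cloud features and compares against PointPillars. A core data-structure step in this family is assigning points to bird's-eye-view cells without a fixed per-cell capacity, then scattering one feature per cell into a dense 2D grid. The grid range, the 0.5 m cell size and the point-count feature below are illustrative assumptions. The paper's multi-scale voxelization, sliding-window transformer and learned encoder are not reproduced.

```python
from collections import defaultdict
from typing import Dict, List, Tuple

Point = Tuple[float, float, float]
Cell = Tuple[int, int]

def dynamic_pillarize(points: List[Point],
                      x_range: Tuple[float, float] = (0.0, 40.0),
                      y_range: Tuple[float, float] = (-20.0, 20.0),
                      cell: float = 0.5) -> Dict[Cell, List[Point]]:
    """Group points into bird's-eye-view pillars keyed by (ix, iy).

    'Dynamic': no fixed cap on points per pillar and no fixed pillar count;
    only non-empty pillars are stored, so the result is sparse.
    """
    pillars: Dict[Cell, List[Point]] = defaultdict(list)
    for p in points:
        x, y, _ = p
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # outside the detection range
        ix = int((x - x_range[0]) // cell)
        iy = int((y - y_range[0]) // cell)
        pillars[(ix, iy)].append(p)
    return dict(pillars)

def to_pseudo_image(pillars: Dict[Cell, List[Point]], nx: int, ny: int) -> List[List[float]]:
    """Scatter one feature per pillar (here simply its point count) into a dense 2D grid.

    Real detectors replace the count with a learned per-pillar encoding.
    """
    grid = [[0.0] * ny for _ in range(nx)]
    for (ix, iy), pts in pillars.items():
        grid[ix][iy] = float(len(pts))
    return grid

pts = [(1.1, 0.2, -1.0), (1.3, 0.4, 0.5), (10.0, 5.0, 0.2), (50.0, 0.0, 0.0)]
pillars = dynamic_pillarize(pts)
image = to_pseudo_image(pillars, nx=80, ny=80)  # 40 m / 0.5 m = 80 cells per axis
print(sorted(pillars), image[2][40])
```

The first two points share one pillar, the third gets its own, and the fourth falls outside the range and is dropped.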

158. Three-Dimensional Point Cloud Object Detection Based on Feature Fusion and Enhancement.

Author: Li, Yangyang, Ou, Zejun, Liu, Guangyuan, Yang, Zichen, Chen, Yanqiao, Shang, Ronghua, and Jiao, Licheng
Subjects: *OBJECT recognition (Computer vision), *POINT cloud, *OPTICAL radar, *LIDAR, *FEATURE extraction
Abstract: With the continuous emergence and development of 3D sensors in recent years, it has become increasingly convenient to collect point cloud data for 3D object detection tasks, such as in autonomous driving. However, existing methods face two problems that cannot be ignored: (1) The bird's eye view (BEV) is widely used in 3D object detection; however, BEV representations usually compress the height dimension into the channel dimension, which makes feature extraction during feature fusion more difficult. (2) Light detection and ranging (LiDAR) has a large effective scanning depth, so the point cloud becomes sparse at long range and unevenly distributed. As a result, few neighboring points surround the key points of interest, providing few features. The solutions proposed in this paper are as follows: (1) a multi-scale feature fusion scheme composed of feature maps at different levels produced by Deep Layer Aggregation (DLA), together with a feature fusion module for the BEV; (2) a point completion network that improves the prediction results by completing the feature points inside the candidate boxes in the second stage, thereby strengthening their position features. Supervised contrastive learning is applied to enhance the segmentation results, improving the discrimination between foreground and background. Experiments show that these additions achieve improvements of 2.7%, 2.4%, and 2.5% on the KITTI easy, moderate, and hard tasks, respectively. Further ablation experiments show that each addition yields a promising improvement over the baseline. [ABSTRACT FROM AUTHOR]
Published: 2024
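
Entries 157 and 158 report KITTI results split into easy, moderate and hard categories. The KITTI object benchmark assigns these levels from each ground-truth object's 2D box height, occlusion label and truncation. Easy requires at least 40 px, fully visible, at most 15% truncation. Moderate requires at least 25 px, at most partly occluded, at most 30% truncation. Hard requires at least 25 px, at most largely occluded, at most 50% truncation. The helper below encodes those thresholds. Its name is ours, and tie handling at exactly a threshold is simplified and may differ from the official devkit.

```python
from typing import Optional

# (level, min 2D box height in px, max occlusion label, max truncation fraction)
KITTI_LEVELS = [
    ("easy", 40.0, 0, 0.15),
    ("moderate", 25.0, 1, 0.30),
    ("hard", 25.0, 2, 0.50),
]

def kitti_difficulty(box_height_px: float, occlusion: int, truncation: float) -> Optional[str]:
    """Return the easiest KITTI level an object satisfies, or None if it meets none.

    occlusion: 0 = fully visible, 1 = partly occluded, 2 = largely occluded, 3 = unknown.
    Levels are cumulative: an 'easy' object is also counted under moderate and hard.
    """
    for name, min_h, max_occ, max_trunc in KITTI_LEVELS:
        if box_height_px >= min_h and occlusion <= max_occ and truncation <= max_trunc:
            return name
    return None

print(kitti_difficulty(50, 0, 0.0), kitti_difficulty(30, 1, 0.2), kitti_difficulty(30, 2, 0.45))
```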

159. Path aggregation one-stage anchor free 3D object detection.

Author: Liu, Yanfei, Li, Chao, Ning, Kanglin, and Li, Yali
Abstract: In recent years, autonomous driving has entered a rapid development phase and placed more challenging requirements on perception technology. Unlike object detection on 2D images, 3D object detection, which uses Light Detection And Ranging (LiDAR) point clouds as input, can accurately provide the coordinates, physical size, and orientation of an object in 3D space. This paper constructs a deep neural network for 3D visual object recognition inspired by computational neuroscience. Considering that part of the visual recognition pathway of the human brain tends to serve multiple visual recognition tasks, we add an auxiliary task branch when training the proposed 3D object detector. Through this auxiliary branch, the backbone of our 3D object detector can learn more generalizable features from the point cloud input. Just as the human brain collects information from different visual areas, the proposed model uses a multi-stride residual 3D backbone network and a path aggregation 2D neck network to achieve similar functions. Extensive experiments have been conducted on the KITTI dataset and the Waymo Open Dataset. The results show that our method can achieve an outstanding balance between speed and accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024

160. Pre-Segmented Down-Sampling Accelerates Graph Neural Network-Based 3D Object Detection in Autonomous Driving.

Author: Liang, Zhenming, Huang, Yingping, and Bai, Yanbiao
Subjects: *GRAPH neural networks, *OBJECT recognition (Computer vision), *POINT cloud, *AUTONOMOUS vehicles, *POINT processes, *LIDAR
Abstract: Graph neural networks (GNNs) have been proven to be an ideal approach to deal with irregular point clouds, but involve massive computations for searching neighboring points in the graph, which limits their application in large-scale LiDAR point cloud processing. Down-sampling is a straightforward and indispensable step in current GNN-based 3D detectors to reduce the computational burden of the model, but the commonly used down-sampling methods cannot distinguish the categories of the LiDAR points, which leads to an inability to effectively improve the computational efficiency of the GNN models without affecting their detection accuracy. In this paper, we propose (1) a LiDAR point cloud pre-segmented down-sampling (PSD) method that can selectively reduce background points while preserving the foreground object points during the process, greatly improving the computational efficiency of the model without affecting its 3D detection accuracy. (2) A lightweight GNN-based 3D detector that can extract point features and detect objects from the raw down-sampled LiDAR point cloud directly without any pre-transformation. We test the proposed model on the KITTI 3D Object Detection Benchmark, and the results demonstrate its effectiveness and efficiency for autonomous driving 3D object detection. [ABSTRACT FROM AUTHOR]
Published: 2024
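
Entry 160's pre-segmented down-sampling (PSD) keeps foreground object points while thinning background points before the GNN runs. The abstract does not say how points are segmented, so the sketch below assumes a per-point foreground score is already available (a hypothetical `scores` input) and randomly subsamples only the background. The threshold and keep ratio are illustrative defaults, not the paper's settings.

```python
import random
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]

def pre_segmented_downsample(points: Sequence[Point], scores: Sequence[float],
                             fg_threshold: float = 0.5, bg_keep_ratio: float = 0.1,
                             seed: int = 0) -> List[Point]:
    """Keep every point scored as foreground; randomly keep a fraction of the background."""
    if len(points) != len(scores):
        raise ValueError("points and scores must have the same length")
    foreground = [p for p, s in zip(points, scores) if s >= fg_threshold]
    background = [p for p, s in zip(points, scores) if s < fg_threshold]
    n_keep = round(len(background) * bg_keep_ratio)
    rng = random.Random(seed)  # seeded so the subsample is reproducible
    return foreground + rng.sample(background, n_keep)

# 100 points: the first 10 have high (hypothetical) foreground scores
pts = [(float(i), 0.0, 0.0) for i in range(100)]
scores = [0.9] * 10 + [0.1] * 90
kept = pre_segmented_downsample(pts, scores)
print(len(kept))
```

The call keeps all 10 foreground points plus 9 background points. Uniform random sampling down to the same 19 points would be expected to keep only about 2 of the foreground points, which is the accuracy loss that category-aware down-sampling is meant to avoid.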

161. Survey and systematization of 3D object detection models and methods.

Author: Drobnitzky, Moritz, Friederich, Jonas, Egger, Bernhard, and Zschech, Patrick
Subjects: *OBJECT recognition (Computer vision), *FEATURE extraction, *RESEARCH personnel
Abstract: Strong demand for autonomous vehicles and the wide availability of 3D sensors are continuously fueling the proposal of novel methods for 3D object detection. In this paper, we provide a comprehensive survey of recent developments in 3D object detection from 2012 to 2021, covering the full pipeline from input data, through data representation and feature extraction, to the actual detection modules. We introduce fundamental concepts, focus on a broad range of different approaches that have emerged over the past decade, and propose a systematization that provides a practical framework for comparing these approaches with the goal of guiding future development, evaluation, and application activities. Specifically, our survey and systematization of 3D object detection models and methods can help researchers and practitioners to get a quick overview of the field by decomposing 3D object detection (3DOD) solutions into more manageable pieces. [ABSTRACT FROM AUTHOR]
Published: 2024

162. MVTr: multi-feature voxel transformer for 3D object detection.

Author: Ai, Lingmei, Xie, Zhuoyu, Yao, Ruoxia, and Yang, Mengyao
Subjects: *OBJECT recognition (Computer vision), *TRANSFORMER models, *CONVOLUTIONAL neural networks, *IMAGE segmentation, *POINT cloud
Abstract: Convolutional neural networks have become a powerful tool for partial 3D object detection. However, their power has not been fully realized in capturing global information, which is crucial for object detection. In this paper, we address this problem with a multi-feature voxel transformer (MVTr), an architecture that extracts long-range relationship features through self-attention between multi-feature voxels. In general, converting a point cloud to a voxel representation greatly reduces computation, but it would take the attention network a long time to attend to the car voxels in a huge real 3D scene. To this end, we propose a semantic voxel module, which takes semantic voxels as input and cooperates with a sparse and a non-empty voxel module to extract features. The semantic voxels are generated from image segmentation and point cloud projection and retain mostly car voxels. To further enlarge the attention range while maintaining a favorable computational cost, we propose two attention mechanisms for multi-head attention: local attention and stumpy attention. Finally, we propose the fusion attention module, which adds channel attention and spatial attention to the 2D backbone network. MVTr combines the semantic information of the image and the 3D information of the point cloud and can be applied to most 3D object detection tasks. Experimental results on the KITTI dataset show that our method is effective and that its precision has significant advantages over other similar feature fusion-based methods. [ABSTRACT FROM AUTHOR]
Published: 2024
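
MVTr (entry 162) extracts long-range relationships through self-attention between voxel features and uses local attention to limit cost. The abstract does not specify its "local" and "stumpy" attention, so this is only a generic illustration. It shows single-head scaled dot-product attention over a handful of voxel features, with an optional mask that lets each voxel attend only to voxels within a hypothetical grid radius.

```python
import math
from typing import List, Optional, Sequence, Tuple

Vector = List[float]

def attention(queries: Sequence[Vector], keys: Sequence[Vector], values: Sequence[Vector],
              allowed: Optional[Sequence[Sequence[bool]]] = None) -> List[Vector]:
    """Single-head softmax(Q K^T / sqrt(d)) V; pairs with allowed[i][j] False are masked."""
    d = len(queries[0])
    out: List[Vector] = []
    for i, q in enumerate(queries):
        scores = [
            sum(a * b for a, b in zip(q, k)) / math.sqrt(d)
            if allowed is None or allowed[i][j] else float("-inf")
            for j, k in enumerate(keys)
        ]
        m = max(scores)  # subtract the max for numerical stability
        weights = [math.exp(s - m) for s in scores]  # exp(-inf) == 0.0 for masked pairs
        total = sum(weights)
        out.append([sum(w * v[c] for w, v in zip(weights, values)) / total
                    for c in range(len(values[0]))])
    return out

def local_mask(coords: Sequence[Tuple[int, int, int]], radius: int) -> List[List[bool]]:
    """Allow attention only between voxels whose grid indices differ by <= radius on every axis."""
    return [[all(abs(a - b) <= radius for a, b in zip(ci, cj)) for cj in coords]
            for ci in coords]

# Three voxels with 2-D features; the third is far from the other two
feats = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]
coords = [(0, 0, 0), (1, 0, 0), (5, 0, 0)]
global_out = attention(feats, feats, feats)                        # every voxel sees every voxel
local_out = attention(feats, feats, feats, local_mask(coords, 1))  # neighbours only
print(global_out[2], local_out[2])
```

Without the mask, the isolated third voxel mixes in features from voxels several cells away. With the mask it only attends to itself, which shows how local attention trades receptive field for cost.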

163. S2S-Sim: A Benchmark Dataset for Ship Cooperative 3D Object Detection.

Author: Yang, Wenbin, Wang, Xinzhi, Luo, Xiangfeng, Xie, Shaorong, and Chen, Junxi
Subjects: OBJECT recognition (Computer vision), CONTAINER ships, NAVIGATION in shipping, SHIP models, SHIPS, CRUISE ships, AUTONOMOUS vehicles
Abstract: The rapid development of vehicle cooperative 3D object-detection technology has significantly improved the perception capabilities of autonomous driving systems. However, ship cooperative perception technology has received limited research attention compared to autonomous driving, primarily due to the lack of appropriate ship cooperative perception datasets. To address this gap, this paper proposes S2S-Sim, a novel ship cooperative perception dataset. Ship navigation scenarios were constructed using Unity3D, and accurate ship models were incorporated while simulating the sensor parameters of real LiDAR sensors to collect data. The dataset comprises three typical ship navigation scenarios, namely ports, islands, and open waters, featuring common ship classes such as container ships, bulk carriers, and cruise ships. It consists of 7000 frames with 96,881 annotated ship bounding boxes. Leveraging this dataset, we assess the performance of mainstream vehicle cooperative perception models when transferred to ship cooperative perception scenes. Furthermore, considering the characteristics of ship navigation data, we propose a regional clustering fusion-based ship cooperative 3D object-detection method. Experimental results demonstrate that our approach achieves state-of-the-art performance in 3D ship object detection, indicating its suitability for ship cooperative perception. [ABSTRACT FROM AUTHOR]
Published: 2024
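
S2S-Sim (entry 163) evaluates cooperative perception, in which several vessels share LiDAR data. Whatever fusion follows, each agent's points must first be expressed in a common frame. The sketch below shows only that step, as simple early fusion. It assumes known planar poses (x, y, yaw) for every ship in a shared world frame and ignores roll, pitch and time synchronization. It is not the paper's regional clustering fusion method.

```python
import math
from typing import Iterable, List, Tuple

Point = Tuple[float, float, float]
Pose2D = Tuple[float, float, float]  # (x, y, yaw in radians) in a shared world frame

def to_world(points: Iterable[Point], pose: Pose2D) -> List[Point]:
    """Rotate by yaw, then translate: agent frame -> world frame (z unchanged)."""
    x0, y0, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(x0 + c * x - s * y, y0 + s * x + c * y, z) for x, y, z in points]

def to_local(points: Iterable[Point], pose: Pose2D) -> List[Point]:
    """Inverse of to_world: world frame -> agent frame."""
    x0, y0, yaw = pose
    c, s = math.cos(yaw), math.sin(yaw)
    return [(c * (x - x0) + s * (y - y0), -s * (x - x0) + c * (y - y0), z)
            for x, y, z in points]

def merge_into_ego(ego_points: List[Point], other_points: List[Point],
                   ego_pose: Pose2D, other_pose: Pose2D) -> List[Point]:
    """Early fusion: re-express another ship's points in the ego frame and concatenate."""
    return list(ego_points) + to_local(to_world(other_points, other_pose), ego_pose)

# The other ship sits 10 m ahead of the ego ship, rotated 90 degrees
merged = merge_into_ego([(0.0, 0.0, 0.0)], [(1.0, 0.0, 2.0)],
                        ego_pose=(0.0, 0.0, 0.0), other_pose=(10.0, 0.0, math.pi / 2))
print(merged)
```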

164. Multi-Layer Fusion 3D Object Detection via Lidar Point Cloud and Camera Image.

Author: Guo, Yuhao and Hu, Hui
Subjects: OBJECT recognition (Computer vision), POINT cloud, LASER based sensors, OPTICAL scanners, LIDAR, CAMERAS, INFORMATION networks
Abstract: Object detection is a key task in automatic driving, and the poor performance of small object detection is a challenge that needs to be overcome. Previously, object detection networks could detect large-scale objects in ideal environments, but detecting small objects was very difficult. To address this problem, we propose a multi-layer fusion 3D object detection network. First, a dense fusion (D-fusion) method is proposed, which is different from the traditional fusion method. By fusing the feature maps of each layer, more semantic information of the fusion network can be preserved. Secondly, in order to preserve small objects at the feature map level, we designed a feature extractor with an adaptive fusion module (AFM), which reduces the impact of the background on small objects by weighting and fusing different feature layers. Finally, an attention mechanism was added to the feature extractor to accelerate the training efficiency and convergence speed of the network by suppressing information that is irrelevant to the task. The experimental results show that our proposed approach greatly improves the baseline and outperforms most state-of-the-art methods on KITTI object detection benchmarks. [ABSTRACT FROM AUTHOR]
Published: 2024

165. 3D object detection based on point cloud in automatic driving scene.

Author: Li, Hai-Sheng and Lu, Yan-Ling
Abstract: In many real-time applications such as autonomous driving and robotics, 3D object detection algorithms typified by PointPillars, which represent point clouds as columns (pillars), have great potential for fast and reliable detection. However, such algorithms still have shortcomings, such as poor detection of some small or distant objects, as well as false and missed detections. To solve these problems, we design a three-branch extended convolutional network in the 3D object detection algorithm, which alleviates the original network's insensitivity to targets of different sizes, especially small targets. We then design an improved hybrid attention mechanism network to address missed and false detections in long-distance vehicle detection. Experimental verification on the KITTI dataset leads to the following conclusion: compared with PointPillars, our network greatly improves the mAP (mean Average Precision) of vehicle, pedestrian and rider detection, while its detection speed remains essentially equal to that of PointPillars. [ABSTRACT FROM AUTHOR]
Published: 2024

166. GA-RCNN:Graph self-attention feature extraction for 3D object detection.

Author: Yi, Yangyang, Yu, Long, Tian, Shengwei, Gao, Xuezhuang, Li, Jie, and Zhao, Xingang
Subjects: *OBJECT recognition (Computer vision), *POINT cloud, *FEATURE extraction, *AUTONOMOUS vehicles, *DEEP learning
Abstract: In recent years, 3D object detection based on LiDAR point clouds has become a key component of autonomous driving. To enhance the accuracy of 3D point cloud feature extraction and point cloud detection, this paper introduces a novel 3D object detection model, termed Graph Self-Attention-RCNN (GA-RCNN). This model is designed to integrate voxel information and point location information, enhancing the quality of 3D object proposals while maintaining contextual accuracy. The first stage corrects previous approaches that relied on local features for preselected boxes and overlooked crucial global contextual information; it uses the BEV to capture long-range dependencies via a cross-attention mechanism. The second stage addresses the overreliance on local neighborhood point feature extraction with the proposed Graph Self-Attention Pooling method, which dynamically computes contribution weights for its inputs. This enhances the model's flexibility and generalization performance. Extensive evaluations on the KITTI and Waymo datasets demonstrate GA-RCNN's superior accuracy compared to other methods, affirming its efficacy in 3D object detection. [ABSTRACT FROM AUTHOR]
Published: 2024

167. Kalman-Based Scene Flow Estimation for Point Cloud Densification and 3D Object Detection in Dynamic Scenes.

Author: Ding, Junzhe, Zhang, Jin, Ye, Luqin, and Wu, Cheng
Subjects: *OBJECT recognition (Computer vision), *FIX-point estimation, *POINT cloud, *KALMAN filtering
Abstract: Point cloud densification is essential for understanding the 3D environment. It provides crucial structural and semantic information for downstream tasks such as 3D object detection and tracking. However, existing registration-based methods struggle with dynamic targets due to the incompleteness and deformation of point clouds. To address this challenge, we propose a Kalman-based scene flow estimation method for point cloud densification and 3D object detection in dynamic scenes. Our method effectively tackles the issue of localization errors in scene flow estimation and enhances the accuracy and precision of shape completion. Specifically, we introduce a Kalman filter to correct the dynamic target's position while estimating long-sequence scene flow. This approach helps eliminate the cumulative localization error during the scene flow estimation process. Extensive experiments on the KITTI 3D tracking dataset demonstrate that our method significantly improves the performance of LiDAR-only detectors, achieving superior results compared to the baselines. [ABSTRACT FROM AUTHOR]
Published: 2024
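
Entry 167 introduces a Kalman filter to correct a dynamic target's position while estimating long-sequence scene flow. The abstract does not give the state model, so the following is a generic constant-velocity filter along a single axis with made-up noise settings. It is shown only to illustrate the predict/update cycle. A real tracker would filter 3D positions per object, and the process-noise matrix here is deliberately simplified.

```python
class ConstantVelocityKF:
    """1D Kalman filter with state [position, velocity] and position-only measurements."""

    def __init__(self, pos: float, vel: float = 0.0, p0: float = 1.0,
                 q: float = 1e-3, r: float = 0.25):
        self.x = [pos, vel]
        self.P = [[p0, 0.0], [0.0, p0]]
        self.q = q  # simplified process noise: added to both diagonal terms of P
        self.r = r  # measurement noise variance

    def predict(self, dt: float) -> None:
        p, v = self.x
        self.x = [p + v * dt, v]
        P = self.P
        # P <- F P F^T + Q with F = [[1, dt], [0, 1]]
        p00 = P[0][0] + dt * (P[1][0] + P[0][1]) + dt * dt * P[1][1] + self.q
        p01 = P[0][1] + dt * P[1][1]
        p10 = P[1][0] + dt * P[1][1]
        p11 = P[1][1] + self.q
        self.P = [[p00, p01], [p10, p11]]

    def update(self, z: float) -> None:
        P = self.P
        y = z - self.x[0]                  # innovation
        s = P[0][0] + self.r               # innovation variance
        k0, k1 = P[0][0] / s, P[1][0] / s  # Kalman gain for H = [1, 0]
        self.x = [self.x[0] + k0 * y, self.x[1] + k1 * y]
        # P <- (I - K H) P
        self.P = [[(1 - k0) * P[0][0], (1 - k0) * P[0][1]],
                  [P[1][0] - k1 * P[0][0], P[1][1] - k1 * P[0][1]]]

kf = ConstantVelocityKF(pos=0.0)  # starts with a wrong velocity guess of 0 m/s
dt = 0.1
for k in range(1, 201):
    kf.predict(dt)
    kf.update(2.0 * k * dt)  # noise-free positions of a target moving at 2 m/s
print(kf.x)  # [position, velocity]
```

After 20 s of simulated measurements the velocity estimate approaches the true 2 m/s even though it started at 0.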

168. Sparse Embedded Convolution Based Dual Feature Aggregation 3D Object Detection Network.

Author: Li, Hai-Sheng and Lu, Yan-Ling
Abstract: Designing algorithms that balance detection speed and accuracy on LiDAR point clouds is a challenging issue in many practical applications of 3D object detection, including autonomous driving. To address this issue, this paper designs a lightweight single-stage object detection algorithm that balances detection speed and accuracy. To achieve these objectives, we propose a 3D object detection framework that uses a single-stage detection network as the backbone network. First, we design a dual feature extraction module to reduce missed and false vehicle detections. Then, we use a multi-scale feature fusion scheme to fuse feature information at different scales. Furthermore, we design a data augmentation scheme suited to this network architecture. Experimental results on the KITTI dataset show that, compared with the existing single-stage object detection algorithm SECOND, the proposed method improves detection speed by 38.5% and the average precision of vehicle detection by 2.88% to 13.65%. [ABSTRACT FROM AUTHOR]
Published: 2024

169. Improving 3D Object Detection with Context-Aware and Dimensional Interaction Attention.

Author: Zhou, Jing, Gong, Zixin, and Zhang, Junchi
Abstract: Recently, 3D object detection technology based on point clouds has developed rapidly. However, the sensor scans too few points on distant and occluded objects, so these objects lack sufficient features to be detected, which degrades detection accuracy. Therefore, we construct a novel 3D object detection network, the Context-aware and dimensional Interaction Attention Network (CIANet), which explores vital geometric cues to enrich the feature representation of objects, thus boosting the overall detection performance. Specifically, in the first stage, we employ 3D sparse convolution to extract voxel features, and then construct a Channel-Spatial Hybrid Attention (CSHA) module and a Contextual Self-Attention (CSA) module to enhance the voxel features for generating proposals. The CSHA module enhances the key information in the channel and spatial domains of 2D Bird’s Eye View (BEV) features, and the CSA module supplements the enhanced BEV features with contextual information, thus generating accurate proposals. In the second stage, we construct a Dimensional Interaction Attention (DIA) module to refine Region of Interest (RoI) features within the proposals. It enhances the interactions among the channel and spatial dimensions of RoI features to learn accurate object boundaries for proposal refinement. Extensive experiments on the KITTI and Waymo benchmarks show the superior detection performance of CIANet compared to recent methods, especially for objects such as pedestrians and cyclists. [ABSTRACT FROM AUTHOR]
Published: 2024

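170. PointPillars Point Cloud Detection Network Based on Knowledge Distillation and Localization Guidance (original title: 基于知识蒸馏和定位引导的 Pointpillars 点云检测网络).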

Author: 赵晶, 李少博, 郭杰龙, 俞辉, 张剑锋, and 李杰
Subjects: OBJECT recognition (Computer vision), POINT cloud, LASERS, CONFIDENCE, CLASSIFICATION
Abstract: Not available in this record (copyright of Chinese Journal of Liquid Crystal & Displays; refer to the original published version of the article).
Published: 2024

171. A Survey of LiDAR-Based 3D Object Detection Research (original title: 基于激光雷达的3D目标检测研究综述).

Author: 余杭
Abstract: Not available in this record (copyright of Automotive Digest, Automotive Digest Editorial Office; refer to the original published version of the article).
Published: 2024

172. DMFF: dual-way multimodal feature fusion for 3D object detection.

Author: Dong, Xiaopeng, Di, Xiaoguang, and Wang, Wenzhuang
Abstract: Recently, multimodal 3D object detection that fuses complementary information from LiDAR data and RGB images has been an active research topic. However, fusing images and point clouds is not trivial because of their different representations, and inadequate feature fusion degrades detection performance. We address these problems by converting images into pseudo point clouds using depth completion and by adopting a more efficient feature fusion method. In this paper, we propose a dual-way multimodal feature fusion network (DMFF) for 3D object detection. Specifically, we first use a dual stream feature extraction module (DSFE) to generate homogeneous LiDAR and pseudo region of interest (RoI) features. Then, we propose a dual-way feature interaction method (DWFI) that enables intermodal and intramodal interaction of the two features. Next, we design a local attention feature fusion module (LAFF) to select which input features are more likely to contribute to the desired output. The proposed DMFF achieves state-of-the-art performance on the KITTI dataset. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
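DMFF turns images into pseudo point clouds after depth completion. A common way to do that conversion is to back-project every pixel with valid depth through a pinhole camera model. The sketch below assumes a completed depth map given as nested lists and hypothetical intrinsics `fx`, `fy`, `cx`, `cy`; it is not the authors' code, and the function name is illustrative only.

```python
from typing import List, Tuple


def depth_to_pseudo_points(depth: List[List[float]], fx: float, fy: float,
                           cx: float, cy: float) -> List[Tuple[float, float, float]]:
    """Back-project a dense depth map into camera-frame 3D points (pinhole model)."""
    points = []
    for v, row in enumerate(depth):          # v: image row index
        for u, z in enumerate(row):          # u: image column index, z: depth
            if z <= 0.0:                     # skip pixels without a valid depth value
                continue
            x = (u - cx) * z / fx
            y = (v - cy) * z / fy
            points.append((x, y, z))
    return points


if __name__ == "__main__":
    print(depth_to_pseudo_points([[0.0, 2.0], [4.0, 0.0]], 1.0, 1.0, 0.0, 0.0))
```

With unit focal lengths and a zero principal point, the two valid pixels become the points (2, 0, 2) and (0, 4, 4).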

173. An improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images.

Author: Chen, Yan, Ni, Jianjun, Tang, Guangyi, Cao, Weidong, and Yang, Simon X.
Subjects: OBJECT recognition (Computer vision), OPTICAL scanners, POINT cloud, LEARNING modules, RESEARCH personnel, PRICES
Abstract: 3D object detection has received extensive attention from researchers. RGB-D sensors are often used to provide complementary information in 3D object detection tasks because they readily acquire aligned point cloud and RGB image data, are relatively inexpensive, and perform reliably. However, how to effectively fuse the point cloud and RGB image data in RGB-D images, and how to use this cross-modal information to improve 3D object detection performance, remains a challenge for further research. To deal with these problems, this paper proposes an improved dense-to-sparse cross-modal fusion network for 3D object detection in RGB-D images. First, a dense-to-sparse cross-modal learning module (DCLM) is designed, which reduces information waste in the interaction between 2D dense information and 3D sparse information. Then, an inter-modal attention fusion module (IAFM) is designed, which can adaptively retain more meaningful information when fusing the 2D and 3D features. In addition, an intra-modal attention context aggregation module (IACAM) is designed to aggregate context information in both the 2D and 3D modalities and to model the relationships between objects. Finally, detailed quantitative and qualitative experiments on the SUN RGB-D dataset show that the proposed model obtains state-of-the-art 3D object detection results. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

174. F-3DNet: Extracting inner order of point cloud for 3D object detection in autonomous driving.

Author: Xu, Fenglei, Zhao, Haokai, Wu, Yifei, and Tao, Chongben
Abstract: 3D object detection has attracted widespread attention, with point cloud-based research being the most popular direction. Point clouds are usually regarded as irregular and unordered; however, an implicit order actually exists because of the laser arrangement and sequential scanning. Therefore, the authors improve 3D detection accuracy by exploring the inner order of point clouds, which contains context information but has previously been neglected. In this paper, the authors propose a novel method termed Frustum 3DNet for 3D object detection from point clouds. Following this inner order, a rearranged feature matrix is constructed, and a pseudo panorama is generated from LiDAR data. Given 2D region proposals on the pseudo image, the authors extend them to 3D space and obtain frustum regions of interest. For each frustum, a sequence of smaller frustums is generated by slicing along distance. To further incorporate context information, a novel local context feature extraction module is introduced. The extracted context features are then concatenated with the frustum features. The feature map is fed to a fully convolutional network, followed by a classifier and a regressor. Refinement and fusion with RGB input are added to further improve the results. Ablation studies verify the efficacy of the context extraction component and the corresponding model architecture. The authors present experiments on the KITTI and nuScenes datasets, and F-3DNet outperformed existing methods at the time of submission. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
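F-3DNet slices each frustum into a sequence of smaller frustums along distance. A minimal sketch of that slicing step, assuming points are already expressed in a frame where the third coordinate is depth and that slices have equal depth (both assumptions; the abstract does not give the slicing rule), is:

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def slice_frustum(points: Sequence[Point], near: float, far: float,
                  num_slices: int) -> List[List[Point]]:
    """Split points whose depth z lies in [near, far) into equal-depth slices."""
    step = (far - near) / num_slices
    slices: List[List[Point]] = [[] for _ in range(num_slices)]
    for p in points:
        z = p[2]
        if near <= z < far:                       # ignore points outside the frustum depth range
            idx = min(int((z - near) / step), num_slices - 1)  # guard against float edge cases
            slices[idx].append(p)
    return slices


if __name__ == "__main__":
    pts = [(0, 0, 1.0), (0, 0, 2.5), (0, 0, 9.9), (0, 0, 10.0)]
    print([len(s) for s in slice_frustum(pts, 0.0, 10.0, 4)])
```

Each slice would then be encoded separately before the context features are concatenated; that encoding is outside this sketch.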

175. Equal Emphasis on Data and Network: A Two-Stage 3D Point Cloud Object Detection Algorithm with Feature Alignment.

Author: Xiao, Kai, Li, Teng, Li, Jun, Huang, Da, and Peng, Yuanxi
Subjects: *OBJECT recognition (Computer vision), *POINT cloud, *COMPUTER vision, *ALGORITHMS, *AERONAUTICAL navigation, *DEEP learning, *MULTISPECTRAL imaging
Abstract: Three-dimensional object detection is a pivotal research topic in computer vision, aiming to identify and locate objects in three-dimensional space. It has wide applications in various fields such as geoscience, autonomous driving, and drone navigation. The rapid development of deep learning techniques has led to significant advancements in 3D object detection. However, with the increasing complexity of applications, 3D object detection faces a series of challenges, such as data imbalance and the effectiveness of network models. Specifically, in an experiment, our investigation revealed a notable discrepancy in LiDAR reflection intensity within a point cloud scene, with stronger intensities observed nearby and weaker intensities observed at a distance. Furthermore, we noted a substantial disparity between the number of foreground points and the number of background points. In 3D object detection, foreground points are more important than background points, yet they are usually downsampled indiscriminately in subsequent processing. To tackle these challenges, we work from both data and network perspectives, designing a feature alignment filtering algorithm and a two-stage 3D object detection network. First, to achieve feature alignment, we introduce a correction equation that decouples intensity from distance and eliminates the distance-induced attenuation of intensity. Then, a background point filtering algorithm is designed using the aligned data to alleviate the data imbalance problem. At the same time, because the accuracy of semantic segmentation plays a crucial role in 3D object detection, we propose a two-stage deep learning network that integrates spatial and spectral information, in which a feature fusion branch is designed and embedded in the semantic segmentation backbone.
Experiments on the KITTI dataset show that the proposed method achieves the following average precision (AP_R40) values for easy, moderate, and hard difficulties, respectively: car (IoU 0.7): 89.23%, 80.14%, and 77.89%; pedestrian (IoU 0.5): 52.32%, 45.47%, and 38.78%; and cyclist (IoU 0.5): 76.41%, 61.92%, and 56.39%. By emphasizing both data quality optimization and an efficient network architecture, the proposed method achieves performance comparable to other state-of-the-art methods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
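The AP_R40 figures above follow the KITTI convention of averaging interpolated precision over 40 equally spaced recall positions (1/40 to 1). As a rough sketch, assuming the precision-recall curve is already available as parallel lists `recalls` and `precisions` (hypothetical inputs; the official KITTI evaluation code derives them from matched detections at the stated IoU thresholds), the metric can be written as:

```python
from typing import Sequence


def ap_r40(recalls: Sequence[float], precisions: Sequence[float]) -> float:
    """AP|R40: mean interpolated precision over recall positions 1/40, 2/40, ..., 1."""
    total = 0.0
    for i in range(1, 41):
        r = i / 40.0
        # interpolated precision: best precision reached at any recall >= r
        candidates = [p for rc, p in zip(recalls, precisions) if rc >= r]
        total += max(candidates) if candidates else 0.0
    return total / 40.0


if __name__ == "__main__":
    print(ap_r40([0.5, 1.0], [1.0, 0.5]))
```

For this toy curve, the first 20 recall positions get precision 1.0 and the last 20 get 0.5, so the result is 0.75. Dropping the recall-0 position (used by the older 11-point metric) is the main design difference of R40.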

176. Instance Segmentation Frustum–PointPillars: A Lightweight Fusion Algorithm for Camera–LiDAR Perception in Autonomous Driving.

Author: Wang, Yongsheng, Han, Xiaobo, Wei, Xiaoxu, and Luo, Jie
Subjects: *OBJECT recognition (Computer vision), *AUTONOMOUS vehicles, *POINT cloud, *ALGORITHMS, *OPTICAL scanners, *LIDAR, *DRIVERLESS cars
Abstract: The fusion of camera and LiDAR perception has become a research focal point in the autonomous driving field. Existing image–point cloud fusion algorithms are overly complex, and processing large amounts of 3D LiDAR point cloud data requires high computational power, which poses challenges for practical applications. To overcome the above problems, herein, we propose an Instance Segmentation Frustum (ISF)–PointPillars method. Within the framework of our method, input data are derived from both a camera and LiDAR. RGB images are processed using an enhanced 2D object detection network based on YOLOv8, thereby yielding rectangular bounding boxes and edge contours of the objects present within the scenes. Subsequently, the rectangular boxes are extended into 3D space as frustums, and the 3D points located outside them are removed. Afterward, the 2D edge contours are also extended to frustums to filter the remaining points from the preceding stage. Finally, the retained points are sent to our improved 3D object detection network based on PointPillars, and this network infers crucial information, such as object category, scale, and spatial position. In pursuit of a lightweight model, we incorporate attention modules into the 2D detector, thereby refining the focus on essential features, minimizing redundant computations, and enhancing model accuracy and efficiency. Moreover, the point filtering algorithm substantially diminishes the volume of point cloud data while concurrently reducing their dimensionality, thereby ultimately achieving lightweight 3D data. Through comparative experiments on the KITTI dataset, our method outperforms traditional approaches, achieving an average precision (AP) of 88.94% and bird's-eye view (BEV) accuracy of 90.89% in car detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
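The first filtering stage of ISF-PointPillars extends each 2D box into a frustum and discards points outside it. Equivalently, a point can be projected into the image and kept only if it lands inside the box. The sketch below assumes a hypothetical 3x4 projection matrix `P` that already includes the LiDAR-to-camera transform; it omits the second, contour-based filtering stage and is not the authors' implementation.

```python
from typing import List, Sequence, Tuple

Point = Tuple[float, float, float]


def points_in_box_frustum(points: Sequence[Point], P: Sequence[Sequence[float]],
                          box: Tuple[float, float, float, float]) -> List[Point]:
    """Keep 3D points whose image projection falls inside a 2D box (u1, v1, u2, v2)."""
    u1, v1, u2, v2 = box
    kept = []
    for x, y, z in points:
        # homogeneous projection: [u*w, v*w, w] = P @ [x, y, z, 1]
        uw, vw, w = (row[0] * x + row[1] * y + row[2] * z + row[3] for row in P)
        if w <= 0:  # point lies behind the camera
            continue
        u, v = uw / w, vw / w
        if u1 <= u <= u2 and v1 <= v <= v2:
            kept.append((x, y, z))
    return kept


if __name__ == "__main__":
    P = [[1, 0, 0, 0], [0, 1, 0, 0], [0, 0, 1, 0]]
    print(points_in_box_frustum([(1, 1, 2), (5, 0, 1), (0, 0, -1)], P, (0, 0, 1, 1)))
```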

177. Emerging Trends in Autonomous Vehicle Perception: Multimodal Fusion for 3D Object Detection.

Author: Alaba, Simegnew Yihunie, Gurbuz, Ali C., and Ball, John E.
Subjects: OBJECT recognition (Computer vision), COMPUTER vision, CONVOLUTIONAL neural networks, AUTONOMOUS vehicles, DEEP learning, TRACKING radar, DRIVERLESS cars
Abstract: The pursuit of autonomous driving relies on developing perception systems capable of making accurate, robust, and rapid decisions to interpret the driving environment effectively. At the core of these systems, object detection is crucial for understanding the environment. While 2D object detection and classification have advanced significantly with the advent of deep learning (DL) in computer vision (CV) applications, they fall short in providing essential depth information, a key element in comprehending driving environments. Consequently, 3D object detection becomes a cornerstone for autonomous driving and robotics, offering precise estimations of object locations and enhancing environmental comprehension. The CV community's growing interest in 3D object detection is fueled by the evolution of DL models, including Convolutional Neural Networks (CNNs) and Transformer networks. Despite these advancements, challenges such as varying object scales, limited 3D sensor data, and occlusions persist in 3D object detection. To address these challenges, researchers are exploring multimodal techniques that combine information from multiple sensors, such as cameras, radar, and LiDAR, to enhance the performance of perception systems. This survey provides an exhaustive review of multimodal fusion-based 3D object detection methods, focusing on CNN and Transformer-based models. It underscores the necessity of equipping fully autonomous vehicles with diverse sensors to ensure robust and reliable operation. The survey explores the advantages and drawbacks of cameras, LiDAR, and radar sensors. Additionally, it summarizes autonomy datasets and examines the latest advancements in multimodal fusion-based methods. The survey concludes by highlighting the ongoing challenges, open issues, and potential directions for future research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

178. Point cloud 3D object detection method based on density information-local feature fusion.

Author: Chen, Yanjie, Xu, Feng, Chen, Guodong, Liang, Zhiqiang, and Li, Jin
Abstract: Nowadays, three-dimensional (3D) point clouds are widely used in unmanned driving, high-precision mapping, robot grasping, and virtual reality (VR)/augmented reality (AR). In particular, many studies have focused on object detection by directly processing point clouds, but they do not take into account the uneven density of point clouds on object surfaces or the lack of feature information. Inspired by this, we propose a new 3D object detection method for point clouds based on density information-local feature fusion. First, the 3D coordinate features, high-dimensional features, and density features of the point cloud are extracted by the backbone feature extraction network, and the local features of the point cloud are extracted through sampling and grouping operations. Second, an attention mechanism is used to encode the information between local features together with density information. Then, a voting network is used to regress the points toward the object center. Finally, the points are clustered to generate proposals and 3D bounding boxes. The proposed method can reduce the influence of uneven point cloud sampling and enhance the feature information of objects, thereby improving the accuracy of 3D object detection. The proposed method is validated on the SUN RGB-D and ScanNet datasets, and various experiments confirm its effectiveness and robustness in improving 3D object detection performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
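The method above extracts density features alongside coordinate features. One simple, commonly used density measure is the number of neighbours within a fixed radius of each point. The sketch below assumes that definition and a hypothetical `radius` parameter; the paper's exact density feature may differ, and the brute-force O(n^2) search would be replaced by a spatial index in practice.

```python
import math
from typing import List, Sequence, Tuple


def local_density(points: Sequence[Tuple[float, float, float]], radius: float) -> List[int]:
    """For each point, count the other points lying within `radius` (brute force)."""
    counts = []
    for i, p in enumerate(points):
        n = 0
        for j, q in enumerate(points):
            if i != j and math.dist(p, q) <= radius:  # Euclidean distance (Python 3.8+)
                n += 1
        counts.append(n)
    return counts


if __name__ == "__main__":
    print(local_density([(0, 0, 0), (0.5, 0, 0), (5, 0, 0)], 1.0))
```

Points on densely sampled surfaces receive high counts, while isolated points (such as the third one here) receive low counts, which is the kind of unevenness the density features are meant to expose.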

179. Exploring 3D Object Detection for Autonomous Factory Driving: Advanced Research on Handling Limited Annotations with Ground Truth Sampling Augmentation †.

Author: Reuse, Matthias, Amende, Karl, Simon, Martin, and Sick, Bernhard
Subjects: OBJECT recognition (Computer vision), DATA augmentation, DRIVERLESS cars, THIRD-party logistics, INDUSTRIAL productivity
Abstract: Autonomously driving vehicles in car factories and parking spaces can represent a competitive advantage in the logistics industry. However, real-world application is challenging in many ways. First of all, there are no publicly available datasets for this specific task. Therefore, we equipped two industrial production sites with up to 11 LiDAR sensors to collect and annotate our own data for infrastructural 3D object detection. These data form the basis for extensive experiments. Because the amount of labeled data is still limited, the commonly used ground truth sampling augmentation is the core of the research in this work. Several variations of this augmentation method are explored, revealing that, in our case, the most commonly used variant is not necessarily the best. We show that an easy-to-create polygon can noticeably improve the detection results in this application scenario. By using these augmentation methods, it is even possible to achieve moderate detection results when only empty frames without any objects and a database with only a few labeled objects are used. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
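Ground truth sampling augmentation pastes labeled objects from a database into training scenes, typically rejecting candidates that would collide with objects already present. A minimal sketch, assuming axis-aligned bird's-eye-view boxes `(x_min, y_min, x_max, y_max)` and a hypothetical database of such boxes, is shown below. Real pipelines also copy each object's points and usually test rotated boxes; neither is modeled here, and this is not the variant used in the paper.

```python
import random
from typing import List, Sequence, Tuple

Box = Tuple[float, float, float, float]


def overlaps(a: Box, b: Box) -> bool:
    """Axis-aligned BEV overlap test for boxes (x_min, y_min, x_max, y_max)."""
    return a[0] < b[2] and b[0] < a[2] and a[1] < b[3] and b[1] < a[3]


def gt_sampling(scene_boxes: Sequence[Box], database: Sequence[Box],
                num_samples: int, rng: random.Random) -> List[Box]:
    """Paste up to num_samples database boxes that do not collide with existing ones."""
    boxes = list(scene_boxes)
    pasted = []
    for candidate in rng.sample(list(database), min(num_samples, len(database))):
        if all(not overlaps(candidate, b) for b in boxes):
            boxes.append(candidate)   # later candidates must avoid this one too
            pasted.append(candidate)
    return pasted


if __name__ == "__main__":
    db = [(1, 1, 3, 3), (5, 5, 6, 6)]
    print(gt_sampling([(0, 0, 2, 2)], db, 2, random.Random(0)))
```

Starting from an empty frame (`scene_boxes=[]`), the same routine fills the scene purely from the database, which mirrors the empty-frame setting mentioned in the abstract.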

180. Automotive LiDAR 3D object detection algorithm based on multibranch feature fusion.

Author: JIN Weizheng, SUN Yuan, and LI Fangyu
Subjects: OBJECT recognition (Computer vision), CONVOLUTIONAL neural networks, ELECTRIC vehicles, AUTONOMOUS vehicles, LIDAR, ALGORITHMS
Abstract: [Objective] With the rapid popularization of new energy vehicles and the vigorous development of autonomous driving technology, 3D object detection algorithms play a pivotal role in real road scenes. LiDAR point clouds contain precise position and geometric structure information about objects, which can accurately describe their positions in 3D space. Moreover, LiDAR makes environmental perception and route planning for unmanned vehicles a reality. However, cars in real scenes are often in complex and difficult situations, such as occlusion and truncation, which result in highly sparse point clouds and incomplete contours. Therefore, making effective use of disordered and unevenly distributed point clouds for accurate 3D object detection has important research significance and practical value for the safety of autonomous driving. [Methods] This paper uses LiDAR point clouds in autonomous driving scenes to conduct in-depth research on a high-performance, deep learning-based 3D object detection algorithm. A 3D object detection algorithm based on multibranch feature fusion (PSANet) is designed to improve the capability and viability of autonomous driving technology. After the disordered point clouds are divided into regular voxels, a voxel feature encoding module and a convolutional neural network are used to learn the voxel features, and the sparse 3D data are compressed into a dense 2D bird's-eye view. The multiscale bird's-eye view features are then deeply fused through the coarse and fine branches of the 2D backbone network. The splitting and aggregation feature pyramid module in the fine branch splits and aggregates the bird's-eye view features at different levels, deeply fusing the semantic, texture, and context information of the multiscale features to obtain more expressive features.
The multiscale features in the coarse branch are fused after transposed convolution, preserving precise original spatial location information. After feature extraction in the coarse and fine branches, element-wise addition yields more accurate and complete features for object classification, position regression, and orientation prediction. [Results] Experimental results on the KITTI dataset show that the average precision of PSANet in the 3D object detection and bird's-eye view object detection tasks reaches 81.72% and 88.25%, respectively. The inference speed on a single GTX 1080Ti GPU reaches 24 frames per second, and the algorithm shows strong robustness in complex scenes. Compared with the two-stage object detection algorithms MV3D, AVOD-FPN, F-PointNet, and IPOD, the average 3D object detection accuracy of this algorithm increased by 18.21%, 5.89%, 8.94%, and 3.12%, respectively. Compared with the one-stage object detection algorithms VoxelNet, SECOND, PointPillars, and VoTr-SSD, the average 3D object detection accuracy of this algorithm increased by 11.63%, 4.05%, 3.19%, 0.98%, 1.16%, and 0.7%, respectively. The detection speed of this algorithm is 14 frames per second higher than that of the two-stage algorithm PointRCNN, which has similar accuracy. [Conclusions] Compared with other advanced algorithms, this algorithm performs strongly and better balances the accuracy and speed of object detection in autonomous driving scenarios. Novel applications require higher accuracy from detection algorithms, whether one- or two-stage, and autonomous cars must use the most efficient method. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
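PSANet divides the point cloud into regular voxels and compresses the sparse 3D data into a dense 2D bird's-eye view. The indexing part of that step can be sketched as scattering points into a BEV grid. In the sketch below, per-cell point counts stand in for the learned voxel features, and the range and cell size are hypothetical parameters rather than values from the paper.

```python
from typing import List, Sequence, Tuple


def points_to_bev(points: Sequence[Tuple[float, float, float]],
                  x_range: Tuple[float, float], y_range: Tuple[float, float],
                  cell: float) -> List[List[int]]:
    """Scatter points into a BEV grid of per-cell point counts (stand-in for learned features)."""
    nx = int(round((x_range[1] - x_range[0]) / cell))
    ny = int(round((y_range[1] - y_range[0]) / cell))
    grid = [[0] * nx for _ in range(ny)]
    for x, y, _z in points:  # the height axis is collapsed in this sketch
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue  # drop points outside the detection range
        ix = min(int((x - x_range[0]) / cell), nx - 1)  # guard against float round-up
        iy = min(int((y - y_range[0]) / cell), ny - 1)
        grid[iy][ix] += 1
    return grid


if __name__ == "__main__":
    pts = [(0.1, 0.1, 0.0), (0.2, 0.3, 1.0), (1.5, 0.5, 0.0), (3.0, 0.0, 0.0)]
    print(points_to_bev(pts, (0.0, 2.0), (0.0, 1.0), 0.5))
```

In the actual network, each non-empty voxel would carry an encoded feature vector instead of a count, and the resulting dense BEV map feeds the coarse and fine 2D branches.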

181. DeLiVoTr: Deep and light-weight voxel transformer for 3D object detection

Author: Gopi Krishna Erabati and Helder Araujo
Subjects: 3D object detection, Transformer, Voxel, LiDAR, Autonomous driving, Computer vision, Cybernetics, Q300-390, Electronic computers. Computer science, QA75.5-76.95
Abstract: Image-based backbone (feature extraction) networks downsample the feature maps not only to increase the receptive field but also to efficiently detect objects of various scales. Existing feature extraction networks in LiDAR-based 3D object detection follow the same downsampling scheme as image-based networks to increase the receptive field. However, such downsampling of LiDAR feature maps in large-scale autonomous driving scenarios hinders the detection of small objects, such as pedestrians. To solve this issue, we design an architecture that maintains both the scale of the feature maps and the receptive field in the feature extraction network, aiding the efficient detection of small objects. We resort to an attention mechanism to build a sufficient receptive field, and we propose a Deep and Light-weight Voxel Transformer (DeLiVoTr) network with voxel intra- and inter-region transformer modules to extract voxel local and global features, respectively. We introduce the DeLiVoTr block, which uses transformations with an expand-and-reduce strategy to vary the width and depth of the network efficiently. This facilitates learning wider and deeper voxel representations and enables the use of both a smaller dimension for the attention mechanism and a light-weight feed-forward network, reducing parameters and operations. In addition to model scaling, we employ layer-level scaling of the DeLiVoTr encoder layers for efficient parameter allocation in each encoder layer, instead of a fixed number of parameters as in existing approaches. Leveraging layer-level depth and width scaling, we formulate three variants of the DeLiVoTr network. We conduct extensive experiments and analysis on the large-scale Waymo and KITTI datasets. Our network surpasses state-of-the-art methods for the detection of small objects (pedestrians) with an inference speed of 20.5 FPS.
Published: 2024
Full Text: View/download PDF
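DeLiVoTr allocates a different depth and width to each encoder layer instead of a fixed budget. The abstract does not give the scaling rule; one simple rule, borrowed here as an assumption from block-wise scaling in light-weight transformers, interpolates linearly from a minimum value in the first layer to a maximum in the last. The function name and values below are hypothetical.

```python
from typing import List


def layer_scaling(num_layers: int, min_val: int, max_val: int) -> List[int]:
    """Linearly scale a per-layer quantity (e.g. depth or width) from min_val to max_val."""
    if num_layers == 1:
        return [min_val]
    return [round(min_val + (max_val - min_val) * b / (num_layers - 1))
            for b in range(num_layers)]


if __name__ == "__main__":
    print(layer_scaling(4, 4, 8))  # per-layer depths for a 4-layer encoder
```

Shallow layers get fewer parameters and deeper layers more, so varying `min_val` and `max_val` is one plausible way to derive several model variants from the same design.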

182. Not all points are balanced: Class balanced single-stage outdoor multi-class 3D object detector from point clouds

Author: Yidong Chen, Guorong Cai, Qiming Xia, Zhaoliang Liu, Binghui Zeng, Zongliang Zhang, Jinhe Su, and Zongyue Wang
Subjects: 3D object detection, Point clouds, Balanced strategy, Multi-class, Physical geography, GB3-5030, Environmental sciences, GE1-350
Abstract: Outdoor 3D object detection is a hot topic in autonomous driving. Mainstream pure point cloud methods down-sample through different task-oriented strategies to retain representative foreground points. Although such strategies are conducive to finding instances, these methods still suffer from two issues: class point imbalance during the down-sampling stages, and foreground/background point imbalance in the final retained point clouds. The former imbalance results in poor precision for small objects, and the latter ignores background points, leading to false positives. To tackle this imbalance, we propose a simple yet effective balanced 3D detector, termed CB-SSD, with two balanced strategies: a class balance strategy (CBS) and a foreground/background balance strategy (FBBS). It is important to note that we do not alter the distribution of point clouds. Instead, we guide the model’s attention equally toward different classes. CB-SSD shows better precision on small objects and reduces false positives where foreground and background points are similar. Considering both speed and accuracy, CB-SSD achieves state-of-the-art performance among single-stage pure point cloud methods on the KITTI and ONCE datasets. On KITTI, CB-SSD attains a multi-class accuracy of 72.92 mAP at 81 FPS.
Published: 2024
Full Text: View/download PDF
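CB-SSD guides the model's attention equally toward different classes without altering the point distribution. The abstract does not specify how; a standard way to express "equal attention per class" is inverse-frequency weighting, where every class receives the same total weight. The sketch below shows only that generic idea, under that assumption, and is not the paper's class balance strategy.

```python
from collections import Counter
from typing import Hashable, List, Sequence


def class_balanced_weights(labels: Sequence[Hashable]) -> List[float]:
    """Per-point weights such that every class contributes the same total weight (sum = 1)."""
    counts = Counter(labels)
    num_classes = len(counts)
    return [1.0 / (num_classes * counts[c]) for c in labels]


if __name__ == "__main__":
    print(class_balanced_weights(["car", "car", "car", "ped"]))
```

Here the three car points share half of the weight and the single pedestrian point receives the other half, so a small class is not drowned out by a frequent one.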

183. Adaptive learning point cloud and image diversity feature fusion network for 3D object detection

Author: Weiqing Yan, Shile Liu, Hao Liu, Guanghui Yue, Xuan Wang, Yongchao Song, and Jindong Xu
Subjects: 3D object detection, LiDAR point cloud, Fine-grained image, Diversity feature fusion, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
Abstract: 3D object detection is a critical task in the fields of virtual reality and autonomous driving. Given that each sensor has its own strengths and limitations, multi-sensor-based 3D object detection has gained popularity. However, most existing methods extract high-level image semantic features and fuse them with point cloud features, focusing solely on consistent information from both sensors while ignoring their complementary information. In this paper, we present a novel two-stage multi-sensor deep neural network, called the adaptive learning point cloud and image diversity feature fusion network (APIDFF-Net), for 3D object detection. Our approach employs the fine-grained image information to complement the point cloud information by combining low-level image features with high-level point cloud features. Specifically, we design a shallow image feature extraction module to learn fine-grained information from images, instead of relying on deep layer features with coarse-grained information. Furthermore, we design a diversity feature fusion (DFF) module that transforms low-level image features into point-wise image features and explores their complementary features through an attention mechanism, ensuring an effective combination of fine-grained image features and point cloud features. Experiments on the KITTI benchmark show that the proposed method outperforms state-of-the-art methods.
Published: 2023
Full Text: View/download PDF

184. Grid self-attention mechanism 3D object detection method based on raw point cloud

Author: Bin LU, Yang SUN, and Zhenyu YANG
Subjects: 3D object detection, point cloud, self-attention mechanism, spatial coordinate encoding, soft regression loss, Telecommunication, TK5101-6720
Abstract: To enhance the feature representation of the region of interest (RoI), a grid self-attention mechanism 3D object detection method based on raw point clouds, named GT3D, was proposed, incorporating a spatial context encoding module and a soft regression loss. The spatial context encoding module was designed to effectively weight the local and spatial features of points through the attention mechanism, taking into account the contributions of different point cloud features, for a more accurate feature representation. The soft regression loss was introduced to address label ambiguity arising during the data annotation phase. Experiments conducted on the public KITTI 3D object detection dataset demonstrate that the proposed method achieves significant improvements in detection accuracy compared with other publicly available point cloud-based 3D object detection methods. The detection results on the test set were submitted to the official KITTI server for public evaluation, achieving detection accuracies of 91.45%, 82.76%, and 79.74% for the easy, moderate, and hard difficulty levels in car detection, respectively.
Published: 2023
Full Text: View/download PDF
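GT3D weights grid-point features inside each RoI through self-attention. The core operation is scaled dot-product attention; the sketch below shows a single head with identity query, key, and value projections (an assumption made to keep it self-contained, since a real module learns these projections and adds the spatial coordinate encoding described above).

```python
import math
from typing import List, Sequence


def self_attention(features: Sequence[Sequence[float]]) -> List[List[float]]:
    """Single-head scaled dot-product self-attention with identity Q/K/V projections."""
    d = len(features[0])
    out = []
    for q in features:
        scores = [sum(a * b for a, b in zip(q, k)) / math.sqrt(d) for k in features]
        m = max(scores)                       # subtract the max for numerical stability
        exps = [math.exp(s - m) for s in scores]
        z = sum(exps)
        weights = [e / z for e in exps]       # softmax over all grid points
        out.append([sum(w * v[i] for w, v in zip(weights, features)) for i in range(d)])
    return out


if __name__ == "__main__":
    print(self_attention([[1.0, 0.0], [0.0, 1.0]]))
```

Each output row is a convex combination of the input features, with more weight on grid points whose features are similar to the query point.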

185. Real-time 3D multi-pedestrian detection and tracking using 3D LiDAR point cloud for mobile robot

Author: Ki-In Na and Byungjae Park
Subjects: 3d instance segmentation, 3d object detection, motion estimation, multiple-object tracking, robot, Telecommunication, TK5101-6720, Electronics, TK7800-8360
Abstract: Mobile robots are used in modern life; however, object recognition is still insufficient to realize robot navigation in crowded environments. Mobile robots must rapidly and accurately recognize the movements and shapes of pedestrians to navigate safely in pedestrian-rich spaces. This study proposes real-time, accurate, three-dimensional (3D) multi-pedestrian detection and tracking using a 3D light detection and ranging (LiDAR) point cloud in crowded environments. The pedestrian detection quickly segments a sparse 3D point cloud into individual pedestrians using a lightweight convolutional autoencoder and connected-component algorithm. The multi-pedestrian tracking identifies the same pedestrians considering motion and appearance cues in continuing frames. In addition, it estimates pedestrians' dynamic movements with various patterns by adaptively mixing heterogeneous motion models. We evaluate the computational speed and accuracy of each module using the KITTI dataset. We demonstrate that our integrated system, which rapidly and accurately recognizes pedestrian movement and appearance using a sparse 3D LiDAR, is applicable for robot navigation in crowded spaces.
Published: 2023
Full Text: View/download PDF
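The detector above separates segmented foreground into individual pedestrians with a connected-component algorithm. A minimal sketch of 4-connected component labeling on a binary occupancy grid is shown below; it assumes the autoencoder output has already been thresholded into such a grid (a hypothetical input), and the authors may label components differently.

```python
from collections import deque
from typing import List, Sequence, Tuple


def connected_components(mask: Sequence[Sequence[int]]) -> Tuple[List[List[int]], int]:
    """Label 4-connected foreground cells of a binary grid; returns (labels, count)."""
    rows, cols = len(mask), len(mask[0])
    labels = [[0] * cols for _ in range(rows)]
    count = 0
    for r in range(rows):
        for c in range(cols):
            if mask[r][c] and not labels[r][c]:
                count += 1                    # start a new component
                labels[r][c] = count
                queue = deque([(r, c)])
                while queue:                  # breadth-first flood fill
                    y, x = queue.popleft()
                    for dy, dx in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                        ny, nx = y + dy, x + dx
                        if 0 <= ny < rows and 0 <= nx < cols and mask[ny][nx] and not labels[ny][nx]:
                            labels[ny][nx] = count
                            queue.append((ny, nx))
    return labels, count


if __name__ == "__main__":
    print(connected_components([[1, 1, 0], [0, 0, 0], [0, 1, 1]]))
```

Each label then corresponds to one pedestrian candidate that the tracking stage can associate across frames.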

186. AEPF: Attention-Enabled Point Fusion for 3D Object Detection

Author: Sachin Sharma, Richard T. Meyer, and Zachary D. Asher
Subjects: 3D object detection, sensor fusion, autonomous vehicles, LiDAR, camera, Chemical technology, TP1-1185
Abstract: Current state-of-the-art (SOTA) LiDAR-only detectors perform well in 3D object detection tasks, but point cloud data are typically sparse and lack semantic information. Detailed semantic information obtained from camera images can be added to existing LiDAR-based detectors to create a robust 3D detection pipeline. With two different data types, a major challenge in developing multi-modal sensor fusion networks is achieving effective data fusion while managing computational resources. With separate 2D and 3D feature extraction backbones, feature fusion can become more challenging because these modalities generate different gradients, leading to gradient conflicts and suboptimal convergence during network optimization. To this end, we propose a 3D object detection method, Attention-Enabled Point Fusion (AEPF). AEPF takes images and voxelized point cloud data as inputs and outputs 3D bounding boxes of object locations. An attention mechanism is introduced into an existing feature fusion strategy to improve 3D detection accuracy, and two variants are proposed. These two variants, AEPF-Small and AEPF-Large, address different needs. AEPF-Small, with a lightweight attention module and fewer parameters, offers fast inference. AEPF-Large, with a more complex attention module and more parameters, provides higher accuracy than baseline models. Experimental results on the KITTI validation set show that AEPF-Small maintains SOTA 3D detection accuracy while running inference at higher speeds. AEPF-Large achieves mean average precision scores of 91.13, 79.06, and 76.15 for the car class’s easy, medium, and hard targets, respectively, on the KITTI validation set. Results from ablation experiments are also presented to support the choice of model architecture.
Published: 2024
Full Text: View/download PDF

187. A Systematic Survey of Transformer-Based 3D Object Detection for Autonomous Driving: Methods, Challenges and Trends

Author: Minling Zhu, Yadong Gong, Chunwei Tian, and Zuyuan Zhu
Subjects: 3D object detection, autonomous driving, transformer, survey, Motor vehicles. Aeronautics. Astronautics, TL1-4050
Abstract: In recent years, with the continuous development of autonomous driving technology, 3D object detection has become a key focus of research on perception systems for autonomous driving and, as the most crucial component of these systems, has gained significant attention. Researchers increasingly favor the Transformer deep learning architecture because of its powerful long-range modeling ability and strong feature fusion capability, and a large number of excellent Transformer-based 3D object detection methods have emerged. This article categorizes these methods by data source. First, we analyze different input data sources and list standard datasets and evaluation metrics. Second, we introduce methods based on different input data and summarize the performance of some methods on different datasets. Finally, we summarize the limitations of current research, discuss future directions, and provide some innovative perspectives.
Published: 2024
Full Text: View/download PDF

188. Diffusion Models-Based Purification for Common Corruptions on Robust 3D Object Detection

Author: Mumuxin Cai, Xupeng Wang, Ferdous Sohel, and Hang Lei
Subjects: 3D object detection, LiDAR scene data, point cloud, diffusion models, defence strategy, adversarial robustness, Chemical technology, TP1-1185
Abstract: LiDAR sensors have been shown to generate data with various common corruptions, which seriously affect their applications in 3D vision tasks, particularly object detection. At the same time, it has been demonstrated that traditional defense strategies, including adversarial training, are prone to gradient confusion during training. Moreover, they can only improve robustness against specific types of data corruption. In this work, we propose LiDARPure, which leverages the powerful generative ability of diffusion models to purify corruption in LiDAR scene data. By dividing the entire scene into voxels to facilitate the diffusion and reverse diffusion processes, LiDARPure overcomes challenges induced by adversarial training, such as sparse point clouds in large-scale LiDAR data and gradient confusion. In addition, we use the latent geometric features of a scene as a condition to assist the generation of the diffusion models. Detailed experiments show that LiDARPure can effectively purify 19 common types of LiDAR data corruption. Further evaluation results demonstrate that it can improve the average precision of 3D object detectors by as much as 20% under data corruption, much higher than existing defense strategies.
Published: 2024
Full Text: View/download PDF
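LiDARPure relies on diffusion and reverse diffusion over a voxelized scene. For reference, the standard DDPM forward (noising) process is given below; applying it to voxelized LiDAR features, and the noise schedule $\beta_s$, are assumptions here, since the abstract does not state LiDARPure's exact formulation. In diffusion-based purification, the corrupted input $\mathbf{x}_0$ is typically noised up to an intermediate step $t$ and then denoised with the learned reverse process, so that corruption is washed out along with the injected noise.

```latex
q(\mathbf{x}_t \mid \mathbf{x}_0)
  = \mathcal{N}\!\left(\mathbf{x}_t;\ \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0,\ (1-\bar{\alpha}_t)\,\mathbf{I}\right),
\qquad
\bar{\alpha}_t = \prod_{s=1}^{t} (1-\beta_s),

\mathbf{x}_t = \sqrt{\bar{\alpha}_t}\,\mathbf{x}_0 + \sqrt{1-\bar{\alpha}_t}\,\boldsymbol{\epsilon},
\qquad \boldsymbol{\epsilon} \sim \mathcal{N}(\mathbf{0}, \mathbf{I}).
```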

189. SaBi3d—A LiDAR Point Cloud Data Set of Car-to-Bicycle Overtaking Maneuvers

Author: Christian Odenwald and Moritz Beeking
Subjects: LiDAR, 3D object detection, bicycle safety, Bibliography. Library science. Information resources
Abstract: While cycling offers environmental benefits and promotes a healthy lifestyle, the risks associated with overtaking maneuvers by motorized vehicles are a significant barrier for many potential cyclists. A large-scale analysis of overtaking maneuvers could help traffic researchers and city planners understand these maneuvers better and reduce the associated risks. Drawing on sensor-based cycling research and LiDAR-based traffic data sets, this paper takes a step toward addressing these safety concerns by introducing the Salzburg Bicycle 3d (SaBi3d) data set, which consists of LiDAR point clouds capturing car-to-bicycle overtaking maneuvers. The data set, collected using a LiDAR-equipped bicycle, enables detailed analysis of a large number of overtaking maneuvers without manual annotation, because a neural network can label the data automatically. Additionally, a benchmark result for 3D object detection using a competitive neural network is provided as a baseline for future research. The SaBi3d data set is structured identically to the nuScenes data set and is therefore compatible with numerous existing object detection systems. This work provides valuable resources for future researchers to better understand cycling infrastructure and mitigate risks, thus promoting cycling as a viable mode of transportation.
Published: 2024
Full Text: View/download PDF

190. Spatial Information Enhancement with Multi-Scale Feature Aggregation for Long-Range Object and Small Reflective Area Object Detection from Point Cloud

Author: Hanwen Li, Huamin Tao, Qiuqun Deng, Shanzhu Xiao, and Jianxiong Zhou
Subjects: 3D object detection, LiDAR, local aggregation operator, autonomous driving, 3D point cloud, Science
Abstract: Accurate and comprehensive 3D object detection is important for perception systems in autonomous driving. Nevertheless, contemporary mainstream methods tend to perform better on large objects close to the LiDAR, leaving long-range and small objects relatively underexplored. Because of LiDAR's divergent scanning pattern, point density decreases as distance increases, and the resulting non-uniform point distribution is ill-suited to discretized volumetric feature extraction. To address this challenge, we propose the Foreground Voxel Proposal (FVP) module, which locates and generates voxels in the foreground of objects. Its outputs are then merged to mitigate the difference in point cloud density and complete the object shape. Furthermore, small objects are susceptible to occlusion, which causes a loss of feature information. To overcome this, we propose the Multi-Scale Feature Integration Network (MsFIN), which captures contextual information at different ranges. These features are then integrated through a transformer-based cascade framework to supplement the object feature space. Extensive experiments demonstrate the effectiveness of our network. Notably, our approach improves AP by 8.56% over the SECOND baseline on the Car detection task at distances of more than 20 m, and by 9.38% on the Cyclist detection task.
Published: 2024
Full Text: View/download PDF

191. Robust BEV 3D Object Detection for Vehicles with Tire Blow-Out

Author: Dongsheng Yang, Xiaojie Fan, Wei Dong, Chaosheng Huang, and Jun Li
Subjects: bird’s-eye view, 3D object detection, transformer, tire blow-out, Chemical technology, TP1-1185
Abstract: The bird’s-eye view (BEV) method, a vision-centric, representation-based perception approach, is essential and promising for future autonomous vehicle perception. It is fusion-friendly, intuitive, and amenable to end-to-end optimization, and it is cheaper than LiDAR. However, the performance of existing BEV methods deteriorates when a tire blows out, because they rely heavily on accurate camera calibration, which noisy camera parameters during a blow-out can invalidate. It is therefore extremely unsafe to use existing BEV methods in a tire blow-out situation. In this paper, we propose a geometry-guided auto-resizable kernel transformer (GARKT) method, designed specifically for vehicles with a tire blow-out. Specifically, we establish a camera deviation model for vehicles with a tire blow-out. We then use geometric priors to obtain prior positions in the perspective view with auto-resizable kernels. The resizable perception areas are encoded and flattened to generate the BEV representation. GARKT achieves a nuScenes detection score (NDS) of 0.439 on a newly created blow-out dataset based on nuScenes, and its NDS remains 0.431 when the tire is completely flat, which is much more robust than other transformer-based BEV methods. Moreover, GARKT runs at nearly real-time speed, about 20.5 fps on one GPU.
Published: 2024
Full Text: View/download PDF
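The NDS values reported for GARKT (and for IRBEVF-Q below) come from the nuScenes benchmark's composite score, which combines mAP with five true-positive error metrics: translation (mATE), scale (mASE), orientation (mAOE), velocity (mAVE), and attribute (mAAE). A minimal sketch of that formula, NDS = (1/10)[5·mAP + Σ(1 − min(1, mTP))], follows; the function name is hypothetical, and the official nuScenes devkit remains the reference implementation.

```python
from typing import Mapping

# The five nuScenes true-positive error metrics (each averaged over classes).
TP_METRICS = ("mATE", "mASE", "mAOE", "mAVE", "mAAE")


def nuscenes_detection_score(mean_ap: float, tp_errors: Mapping[str, float]) -> float:
    """NDS = (5 * mAP + sum over TP metrics of (1 - min(1, error))) / 10."""
    # Errors are clipped at 1, so each TP term contributes a score in [0, 1].
    tp_scores = [1.0 - min(1.0, tp_errors[name]) for name in TP_METRICS]
    return (5.0 * mean_ap + sum(tp_scores)) / 10.0
```

Weighting mAP by 5 gives it half of the total score, while the clipping keeps a single unbounded error (velocity error, for example) from driving NDS below zero.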

192. IRBEVF-Q: Optimization of Image–Radar Fusion Algorithm Based on Bird’s Eye View Features

Author: Ganlin Cai, Feng Chen, and Ente Guo
Subjects: 3D object detection, multimodal fusion, attention mechanism, query optimization, transformer, Chemical technology, TP1-1185
Abstract: In autonomous driving, multi-sensor fusion is considered essential for improving the accuracy and safety of 3D object detection. Currently, a fusion scheme that combines low-cost cameras with highly robust radars can counteract the performance degradation caused by harsh environments. In this paper, we propose the IRBEVF-Q model, which mainly consists of a BEV (Bird’s Eye View) fusion coding module and an object decoder module. The BEV fusion coding module solves the problem of unified representation of different modalities by fusing image and radar features through 3D spatial reference points as a medium. The query in the object decoder is a core component and plays an important role in detection. We therefore propose Heat Map-Guided Query Initialization (HGQI) and Dynamic Position Encoding (DPE) for query construction to enrich the prior information carried by the query, and an Auxiliary Noise Query (ANQ) to help stabilize matching. Experimental results show that IRBEVF-Q achieves an NDS of 0.575 and an mAP of 0.476 on the nuScenes test set. Compared with recent state-of-the-art methods, our model shows significant advantages, indicating that our approach improves detection accuracy.
Published: 2024
Full Text: View/download PDF

193. LiDAR-Based 3D Temporal Object Detection via Motion-Aware LiDAR Feature Fusion

Author: Gyuhee Park, Junho Koh, Jisong Kim, Jun Moon, and Jun Won Choi
Subjects: 3D object detection, LiDAR, temporal, motion-aware aggregation, autonomous driving, Chemical technology, TP1-1185
Abstract: Recently, growing industrial demand for autonomous driving has generated considerable interest in 3D object detection, resulting in many excellent 3D object detection algorithms. However, most 3D object detectors use only a single set of LiDAR points and ignore the potential performance gains from the information in consecutive sets of LiDAR points. In this paper, we propose a novel 3D object detection method called temporal motion-aware 3D object detection (TM3DOD), which uses temporal LiDAR data. In TM3DOD, we aggregate LiDAR voxels over time and enhance the current BEV features with motion features generated from consecutive BEV feature maps. First, we present the temporal voxel encoder (TVE), which generates voxel representations by capturing the temporal relationships among the point sets within a voxel. Next, we design a motion-aware feature aggregation network (MFANet), which enhances the current BEV feature representation by quantifying the temporal variation between two consecutive BEV feature maps. By analyzing how the BEV feature maps change over time, MFANet captures motion information and integrates it into the current feature representation, enabling more robust and accurate 3D object detection. Experimental evaluations on the nuScenes benchmark dataset show that TM3DOD achieves significant improvements in 3D detection performance over the baseline methods and performance comparable to state-of-the-art approaches.
Published: 2024
Full Text: View/download PDF
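The TM3DOD entry above describes deriving motion features from the temporal variation between two consecutive BEV feature maps. As a rough intuition only, the simplest such cue is an element-wise difference concatenated onto the current features. The sketch below assumes that naive difference and assumes both maps are already aligned to the same ego frame; MFANet itself is a learned network, and the function names here are hypothetical.

```python
from typing import List

# A BEV feature map as an H x W grid of C-dimensional feature vectors.
BEVMap = List[List[List[float]]]


def temporal_difference(prev: BEVMap, cur: BEVMap) -> BEVMap:
    """Element-wise change between two consecutive, ego-aligned BEV maps (assumption)."""
    return [[[c - p for p, c in zip(pcell, ccell)]
             for pcell, ccell in zip(prow, crow)]
            for prow, crow in zip(prev, cur)]


def fuse_with_motion(prev: BEVMap, cur: BEVMap) -> BEVMap:
    """Concatenate each current feature vector with its motion cue (C -> 2C channels)."""
    motion = temporal_difference(prev, cur)
    return [[ccell + mcell for ccell, mcell in zip(crow, mrow)]
            for crow, mrow in zip(cur, motion)]
```

Cells that do not change between frames get an all-zero motion cue, so static background contributes nothing extra, while moving objects produce nonzero channels that a downstream network could learn to exploit.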

194. BAFusion: Bidirectional Attention Fusion for 3D Object Detection Based on LiDAR and Camera

Author: Min Liu, Yuanjun Jia, Youhao Lyu, Qi Dong, and Yanyu Yang
Subjects: 3D object detection, LiDAR–camera fusion, cross attention, Chemical technology, TP1-1185
Abstract: 3D object detection is a challenging and promising task for autonomous driving and robotics, and it benefits significantly from multi-sensor fusion, such as LiDAR and cameras. Conventional sensor-fusion methods rely on a projection matrix to align LiDAR and camera features. However, these methods often lack flexibility and robustness, which lowers alignment accuracy under complex environmental conditions. To address these challenges, we propose a novel Bidirectional Attention Fusion module, named BAFusion, which fuses LiDAR and camera information using cross-attention. Unlike conventional methods, BAFusion adaptively learns cross-modal attention weights, making the approach more flexible and robust. Moreover, drawing inspiration from advanced attention optimization techniques in 2D vision, we developed the Cross Focused Linear Attention Fusion Layer (CFLAF Layer) and integrated it into the BAFusion pipeline. This layer reduces the computational complexity of attention and facilitates richer interactions between image and point cloud data, offering a novel approach to the challenges of cross-modal attention computation. We evaluated our method on the KITTI dataset with various baseline networks, such as PointPillars, SECOND, and Part-A2, and demonstrated consistent improvements in 3D object detection performance over these baselines, especially for smaller objects such as cyclists and pedestrians. Our approach achieves competitive results on the KITTI benchmark.
Published: 2024
Full Text: View/download PDF
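BAFusion's core idea, letting each modality attend to the other through cross-attention, can be illustrated with ordinary scaled dot-product attention applied in both directions. The pure-Python sketch below is a generic illustration under stated assumptions: it omits the learned query/key/value projections, assumes LiDAR and image tokens already share a feature dimension, and uses standard softmax attention rather than the paper's linear CFLAF layer.

```python
import math
from typing import List, Tuple

Matrix = List[List[float]]


def softmax(xs: List[float]) -> List[float]:
    m = max(xs)  # subtract the max for numerical stability
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]


def cross_attention(queries: Matrix, keys: Matrix, values: Matrix) -> Matrix:
    """Scaled dot-product attention: each query row attends over all key rows
    from the other modality and returns a weighted sum of the value rows."""
    d = len(keys[0])
    if any(len(q) != d for q in queries):
        raise ValueError("query and key dimensions must match")
    out: Matrix = []
    for q in queries:
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in keys]
        weights = softmax(scores)
        out.append([sum(w * v[j] for w, v in zip(weights, values))
                    for j in range(len(values[0]))])
    return out


def bidirectional_fusion(lidar: Matrix, image: Matrix) -> Tuple[Matrix, Matrix]:
    # LiDAR tokens query image tokens, and image tokens query LiDAR tokens.
    lidar_from_image = cross_attention(lidar, image, image)
    image_from_lidar = cross_attention(image, lidar, lidar)
    return lidar_from_image, image_from_lidar
```

Because the attention weights are computed from feature similarity rather than a fixed projection matrix, a LiDAR token can draw on whichever image tokens match it best; the quadratic cost of this loop over all query-key pairs is what linear-attention variants aim to reduce.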

195. Real-Time Multimodal 3D Object Detection with Transformers

Author: Hengsong Liu and Tongle Duan
Subjects: 3D object detection, LiDAR–camera fusion, transformer, sparse convolutional neural network, Electrical engineering. Electronics. Nuclear engineering, TK1-9971, Transportation engineering, TA1001-1280
Abstract: The accuracy and real-time performance of 3D object detection are key factors limiting its widespread application. Cameras capture detailed color and texture features but, unlike LiDAR, lack depth information. Multimodal detection that combines both can improve results but incurs significant computational overhead, hurting real-time performance. To address these challenges, this paper presents a real-time multimodal fusion model called Fast Transfusion, which combines the benefits of LiDAR and camera sensors while reducing the computational burden of their fusion. Specifically, Fast Transfusion replaces the convolutional backbones used in other models with QConv (Quick Convolution). QConv concentrates convolution operations at the center of the feature map, where most of the information resides, to speed up inference. It also uses deformable convolution to better match the actual shapes of detected objects, improving accuracy. The model further incorporates an EH Decoder (Efficient and Hybrid Decoder), which decouples multiscale fusion into intra-scale interaction and cross-scale fusion to decode and integrate features extracted from multimodal data efficiently. Finally, our proposed semi-dynamic query selection refines the initialization of object queries. On the KITTI 3D object detection dataset, our approach reduces inference time by 36 ms and improves 3D AP by 1.81% compared with state-of-the-art methods.
Published: 2024
Full Text: View/download PDF

196. Vehicle Behavior Discovery and Three-Dimensional Object Detection and Tracking Based on Spatio-Temporal Dependency Knowledge and Artificial Fish Swarm Algorithm

Author: Yixin Chen and Qingnan Li
Subjects: 3D object detection, 3D object tracking, convolutional neural networks, knowledge-based vehicle behaviors discovery, artificial fish swarm algorithm, Technology
Abstract: In complex traffic environments, the targets of 3D tracking and detection are often occluded by various stationary and moving objects. When a target is occluded, its apparent characteristics change, reducing tracking and detection accuracy. To solve this problem, we learn vehicle behavior from driving data, use it to predict and calibrate vehicle trajectories, and finally optimize the tracking results with the artificial fish swarm algorithm. Experiments on the nuScenes dataset show that, compared with the CenterTrack method, the proposed method improves the key metric MOTA (Multi-Object Tracking Accuracy) for 3D object detection and tracking, while running at 26 fps.
Published: 2024
Full Text: View/download PDF
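MOTA, the metric cited in the entry above, is usually computed with the CLEAR MOT definition: one minus the total number of misses, false positives, and identity switches, divided by the total number of ground-truth objects across all frames. The sketch below implements that standard formula; the `FrameCounts` container is a hypothetical helper, and benchmark toolkits also handle the frame-by-frame matching that produces these counts.

```python
from dataclasses import dataclass
from typing import Iterable


@dataclass
class FrameCounts:
    """Per-frame tracking error counts (illustrative container, not a library type)."""
    false_negatives: int
    false_positives: int
    id_switches: int
    ground_truth: int


def mota(frames: Iterable[FrameCounts]) -> float:
    """CLEAR MOT accuracy: 1 - (FN + FP + IDSW) / GT, summed over all frames."""
    frame_list = list(frames)
    errors = sum(f.false_negatives + f.false_positives + f.id_switches for f in frame_list)
    gt = sum(f.ground_truth for f in frame_list)
    if gt == 0:
        raise ValueError("MOTA is undefined when there are no ground-truth objects")
    return 1.0 - errors / gt
```

Since false positives are not bounded by the number of ground-truth objects, MOTA can fall below zero for a tracker that produces many spurious tracks.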

197. A Two-Stage 3D Object Detection Algorithm Based on Deep Learning

Author: Honggang Luan, Yang Gao, Zengfeng Song, and Chuanxi Zhang
Editors: Huchuan Lu, Wanli Ouyang, Hui Huang, Jiwen Lu, Risheng Liu, Jing Dong, and Min Xu
Series editors: Gerhard Goos and Juris Hartmanis (founding editors); Elisa Bertino, Wen Gao, Bernhard Steffen, and Moti Yung (editorial board)
Published: 2023
Full Text: View/download PDF

198. Semi-automated Generation of Accurate Ground-Truth for 3D Object Detection

Author: M. H. Zwemer, D. Scholte, and P. H. N. de With
Editors: A. Augusto de Sousa, Kurt Debattista, Alexis Paljic, Mounia Ziat, Christophe Hurter, Helen Purchase, Giovanni Maria Farinella, Petia Radeva, and Kadi Bouatouch
Series editors: Joaquim Filipe, Ashish Ghosh, Raquel Oliveira Prates, and Lizhu Zhou (editorial board)
Published: 2023
Full Text: View/download PDF

199. DA-TSD: Double Attention Two-Stage 3D Object Detector from Point Clouds

Author: Xinyi Zhao, Yong Li, Rui Tian, and Yunli Chen
Editors: Lazaros Iliadis, Antonios Papaleonidas, Plamen Angelov, and Chrisina Jayne
Series editors: Gerhard Goos and Juris Hartmanis (founding editors); Elisa Bertino, Wen Gao, Bernhard Steffen, and Moti Yung (editorial board)
Published: 2023
Full Text: View/download PDF

200. MLF-DET: Multi-Level Fusion for Cross-Modal 3D Object Detection

Author: Zewei Lin, Yanqing Shen, Sanping Zhou, Shitao Chen, and Nanning Zheng
Editors: Lazaros Iliadis, Antonios Papaleonidas, Plamen Angelov, and Chrisina Jayne
Series editors: Gerhard Goos and Juris Hartmanis (founding editors); Elisa Bertino, Wen Gao, Bernhard Steffen, and Moti Yung (editorial board)
Published: 2023
Full Text: View/download PDF
