1,064 results for "3D object detection"
Search Results
2. MonoDSSMs: Efficient Monocular 3D Object Detection with Depth-Aware State Space Models
- Author
-
Vu, Kiet Dang, Tran, Trung Thai, and Nguyen, Duc Dung
- Published
- 2025
- Full Text
- View/download PDF
3. Deformable Shape-Aware Point Generation for 3D Object Detection
- Author
-
Wang, Kai and Zhang, Xiaowei
- Published
- 2025
- Full Text
- View/download PDF
4. SeSame: Simple, Easy 3D Object Detection with Point-Wise Semantics
- Author
-
Hayeon, O., Yang, Chanuk, and Huh, Kunsoo
- Published
- 2025
- Full Text
- View/download PDF
5. Unlocking Textual and Visual Wisdom: Open-Vocabulary 3D Object Detection Enhanced by Comprehensive Guidance from Text and Image
- Author
-
Jiao, Pengkun, Zhao, Na, Chen, Jingjing, and Jiang, Yu-Gang
- Published
- 2025
- Full Text
- View/download PDF
6. LEROjD: Lidar Extended Radar-Only Object Detection
- Author
-
Palmer, Patrick, Krüger, Martin, Schütte, Stefan, Altendorfer, Richard, Adam, Ganesh, and Bertram, Torsten
- Published
- 2025
- Full Text
- View/download PDF
7. RepVF: A Unified Vector Fields Representation for Multi-task 3D Perception
- Author
-
Li, Chunliang, Han, Wencheng, Yin, Junbo, Zhao, Sanyuan, and Shen, Jianbing
- Published
- 2025
- Full Text
- View/download PDF
8. Diffusion Model for Robust Multi-sensor Fusion in 3D Object Detection and BEV Segmentation
- Author
-
Le, Duy-Tho, Shi, Hengcan, Cai, Jianfei, and Rezatofighi, Hamid
- Published
- 2025
- Full Text
- View/download PDF
9. AF-SSD: Self-attention Fusion Sampling and Fuzzy Classification for Enhanced Small Object Detection
- Author
-
Xiao, He, Jiang, Qingping, Guo, Songhao, Yang, Jiahui, and Liu, Qiuming
- Published
- 2025
- Full Text
- View/download PDF
10. Find n’ Propagate: Open-Vocabulary 3D Object Detection in Urban Environments
- Author
-
Etchegaray, Djamahl, Huang, Zi, Harada, Tatsuya, and Luo, Yadan
- Published
- 2025
- Full Text
- View/download PDF
11. ASPVNet: Attention Based Sparse Point-Voxel Network for 3D Object Detection
- Author
-
Yu, Bingxin, Wang, Lu, He, Yuhong, Wang, Xiaoyang, and Cheng, Jun
- Published
- 2025
- Full Text
- View/download PDF
12. NeRF-MAE: Masked AutoEncoders for Self-supervised 3D Representation Learning for Neural Radiance Fields
- Author
-
Irshad, Muhammad Zubair, Zakharov, Sergey, Guizilini, Vitor, Gaidon, Adrien, Kira, Zsolt, and Ambrus, Rares
- Published
- 2025
- Full Text
- View/download PDF
13. Equivariant Spatio-temporal Self-supervision for LiDAR Object Detection
- Author
-
Hegde, Deepti, Lohit, Suhas, Peng, Kuan-Chuan, Jones, Michael J., and Patel, Vishal M.
- Published
- 2025
- Full Text
- View/download PDF
14. GraphBEV: Towards Robust BEV Feature Alignment for Multi-modal 3D Object Detection
- Author
-
Song, Ziying, Yang, Lei, Xu, Shaoqing, Liu, Lin, Xu, Dongyang, Jia, Caiyan, Jia, Feiyang, and Wang, Li
- Published
- 2025
- Full Text
- View/download PDF
15. General Geometry-Aware Weakly Supervised 3D Object Detection
- Author
-
Zhang, Guowen, Fan, Junsong, Chen, Liyi, Zhang, Zhaoxiang, Lei, Zhen, and Zhang, Lei
- Published
- 2025
- Full Text
- View/download PDF
16. LaPose: Laplacian Mixture Shape Modeling for RGB-Based Category-Level Object Pose Estimation
- Author
-
Zhang, Ruida, Huang, Ziqin, Wang, Gu, Zhang, Chenyangguang, Di, Yan, Zuo, Xingxing, Tang, Jiwen, and Ji, Xiangyang
- Published
- 2025
- Full Text
- View/download PDF
17. SimPB: A Single Model for 2D and 3D Object Detection from Multiple Cameras
- Author
-
Tang, Yingqi, Meng, Zhaotie, Chen, Guoliang, and Cheng, Erkang
- Published
- 2025
- Full Text
- View/download PDF
18. Detecting as Labeling: Rethinking LiDAR-Camera Fusion in 3D Object Detection
- Author
-
Huang, Junjie, Ye, Yun, Liang, Zhujin, Shan, Yi, and Du, Dalong
- Published
- 2025
- Full Text
- View/download PDF
19. Weakly Supervised 3D Object Detection via Multi-level Visual Guidance
- Author
-
Huang, Kuan-Chih, Tsai, Yi-Hsuan, and Yang, Ming-Hsuan
- Published
- 2025
- Full Text
- View/download PDF
20. CMD: A Cross Mechanism Domain Adaptation Dataset for 3D Object Detection
- Author
-
Deng, Jinhao, Ye, Wei, Wu, Hai, Huang, Xun, Xia, Qiming, Li, Xin, Fang, Jin, Li, Wei, Wen, Chenglu, and Wang, Cheng
- Published
- 2025
- Full Text
- View/download PDF
21. Reg-TTA3D: Better Regression Makes Better Test-Time Adaptive 3D Object Detection
- Author
-
Yuan, Jiakang, Zhang, Bo, Gong, Kaixiong, Yue, Xiangyu, Shi, Botian, Qiao, Yu, and Chen, Tao
- Published
- 2025
- Full Text
- View/download PDF
22. SparseLIF: High-Performance Sparse LiDAR-Camera Fusion for 3D Object Detection
- Author
-
Zhang, Hongcheng, Liang, Liu, Zeng, Pengxin, Song, Xiao, and Wang, Zhe
- Published
- 2025
- Full Text
- View/download PDF
23. Global-Local Collaborative Inference with LLM for Lidar-Based Open-Vocabulary Detection
- Author
-
Peng, Xingyu, Bai, Yan, Gao, Chen, Yang, Lirong, Xia, Fei, Mu, Beipeng, Wang, Xiaofei, and Liu, Si
- Published
- 2025
- Full Text
- View/download PDF
24. 9-Meter-Long 3D Ultrasonic Objects Detection via Packaged Lithium-Niobate PMUTs
- Author
-
Peng, Yande, Liu, Hanxiao, Chen, Chun-Ming, Yue, Wei, Teng, Megan, Tsao, Pei-Chi, Umezawa, Seiji, Ikeuchi, Shinsuke, Aida, Yasuhiro, and Lin, Liwei
- Subjects
Data Management and Data Science, Information and Computing Sciences, PMUTs, 3D object detection, long range sensing, machine vision - Abstract
This paper reports a 9-meter-long ultrasonic 3D detector based on packaged lithium niobate PMUTs (piezoelectric micromachined ultrasonic transducers). Compared with state-of-the-art reports, three distinctive achievements have been demonstrated: (1) high-uniformity and wide-bandwidth PMUTs enabled by optimized package designs for highly efficient ultrasonic energy transfer; (2) a long-range receiving beamforming detection scheme on a 4×4 PMUT array for a detection range of up to 9 m - comparable to the longest range reported via PMUTs; and (3) 3D detection of multiple static/moving objects with a field of view exceeding 50°. As such, this device is valuable for applications such as obstacle avoidance where both low power consumption and a small form factor are desirable, including aerial drones.
- Published
- 2024
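The distance figures in the record above come down to simple pulse-echo time-of-flight arithmetic. The snippet below is a minimal, generic sketch of that calculation; the speed of sound in air is assumed to be roughly 343 m/s, and the function names and values are illustrative rather than taken from the paper.

```python
# Pulse-echo ranging: an ultrasonic pulse travels to the target and back,
# so range = speed_of_sound * round_trip_time / 2.
SPEED_OF_SOUND_AIR = 343.0  # m/s, approximate value at room temperature

def echo_range(round_trip_time_s: float) -> float:
    """Target distance in metres from a measured round-trip echo time."""
    return SPEED_OF_SOUND_AIR * round_trip_time_s / 2.0

def round_trip_time(distance_m: float) -> float:
    """Expected echo delay for a target at the given distance."""
    return 2.0 * distance_m / SPEED_OF_SOUND_AIR

# A target at 9 m (the range reported above) returns an echo after roughly 52 ms.
print(round_trip_time(9.0))   # ~0.0525 s
print(echo_range(0.0525))     # ~9.0 m
```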
25. A novel approach to sustainable behavior enhancement through AI-driven carbon footprint assessment and real-time analytics.
- Author
-
Jasmy, Ahmad Jasim, Ismail, Heba, and Aljneibi, Noof
- Subjects
REWARD (Psychology), SUSTAINABILITY, SUSTAINABLE communities, OBJECT recognition (Computer vision), ECOLOGICAL impact - Abstract
This research introduces an Artificial Intelligence-driven mobile application designed to help users calculate and reduce their Carbon Footprint (CFP). The proposed system employs an Intelligent Sustainable Behavior Tracking and Recommendation System, analyzing users' carbon emissions from daily activities and suggesting eco-friendly alternatives. It facilitates sustainability discussions through its chat community and educates users on sustainable practices via an intelligent chatbot powered by a sustainability knowledge base. To promote social engagement around sustainability, the application incorporates a competition and reward system. Additionally, it aggregates behavioral data to inform government sustainability policies and address challenges. Emphasizing individual responsibility, the proposed system stands out from other systems by offering a comprehensive solution that integrates recommendation, education, monitoring, and community engagement, contributing to the cultivation of sustainable communities. The results of a user study (n = 10) employing paired sample t-tests across the three dimensions of the Theory of Reasoned Action (TRA) revealed varying effects of using the application on attitudes, subjective norms, and behavioral intentions related to promoting sustainable human behavior. While the application did not yield significant changes in attitudes (t (9) = 1.7, p = 0.123), or behavioral intentions (t (9) = 0.6, p = 0.541), it did produce a significant increase in subjective norms (t (9) = 4.2, p = 0.002). This suggests that while attitudes towards using this application for sustainability and behavioral intentions remained relatively stable, there was a notable impact on the perception of social influence to engage in sustainable behavior through the use of the application attributed to the sustainability reward system. Article Highlights: This study is the first to track individual CO2 emissions in real-time, contributing valuable insights to sustainability in UAE. The app successfully increased users' awareness of their environmental impact, promoting sustainable behaviors. T-test results show a significant increase in social influence to engage in sustainable practices (t(9) = 4.2, p = 0.002). [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
26. CaLiJD: Camera and LiDAR Joint Contender for 3D Object Detection.
- Author
-
Lyu, Jiahang, Qi, Yongze, You, Suilian, Meng, Jin, Meng, Xin, Kodagoda, Sarath, and Wang, Shifeng
- Abstract
Three-dimensional object detection has been a key area of research in recent years because of its rich spatial information and superior performance in addressing occlusion issues. However, the performance of 3D object detection still lags significantly behind that of 2D object detection, owing to challenges such as difficulties in feature extraction and a lack of texture information. To address this issue, this study proposes a 3D object detection network, CaLiJD (Camera and Lidar Joint Contender for 3D object Detection), guided by two-dimensional detection results. CaLiJD creatively integrates advanced channel attention mechanisms with a novel bounding-box filtering method to improve detection accuracy, especially for small and occluded objects. Bounding boxes are detected by the 2D and 3D networks for the same object in the same scene as an associated pair. The detection results that satisfy the criteria are then fed into the fusion layer for training. In this study, a novel fusion network is proposed. It consists of numerous convolutions arranged in both sequential and parallel forms and includes a Grouped Channel Attention Module for extracting interactions among multi-channel information. Moreover, a novel bounding-box filtering mechanism was introduced, incorporating the normalized distance from the object to the radar as a filtering criterion within the process. Experiments were conducted using the KITTI 3D object detection benchmark. The results showed that a substantial improvement in mean Average Precision (mAP) was achieved by CaLiJD compared with the baseline single-modal 3D detection model, with an enhancement of 7.54%. Moreover, the improvement achieved by our method surpasses that of other classical fusion networks by an additional 0.82%. In particular, CaLiJD achieved mAP values of 73.04% and 59.86%, respectively, thus demonstrating state-of-the-art performance for challenging small-object detection tasks such as those involving cyclists and pedestrians. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Contextual Attribution Maps-Guided Transferable Adversarial Attack for 3D Object Detection.
- Author
-
Cai, Mumuxin, Wang, Xupeng, Sohel, Ferdous, and Lei, Hang
- Abstract
The study of LiDAR-based 3D object detection and its robustness under adversarial attacks has achieved great progress. However, existing adversarial attack methods mainly focus on the targeted object, which destroys the integrity of the object and makes the attack easy to perceive. In this work, we propose a novel adversarial attack against deep 3D object detection models named the contextual attribution maps-guided attack (CAMGA). Based on the combinations of subregions in the context area and their impact on the prediction results, contextual attribution maps can be generated. An attribution map exposes the influence of individual subregions in the context area on the detection results and narrows down the scope of the adversarial attack. Subsequently, perturbations are generated under the guidance of a dual loss, which is proposed to suppress the detection results and maintain visual imperception simultaneously. The experimental results proved that the CAMGA method achieved an attack success rate of over 68% on three large-scale datasets and 83% on the KITTI dataset. Meanwhile, the CAMGA has a transfer attack success rate of at least 50% against all four victim detectors, as they all overly rely on contextual information. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. MMDistill: Multi-Modal BEV Distillation Framework for Multi-View 3D Object Detection.
- Author
-
Jiao, Tianzhe, Chen, Yuming, Zhang, Zhe, Guo, Chaopeng, and Song, Jie
- Abstract
Multi-modal 3D object detection has achieved remarkable progress, but it is often limited in practical industrial production because of its high cost and low efficiency. The multi-view camera-based method provides a feasible solution due to its low cost. However, camera data lacks geometric depth, and only using camera data to obtain high accuracy is challenging. This paper proposes a multi-modal Bird-Eye-View (BEV) distillation framework (MMDistill) to make a trade-off between them. MMDistill is a carefully crafted two-stage distillation framework based on teacher and student models for learning cross-modal knowledge and generating multi-modal features. It can improve the performance of unimodal detectors without introducing additional costs during inference. Specifically, our method can effectively solve the cross-gap caused by the heterogeneity between data. Furthermore, we further propose a Light Detection and Ranging (LiDAR)-guided geometric compensation module, which can assist the student model in obtaining effective geometric features and reduce the gap between different modalities. Our proposed method generally requires fewer computational resources and faster inference speed than traditional multi-modal models. This advancement enables multi-modal technology to be applied more widely in practical scenarios. Through experiments, we validate the effectiveness and superiority of MMDistill on the nuScenes dataset, achieving an improvement of 4.1% mean Average Precision (mAP) and 4.6% NuScenes Detection Score (NDS) over the baseline detector. In addition, we also present detailed ablation studies to validate our method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
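Cross-modal distillation of the kind described in the entry above typically amounts to making a camera-only student imitate the BEV feature map of a LiDAR-camera teacher. The sketch below shows one common form of such a feature-imitation loss; the function and array names are hypothetical and do not correspond to MMDistill's actual modules.

```python
import numpy as np

def bev_feature_distillation_loss(student_bev, teacher_bev, fg_mask=None):
    """Mean-squared feature-imitation loss between student and teacher BEV maps.

    student_bev, teacher_bev: float arrays of shape (C, H, W).
    fg_mask: optional (H, W) weights emphasizing foreground (object) cells.
    """
    diff = (student_bev - teacher_bev) ** 2   # per-channel squared error
    per_cell = diff.mean(axis=0)              # (H, W): average over channels
    if fg_mask is not None:
        per_cell = per_cell * fg_mask         # weight object regions more heavily
    return per_cell.mean()

# Toy usage: random 64-channel 128x128 BEV grids standing in for real features.
rng = np.random.default_rng(0)
student = rng.normal(size=(64, 128, 128)).astype(np.float32)
teacher = rng.normal(size=(64, 128, 128)).astype(np.float32)
print(bev_feature_distillation_loss(student, teacher))
```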
29. An Adaptive Multimodal Fusion 3D Object Detection Algorithm for Unmanned Systems in Adverse Weather.
- Author
-
Wang, Shenyu, Xie, Xinlun, Li, Mingjiang, Wang, Maofei, Yang, Jinming, Li, Zeming, Zhou, Xuehua, and Zhou, Zhiguo
- Abstract
Unmanned systems encounter challenging weather conditions during obstacle removal tasks. Researching stable, real-time, and accurate environmental perception methods under such conditions is crucial. Cameras and LiDAR sensors provide different and complementary data. However, the integration of disparate data presents challenges such as feature mismatches and the fusion of sparse and dense information, which can degrade algorithmic performance. Adverse weather conditions, like rain and snow, introduce noise that further reduces perception accuracy. To address these issues, we propose a novel weather-adaptive bird's-eye view multi-level co-attention fusion 3D object detection algorithm (BEV-MCAF). This algorithm employs an improved feature extraction network to obtain more effective features. A multimodal feature fusion module has been constructed with BEV image feature generation and a co-attention mechanism for better fusion effects. A multi-scale multimodal joint domain adversarial network (M2-DANet) is proposed to enhance adaptability to adverse weather conditions. The efficacy of BEV-MCAF has been validated on both the nuScenes and Ithaca365 datasets, confirming its robustness and good generalization capability in a variety of bad weather conditions. The findings indicate that our proposed algorithm performs better than the benchmark, showing improved adaptability to harsh weather conditions and enhancing the robustness of UVs, ensuring reliable perception under challenging conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. High-order multilayer attention fusion network for 3D object detection.
- Author
-
Zhang, Baowen, Zhao, Yongyong, Su, Chengzhi, and Cao, Guohua
- Abstract
Three-dimensional object detection based on the fusion of 2D image data and 3D point clouds has become a research hotspot in the field of 3D scene understanding. However, different sensor data have discrepancies in spatial position, scale, and alignment, which severely impact detection performance. Inappropriate fusion methods can lead to the loss and interference of valuable information. Therefore, we propose the High-Order Multi-Level Attention Fusion Network (HMAF-Net), which takes camera images and voxelized point clouds as inputs for 3D object detection. To enhance the expressive power between different modality features, we introduce a high-order feature fusion module that performs multi-level convolution operations on the element-wise summed features. By incorporating filtering and non-linear activation, we extract deep semantic information from the fused multi-modal features. To maximize the effectiveness of the fused salient feature information, we introduce an attention mechanism that dynamically evaluates the importance of pooled features at each level, enabling adaptive weighted fusion of significant and secondary features. To validate the effectiveness of HMAF-Net, we conduct experiments on the KITTI dataset. In the "Car," "Pedestrian," and "Cyclist" categories, HMAF-Net achieves mAP performances of 81.78%, 60.09%, and 63.91%, respectively, demonstrating more stable performance compared to other multi-modal methods. Furthermore, we further evaluate the framework's effectiveness and generalization capability through the KITTI benchmark test, and compare its performance with other published detection methods on the 3D detection benchmark and BEV detection benchmark for the "Car" category, showing excellent results. The code and model will be made available on https://github.com/baowenzhang/High-order-Multilayer-Attention-Fusion-Network-for-3D-Object-Detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. Semi-Supervised Online Continual Learning for 3D Object Detection in Mobile Robotics.
- Author
-
Liu, Binhong, Yao, Dexin, Yang, Rui, Yan, Zhi, and Yang, Tao
- Abstract
Continual learning addresses the challenge of acquiring and retaining knowledge over time across multiple tasks and environments. Previous research primarily focuses on offline settings where models learn through increasing tasks from samples paired with ground truth annotations. In this work, we focus on an unsolved, challenging, yet practical scenario: semi-supervised online continual learning in autonomous driving and mobile robotics. In our setting, models are tasked with learning new distributions from streaming unlabeled samples and performing 3D object detection as soon as the LiDAR point cloud arrives. Additionally, we conducted experiments on the KITTI dataset, our newly built IUSL dataset, and the Canadian Adverse Driving Conditions (CADC) dataset. The results indicate that our method achieves a balance between rapid adaptation and knowledge retention, showcasing its effectiveness in the dynamic and complex environment of autonomous driving and mobile robotics. The developed ROS packages and IUSL dataset will be publicly available at: . [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Enhancing urban landscape analysis through combined LiDAR and visual image data preprocessing.
- Author
-
Saravanarajan, Vani Suthamathi, Chen, Rung-Ching, and Manongga, William Eric
- Abstract
The fusion of LiDAR and visual image data reshapes urban landscape analysis, underpinning applications in urban planning, infrastructure development, and environmental monitoring. This paper delves into the pivotal realm of combined LiDAR and visual image data preprocessing, which lays the foundation for accurate and meaningful urban landscape analysis. In this study, we explore the intricate process of harmonizing LiDAR's precise 3D geometry with the rich visual context offered by images. By devising methodologies for seamless data fusion, we navigate the challenges of coordinate alignment, calibration, and feature extraction. The calibrated integration yields a robust dataset for advanced analysis. Our research highlights the transformative potential of this preprocessing strategy, presenting novel findings that include the development of a GFLN (Geometric Feature Learning Network) model for addressing challenges posed by unstructured LiDAR point clouds. Acting as a quality control mechanism, the GFLN model further advances the field by enhancing the accuracy and reliability of urban landscape analysis. Our proposed GFLN method performs better than previous methods by achieving an mAcc of 79.19%. GFLN also took significantly less time to train, with only 145 s. From 3D feature extraction to urban change detection, this framework empowers diverse analytical avenues. Leveraging the synergy of LiDAR and visual data, this study invites practitioners and researchers to embrace an enriched toolkit for holistic urban landscape analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
33. CL-fusionBEV: 3D object detection method with camera-LiDAR fusion in Bird's Eye View.
- Author
-
Shi, Peicheng, Liu, Zhiqiang, Dong, Xinlong, and Yang, Aixi
- Subjects
OBJECT recognition (Computer vision), CARTESIAN coordinates, IMPLICIT learning, MULTISENSOR data fusion, AUTONOMOUS vehicles - Abstract
In the wave of research on autonomous driving, 3D object detection from the Bird's Eye View (BEV) perspective has emerged as a pivotal area of focus. The essence of this challenge is the effective fusion of camera and LiDAR data into the BEV. Current approaches predominantly train and predict within the front view and Cartesian coordinate system, often overlooking the inherent structural and operational differences between cameras and LiDAR sensors. This paper introduces CL-FusionBEV, an innovative 3D object detection methodology tailored for sensor data fusion in the BEV perspective. Our approach initiates with a view transformation, facilitated by an implicit learning module that transitions the camera's perspective to the BEV space, thereby aligning the prediction module. Subsequently, to achieve modal fusion within the BEV framework, we employ voxelization to convert the LiDAR point cloud into BEV space, thereby generating LiDAR BEV spatial features. Moreover, to integrate the BEV spatial features from both camera and LiDAR, we have developed a multi-modal cross-attention mechanism and an implicit multi-modal fusion network, designed to enhance the synergy and application of dual-modal data. To counteract potential deficiencies in global reasoning and feature interaction arising from multi-modal cross-attention, we propose a BEV self-attention mechanism that facilitates comprehensive global feature operations. Our methodology has undergone rigorous evaluation on a substantial dataset within the autonomous driving domain, the nuScenes dataset. The outcomes demonstrate that our method achieves a mean Average Precision (mAP) of 73.3% and a nuScenes Detection Score (NDS) of 75.5%, particularly excelling in the detection of cars and pedestrians with high accuracies of 89% and 90.7%, respectively. Additionally, CL-FusionBEV exhibits superior performance in identifying occluded and distant objects, surpassing existing comparative methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. Ship detection and water level measurement system based on 3D point cloud.
- Author
-
Chen, Yue, Li, Zhaochun, Huang, Lei, and Cheng, Yuzhu
- Subjects
OBJECT recognition (Computer vision), WATER levels, MEASUREMENT errors, POINT cloud, EXTRACTION techniques - Abstract
With the continuous development of the water transportation and shipping industries, the number of ships on rivers has steadily increased and ship routes have grown more complex. These changes have highlighted the growing importance of ship detection and water level measurement systems. Such systems not only enhance the management efficiency of waterborne traffic, ensure navigation safety, and reduce congestion and collision accidents, but also help safeguard the integrity of riverside and bridge structures. Ship detection and river elevation measurement based on 3D point clouds can acquire depth information directly and are unaffected by lighting conditions, making them a promising research direction. Therefore, this paper proposes a novel ship detection algorithm based on an improved PointRCNN, together with a novel method for riverbank line extraction and water level measurement based on 3D point clouds. For ship detection, the improved PointRCNN algorithm strengthens data processing and keypoint extraction, allowing the network to retain more foreground points and learn more effective features, which improves the recognition of distant ships. Compared to the original PointRCNN, the improved algorithm achieves a 3.84% increase in detection precision in practical scenarios. For riverbank extraction and water level measurement, the proposed 3D point cloud method directly extracts riverbank lines with depth information, obtaining the water level height without direct contact with the river surface. Within a range of 15 to 45 m from the LiDAR, the average absolute error of this measurement method is less than 5 cm, demonstrating good detection accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. MS3D: A Multi-Scale Feature Fusion 3D Object Detection Method for Autonomous Driving Applications.
- Author
-
Li, Ying, Zhuang, Wupeng, and Yang, Guangsong
- Subjects
OBJECT recognition (Computer vision), CONVOLUTIONAL neural networks, POINT cloud, AUTONOMOUS vehicles, LIDAR - Abstract
With advancements in autonomous driving, LiDAR has become central to 3D object detection due to its precision and interference resistance. However, challenges such as point cloud sparsity and unstructured data persist. This study introduces MS3D (Multi-Scale Feature Fusion 3D Object Detection Method), a novel approach to 3D object detection that leverages the architecture of a 2D Convolutional Neural Network (CNN) as its core framework. It integrates a Second Feature Pyramid Network to enhance multi-scale feature representation and contextual integration. The Adam optimizer is employed for efficient adaptive parameter tuning, significantly improving detection performance. On the KITTI dataset, MS3D achieves average precisions of 93.58%, 90.91%, and 88.46% in easy, moderate, and hard scenarios, respectively, surpassing state-of-the-art models like VoxelNet, SECOND, and PointPillars. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. Semantics-Fusion: Radar Semantic Information-Based Radar–Camera Fusion for 3D Object Detection.
- Author
-
Tian, Ziran, Huang, Xiaohong, Xu, Kunqiang, Sun, Xujie, and Deng, Zhenmiao
- Subjects
OBJECT recognition (Computer vision), FEATURE extraction, POINT cloud, DEEP learning, RADAR - Abstract
The fusion of millimeter-wave radar and camera for three-dimensional (3D) object detection represents a pivotal technology for autonomous driving, yet it is not without its inherent challenges. First, the radar point cloud contains clutter, which can result in the generation of impure radar features. Second, the radar point cloud is sparse, which presents a challenge in fully extracting the radar features. This can result in the loss of object information, leading to object misdetection, omission, and a reduction in the robustness. To address these issues, a 3D object detection method based on the semantic information of radar features and camera fusion (Semantics-Fusion) is proposed. Initially, the image features are extracted through the centroid detection network, resulting in the generation of a preliminary 3D bounding box for the objects. Subsequently, the radar point cloud is clustered based on the objects' position and velocity, thereby eliminating irrelevant point cloud and clutter. The clustered radar point cloud is projected onto the image plane, thereby forming a radar 2D pseudo-image. This is then input to the designed 2D convolution module, which enables the full extraction of the semantic information of the radar features. Ultimately, the radar features are fused with the image features, and secondary regression is employed to achieve robust 3D object detection. The performance of our method was evaluated on the nuScenes dataset, achieving a mean average precision (mAP) of 0.325 and a nuScenes detection score (NDS) of 0.462. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
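Projecting a clustered radar point cloud onto the image plane, as the abstract above describes, is essentially a pinhole-camera projection. A minimal sketch follows, assuming the radar points have already been transformed into the camera frame; the intrinsic matrix and point values are made-up examples, not parameters from the paper.

```python
import numpy as np

def project_points_to_image(points_xyz, K, image_hw):
    """Project 3D points (camera frame, z forward) onto the image plane.

    points_xyz: (N, 3) array of radar points already expressed in the camera frame.
    K: (3, 3) pinhole intrinsic matrix.
    image_hw: (height, width) of the target pseudo-image.
    Returns integer pixel coordinates of the points that land inside the image.
    """
    h, w = image_hw
    pts = points_xyz[points_xyz[:, 2] > 0]   # keep only points in front of the camera
    uvw = pts @ K.T                          # homogeneous image coordinates
    uv = uvw[:, :2] / uvw[:, 2:3]            # perspective divide
    uv = np.round(uv).astype(int)
    inside = (uv[:, 0] >= 0) & (uv[:, 0] < w) & (uv[:, 1] >= 0) & (uv[:, 1] < h)
    return uv[inside]

# Toy usage with a hypothetical three-point radar cluster and a simple intrinsic matrix.
K = np.array([[800.0, 0.0, 640.0], [0.0, 800.0, 360.0], [0.0, 0.0, 1.0]])
pts = np.array([[2.0, 0.5, 10.0], [-1.0, 0.2, 15.0], [0.0, 0.0, -5.0]])
print(project_points_to_image(pts, K, (720, 1280)))
```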
37. PLC-Fusion: Perspective-Based Hierarchical and Deep LiDAR Camera Fusion for 3D Object Detection in Autonomous Vehicles.
- Author
-
Mushtaq, Husnain, Deng, Xiaoheng, Azhar, Fizza, Ali, Mubashir, and Raza Sherazi, Hafiz Husnain
- Subjects
OBJECT recognition (Computer vision), TRANSFORMER models, POINT cloud, DEEP learning, LIDAR - Abstract
Accurate 3D object detection is essential for autonomous driving, yet traditional LiDAR models often struggle with sparse point clouds. We propose perspective-aware hierarchical vision transformer-based LiDAR-camera fusion (PLC-Fusion) for 3D object detection to address this. This efficient, multi-modal 3D object detection framework integrates LiDAR and camera data for improved performance. First, our method enhances LiDAR data by projecting them onto a 2D plane, enabling the extraction of object perspective features from a probability map via the Object Perspective Sampling (OPS) module. It incorporates a lightweight perspective detector, consisting of interconnected 2D and monocular 3D sub-networks, to extract image features and generate object perspective proposals by predicting and refining top-scored 3D candidates. Second, it leverages two independent transformers—CamViT for 2D image features and LidViT for 3D point cloud features. These ViT-based representations are fused via the Cross-Fusion module for hierarchical and deep representation learning, improving performance and computational efficiency. These mechanisms enhance the utilization of semantic features in a region of interest (ROI) to obtain more representative point features, leading to a more effective fusion of information from both LiDAR and camera sources. PLC-Fusion outperforms existing methods, achieving a mean average precision (mAP) of 83.52% and 90.37% for 3D and BEV detection, respectively. Moreover, PLC-Fusion maintains a competitive inference time of 0.18 s. Our model addresses computational bottlenecks by eliminating the need for dense BEV searches and global attention mechanisms while improving detection range and precision. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
38. DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices.
- Author
-
Huang, Fei, Liu, Shengshu, Zhang, Guangqian, Hao, Bingsen, Xiang, Yangkai, and Yuan, Kun
- Subjects
OBJECT recognition (Computer vision), MULTISENSOR data fusion, FEATURE extraction, TRANSFORMER models, POINT cloud - Abstract
To address the challenges of suboptimal remote detection and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird's-Eye View (BEV) is proposed. This method utilizes an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address network degradation caused by the excessive depth of STDA encoding blocks. Meantime, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features to supplement environmental information with point cloud features, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the NuScenes detection score and 5.5% in average precision for detection objects. Finally, the model is converted and accelerated using TensorRT tools for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. FS-3DSSN: an efficient few-shot learning for single-stage 3D object detection on point clouds.
- Author
-
Tiwari, Alok Kumar and Sharma, G. K.
- Subjects
OBJECT recognition (Computer vision), POLICE vehicles, POINT cloud, DETECTORS, LEARNING modules - Abstract
The current 3D object detection methods have achieved promising results for conventional tasks to detect frequently occurring objects like cars, pedestrians and cyclists. However, they require many annotated boundary boxes and class labels for training, which is very expensive and hard to obtain. Nevertheless, detecting infrequent occurring objects, such as police vehicles, is also essential for autonomous driving to be successful. Therefore, we explore the potential of few-shot learning to handle this challenge of detecting infrequent categories. The current 3D object detectors do not have the necessary architecture to support this type of learning. Thus, this paper presents a new method termed few-shot single-stage network for 3D object detection (FS-3DSSN) to predict infrequent categories of objects. FS-3DSSN uses a class-incremental few-shot learning approach to detect infrequent categories without compromising the detection accuracy of frequent categories. It consists of two modules: (i) a single-stage network architecture for 3D object detection (3DSSN) using deformable convolutions to detect small objects and (ii) a class-incremental-based meta-learning module to learn and predict infrequent class categories. 3DSSN obtained 84.53 mAP 3D on the KITTI car category and 73.4 NDS on the nuScenes dataset, outperforming previous state of the art. Further, the result of FS-3DSSN on nuScenes is also encouraging for detecting infrequent categories while maintaining accuracy in frequent classes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. A Vehicle-Road Cooperative Perception Method Based on Dual-Stream Feature Extraction (基于双流特征提取的车路协同感知方法).
- Author
-
牛国臣, 孙翔宇, and 苑峥岩
- Subjects
OBJECT recognition (Computer vision), TRAFFIC monitoring, DATA mining, ROADSIDE improvement, PROBLEM solving, AUTONOMOUS vehicles
- Published
- 2024
- Full Text
- View/download PDF
41. Radar-camera fusion for 3D object detection with aggregation transformer.
- Author
-
Li, Jun, Zhang, Han, Wu, Zizhang, and Xu, Tianhao
- Subjects
OBJECT recognition (Computer vision), TRANSFORMER models, AUTONOMOUS vehicles, MONOCULARS, CAMERAS - Abstract
In recent years, with the continuous development of autonomous driving, monocular 3D object detection has garnered increasing attention as a crucial research topic. However, the precision of 3D object detection is impeded by the limitations of monocular camera sensors, which struggle to capture accurate depth information. To address this challenge, a novel Aggregation Transformer Network (ATNet) is introduced, featuring Cross-Attention based Positional Aggregation and Dual Expansion-Squeeze based Channel Aggregation. The proposed ATNet adaptively fuses radar and camera data at both positional and channel levels. Specifically, the Cross-Attention based Positional Aggregation leverages camera-radar information to compute a non-linear attention coefficient, which reinforces salient features and suppresses irrelevant ones. The Dual Expansion-Squeeze based Channel Aggregation utilizes refined processing techniques to integrate radar and camera data adaptively at the channel level. Furthermore, to enhance feature-level fusion, we propose a multi-scale radar-camera fusion strategy that integrates radar information across multiple stages of the camera subnet's backbone, allowing for improved object detection across various scales. Extensive experiments conducted on the widely-used nuScenes dataset validate that our proposed Aggregation Transformer, when integrated into superb monocular 3D object detection models, delivers promising results compared to existing methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. Rethinking the Non-Maximum Suppression Step in 3D Object Detection from a Bird's-Eye View.
- Author
-
Li, Bohao, Song, Shaojing, and Ai, Luxia
- Subjects
OBJECT recognition (Computer vision), ALGORITHMS, FORECASTING - Abstract
In camera-based bird's-eye view (BEV) 3D object detection, non-maximum suppression (NMS) plays a crucial role. However, traditional NMS methods become ineffective in BEV scenarios where the predicted bounding boxes of small object instances often have no overlapping areas. To address this issue, this paper proposes a BEV intersection over union (IoU) computation method based on relative position and absolute spatial information, referred to as B-IoU. Additionally, a BEV circular search method, called B-Grouping, is introduced to handle prediction boxes of varying scales. Utilizing these two methods, a novel NMS strategy called BEV-NMS is developed to handle the complex prediction boxes in BEV perspectives. This BEV-NMS strategy is implemented in several existing algorithms. Based on the results from the nuScenes validation set, there was an average increase of 7.9% in mAP when compared to the strategy without NMS. The NDS also showed an average increase of 7.9% under the same comparison. Furthermore, compared to the Scale-NMS strategy, the mAP increased by an average of 3.4%, and the NDS saw an average improvement of 3.1%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
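For context on the entry above: standard NMS greedily keeps the highest-scoring box and suppresses boxes whose BEV IoU with it exceeds a threshold. The sketch below implements that baseline with plain axis-aligned IoU; it is not the paper's B-IoU or B-Grouping, which modify exactly this step.

```python
import numpy as np

def bev_iou(box_a, box_b):
    """IoU of two axis-aligned BEV boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(box_a[0], box_b[0]), max(box_a[1], box_b[1])
    ix2, iy2 = min(box_a[2], box_b[2]), min(box_a[3], box_b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    return inter / (area_a + area_b - inter + 1e-9)

def greedy_nms(boxes, scores, iou_thr=0.5):
    """Keep the highest-scoring box, drop boxes overlapping it, and repeat."""
    order = np.argsort(scores)[::-1]
    keep = []
    while order.size > 0:
        i = order[0]
        keep.append(int(i))
        rest = order[1:]
        ious = np.array([bev_iou(boxes[i], boxes[j]) for j in rest])
        order = rest[ious < iou_thr]
    return keep

# Toy usage: two near-duplicate boxes and one far-away box.
boxes = np.array([[0, 0, 4, 2], [0.2, 0.1, 4.1, 2.1], [10, 10, 12, 13]], dtype=float)
scores = np.array([0.9, 0.8, 0.7])
print(greedy_nms(boxes, scores))   # -> [0, 2]
```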
43. EFMF-pillars: 3D object detection based on enhanced features and multi-scale fusion.
- Author
-
Zhang, Wenbiao, Chen, Gang, Wang, Hongyan, Yang, Lina, and Sun, Tao
- Subjects
OBJECT recognition (Computer vision), FEATURE extraction, TRAFFIC safety, AUTONOMOUS vehicles, ALGORITHMS - Abstract
As unmanned vehicle technology advances rapidly, obstacle recognition and target detection are crucial tasks that directly affect the driving safety and efficiency of unmanned vehicles. To address the inaccurate localization of small targets such as pedestrians in current object detection tasks, and the loss of local features in PointPillars, this paper proposes a three-dimensional object detection method based on an improved PointPillars. First, to counter the loss of spatial and local information in PointPillars, the feature encoding stage is improved and a new pillar feature enhancement extraction module, CSM-Module, is proposed. Channel encoding and spatial encoding are introduced in this module, fully considering the spatial information and detailed local geometry of each pillar and thereby enhancing each pillar's feature representation. Second, a new backbone network, CSE-Net, is designed by fusing CSPDarknet and SENet, enabling the extraction of rich contextual semantic information and multi-scale global features and strengthening the feature extraction capability. Our method achieves higher detection accuracy when validated on the KITTI dataset: compared to the original network, the improved algorithm's average detection accuracy increases by 3.42%, showing that the method is reasonable and effective. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
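The pillar-based encoding that the entry above builds on starts by grouping LiDAR points into vertical columns on a BEV grid and pooling a feature per column. The sketch below illustrates that grouping step in its simplest form; the grid extents, pillar size, and max-pooled features are arbitrary choices for illustration, not the paper's CSM-Module.

```python
import numpy as np

def pillarize(points, x_range=(0.0, 40.0), y_range=(-20.0, 20.0), pillar_size=0.5):
    """Group points into vertical BEV pillars and max-pool a simple feature per pillar.

    points: (N, 4) array of (x, y, z, intensity).
    Returns a dict mapping (ix, iy) pillar indices to (max z, max intensity) inside.
    """
    pillars = {}
    for x, y, z, intensity in points:
        if not (x_range[0] <= x < x_range[1] and y_range[0] <= y < y_range[1]):
            continue                                    # discard points outside the grid
        ix = int((x - x_range[0]) // pillar_size)       # column index along x
        iy = int((y - y_range[0]) // pillar_size)       # column index along y
        prev = pillars.get((ix, iy), (-np.inf, -np.inf))
        pillars[(ix, iy)] = (max(prev[0], z), max(prev[1], intensity))
    return pillars

# Toy usage: two points fall into the same pillar, the third lies outside the grid.
pts = np.array([[10.2, 3.1, -0.5, 0.2], [10.3, 3.2, 1.0, 0.7], [55.0, 0.0, 0.0, 0.1]])
print(pillarize(pts))
```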
44. Enhancing 3D object detection through multi-modal fusion for cooperative perception.
- Author
-
Xia, Bin, Zhou, Jun, Kong, Fanyu, You, Yuhe, Yang, Jiarui, and Lin, Lin
- Subjects
OBJECT recognition (Computer vision), ARTIFICIAL intelligence, POINT cloud, DEEP learning, AUTONOMOUS vehicles - Abstract
Fueled by substantial advancements in deep learning, the domain of autonomous driving is swiftly advancing towards more robust and effective intelligent systems. One of the critical challenges in this field is achieving accurate 3D object detection, which is often hindered by data sparsity and occlusion. To address these issues, we propose a method centered around a multi-modal fusion strategy that leverages vehicle-road cooperation to enhance perception capabilities. Our approach integrates label information from roadside perception point clouds to harmonize and enrich the representation of image and LiDAR data. This comprehensive integration significantly improves detection accuracy by providing a fuller understanding of the surrounding environment. Rigorous evaluations of our proposed method on two benchmark datasets, KITTI and Waymo Open, demonstrate its superior performance, with our model achieving 87.52% 3D Average Precision (3D AP) and 93.71% Bird's Eye View Average Precision (BEV AP) on the KITTI val set. These results highlight the effectiveness of our method in detecting sparse and distant objects, contributing to the development of safer and more efficient autonomous driving solutions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. A Robot 3D Object Detection Method Based on Multi-Sensor Data Fusion in a CPU Environment (CPU 环境下多传感器数据融合的机器人 3D 目标检测方法).
- Author
-
楼 进, 刘恩博, 唐 炜, and 张仁远
- Subjects
OBJECT recognition (Computer vision), OPTICAL radar, MULTISENSOR data fusion, MOBILE robots, DATA mining
- Published
- 2024
- Full Text
- View/download PDF
46. DMFusion: LiDAR-camera fusion framework with depth merging and temporal aggregation.
- Author
-
Yu, Xinyi, Lu, Ke, Yang, Yang, and Ou, Linlin
- Subjects
OBJECT recognition (Computer vision), THREE-dimensional imaging, POINT cloud, AUTONOMOUS vehicles, PROBLEM solving - Abstract
Multimodal 3D object detection is an active research topic in the field of autonomous driving. Most existing methods utilize both camera and LiDAR modalities but fuse their features through simple and insufficient mechanisms. Additionally, these approaches lack reliable positional and temporal information due to their reliance on single-frame camera data. In this paper, a novel end-to-end framework for 3D object detection was proposed to solve these problems through spatial and temporal fusion. The spatial information of bird's-eye view (BEV) features is enhanced by integrating depth features from point clouds during the conversion of image features into 3D space. Moreover, positional and temporal information is augmented by aggregating multi-frame features. This framework is named as DMFusion, which consists of the following components: (i) a novel depth fusion view transform module (referred to as DFLSS), (ii) a simple and easily adjustable temporal fusion module based on 3D convolution (referred to as 3DMTF), and (iii) a LiDAR-temporal fusion module based on channel attention mechanism. On the nuScenes benchmark, DMFusion improves mAP by 1.42% and NDS by 1.26% compared with the baseline model, which demonstrates the effectiveness of our proposed method. The code will be released at https://github.com/lilkeker/DMFusion. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Geometric relation-based feature aggregation for 3D small object detection.
- Author
-
Yang, Wenbin, Yu, Hang, Luo, Xiangfeng, and Xie, Shaorong
- Subjects
OBJECT recognition (Computer vision), CARGO ships, POINT cloud, AUTONOMOUS vehicles, SPEED - Abstract
Point cloud-based 3D small object detection is crucial for autonomous driving and smart ships. The current 3D object detection mainly relies on object global features derived from 3D and 2D convolutional networks, inevitably leading to the loss of substantial object detail information. As large objects contain enough points to obtain sufficient global features, they are easy to identify. In contrast, small objects contain fewer points and their global features are weak, resulting in false classifications and inaccurate location estimates. Therefore, it is necessary to take into account the local geometric relation features of the object to develop adequate discriminative features. Moreover, the current two-stage 3D object detection speed is relatively slow due to the complex refinement structure, which is adverse to real-time detection. In this paper, an efficient 3D small object detection network with two novel modules is proposed. Firstly, the Geometric relation-based Feature Aggregation (GFA) module is designed to improve small object detection performance. This module flexibly aggregates the features of voxels and original points near the key points, for key points to aggregate more local discriminate features of objects, which is conducive to small object detection. Subsequently, the Key point Feature Abstraction (KFA) module is designed to improve the speed of small object detection, through which object global features can be rapidly obtained and the detection performance can be enhanced. Experimental results show that this method achieves state-of-the-art small object detection performance on both the KITTI dataset and the River Cargo Ship dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. Multi-Sensor Fusion 3D Object Detection from a BEV Perspective (基于BEV视角的多传感融合3D目标检测).
- Author
-
张津, 朱冯慧, 王秀丽, and 朱威
- Published
- 2024
- Full Text
- View/download PDF
49. Three-Dimensional Outdoor Object Detection in Quadrupedal Robots for Surveillance Navigations.
- Author
-
Tanveer, Muhammad Hassan, Fatima, Zainab, Mariam, Hira, Rehman, Tanazzah, and Voicu, Razvan Cristian
- Subjects
OBJECT recognition (Computer vision), DRIVERLESS cars, POINT cloud, AUTONOMOUS vehicles, DETECTORS - Abstract
Quadrupedal robots are confronted with the intricate challenge of navigating dynamic environments fraught with diverse and unpredictable scenarios. Effectively identifying and responding to obstacles is paramount for ensuring safe and reliable navigation. This paper introduces a pioneering method for 3D object detection, termed viewpoint feature histograms, which leverages the established paradigm of 2D detection in projection. By translating 2D bounding boxes into 3D object proposals, this approach not only enables the reuse of existing 2D detectors but also significantly increases the performance with less computation required, allowing for real-time detection. Our method is versatile, targeting both bird's eye view objects (e.g., cars) and frontal view objects (e.g., pedestrians), accommodating various types of 2D object detectors. We showcase the efficacy of our approach through the integration of YOLO3D, utilizing LiDAR point clouds on the KITTI dataset, to achieve real-time efficiency aligned with the demands of autonomous vehicle navigation. Our model selection process, tailored to the specific needs of quadrupedal robots, emphasizes considerations such as model complexity, inference speed, and customization flexibility, achieving an accuracy of up to 99.93%. This research represents a significant advancement in enabling quadrupedal robots to navigate complex and dynamic environments with heightened precision and safety. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. MSSA: Multi-Representation Semantics-Augmented Set Abstraction for 3D Object Detection.
- Author
-
Liu, Huaijin, Du, Jixiang, Zhang, Yong, Zhang, Hongbo, and Zeng, Jiandian
- Subjects
OBJECT recognition (Computer vision), POINT cloud, COMPUTER vision, POINT processes, SPINE - Abstract
Accurate recognition and localization of 3D objects is a fundamental research problem in 3D computer vision. Benefiting from transformation-free point cloud processing and flexible receptive fields, point-based methods have become accurate in 3D point cloud modeling, but still fall behind voxel-based competitors in 3D detection. We observe that the set abstraction module, commonly utilized by point-based methods for downsampling points, tends to retain excessive irrelevant background information, thus hindering the effective learning of features for object detection tasks. To address this issue, we propose MSSA, a Multi-representation Semantics-augmented Set Abstraction for 3D object detection. Specifically, we first design a backbone network to encode different representation features of point clouds, which extracts point-wise features through PointNet to preserve fine-grained geometric structure features, and adopts VoxelNet to extract voxel features and BEV features to enhance the semantic features of key points. Second, to efficiently fuse different representation features of keypoints, we propose a Point feature-guided Voxel feature and BEV feature fusion (PVB-Fusion) module to adaptively fuse multi-representation features and remove noise. At last, a novel Multi-representation Semantic-guided Farthest Point Sampling (MS-FPS) algorithm is designed to help set abstraction modules progressively downsample point clouds, thereby improving instance recall and detection performance with more important foreground points. We evaluate MSSA on the widely used KITTI dataset and the more challenging nuScenes dataset. Experimental results show that compared to PointRCNN, our method improves the AP of "moderate" level for three classes of objects by 7.02%, 6.76%, and 5.44%, respectively. Compared to the advanced point-voxel-based method PV-RCNN, our method improves the AP of "moderate" level by 1.23%, 2.84%, and 0.55% for the three classes, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
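Several of the point-based methods listed here, including the MSSA entry above, downsample point clouds with farthest point sampling (FPS) before set abstraction. Below is a minimal sketch of plain FPS; the semantic-guided variant described in the paper re-weights the distance term, which this generic version does not do.

```python
import numpy as np

def farthest_point_sampling(points, n_samples):
    """Plain farthest point sampling over an (N, 3) point cloud.

    Greedily picks points that maximize the minimum distance to those already chosen.
    """
    n = points.shape[0]
    chosen = [0]                                        # start from an arbitrary point
    dist = np.linalg.norm(points - points[0], axis=1)   # distance to the chosen set
    for _ in range(1, min(n_samples, n)):
        idx = int(np.argmax(dist))                      # farthest from the current set
        chosen.append(idx)
        dist = np.minimum(dist, np.linalg.norm(points - points[idx], axis=1))
    return chosen

# Toy usage: sample 8 well-spread points from a random cloud of 500.
rng = np.random.default_rng(1)
cloud = rng.uniform(-10, 10, size=(500, 3))
print(farthest_point_sampling(cloud, 8))
```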