4,199 results for "Object Tracking"
Search Results
2. Infants' saccadic behavior during 2-dimensional displays of a bounce
- Author
- Šimkovic, Matúš and Träuble, Birgit
- Published
- 2025
- Full Text
- View/download PDF
3. Phase-shifting profilometry for 3D shape measurement of moving objects on production lines
- Author
- He, Qing, Ning, Jiaxing, Liu, Xu, and Li, Qingying
- Published
- 2025
- Full Text
- View/download PDF
4. Partitioned token fusion and pruning strategy for transformer tracking
- Author
- Zhang, Chi, Gao, Yun, Meng, Tao, and Wang, Tao
- Published
- 2025
- Full Text
- View/download PDF
5. A hybrid approach for vision-based structural displacement measurement using transforming model prediction and KLT
- Author
- Nguyen, Xuan Tinh, Jeon, Geonyeol, Vy, Van, Lee, Geonhee, Lam, Phat Tai, and Yoon, Hyungchul
- Published
- 2025
- Full Text
- View/download PDF
6. SASU-Net: Hyperspectral video tracker based on spectral adaptive aggregation weighting and scale updating
- Author
- Zhao, Dong, Zhang, Haorui, Huang, Kunpeng, Zhu, Xuguang, Arun, Pattathal V., Jiang, Wenhao, Li, Shiyu, Pei, Xiaofang, and Zhou, Huixin
- Published
- 2025
- Full Text
- View/download PDF
7. A new quantitative analysis method for the fish behavior under ammonia nitrogen stress based on pruning strategy
- Author
- Xu, Wenkai, Yu, Jiaxuan, Xiao, Ying, Wang, Guangxu, Li, Xin, and Li, Daoliang
- Published
- 2025
- Full Text
- View/download PDF
8. Learning orientational interaction-aware attention and localization refinement for object tracking
- Author
- Zhang, Yi, Wang, Wuwei, and Huang, Hanlin
- Published
- 2025
- Full Text
- View/download PDF
9. BSTrack: Robust UAV tracking using feature extraction of bilateral filters and sparse attention mechanism
- Author
- Wang, Ke, Wang, Zhuhai, Zhang, Ximin, and Liu, Mengzhen
- Published
- 2025
- Full Text
- View/download PDF
10. REPS: Rotation equivariant Siamese network enhanced by probability segmentation for satellite video tracking
- Author
- Chen, Yuzeng, Tang, Yuqi, Yuan, Qiangqiang, and Zhang, Liangpei
- Published
- 2024
- Full Text
- View/download PDF
11. TabCtNet: Target-aware bilateral CNN-transformer network for single object tracking in satellite videos
- Author
- Zhu, Qiqi, Huang, Xin, and Guan, Qingfeng
- Published
- 2024
- Full Text
- View/download PDF
12. A fuzzy decision-making system for video tracking with multiple objects in non-stationary conditions
- Author
- Fakhri, Payam Safaei, Asghari, Omid, Sarspy, Sliva, Marand, Mehran Borhani, Moshaver, Paria, and Trik, Mohammad
- Published
- 2023
- Full Text
- View/download PDF
13. Comparative analysis of single-view and multiple-view data collection strategies for detecting partially-occluded grape bunches: Field trials
- Author
- Ariza-Sentís, Mar, Baja, Hilmy, Vélez, Sergio, van Essen, Rick, and Valente, João
- Published
- 2025
- Full Text
- View/download PDF
14. Robust Object Tracking with Continuous Data Association based on Artificial Potential Moving Horizon Estimation
- Author
- Abe, Ryoya, Kikuchi, Tomoya, Nonaka, Kenichiro, and Sekiguchi, Kazuma
- Published
- 2020
- Full Text
- View/download PDF
15. Vision-based Object Tracking in Marine Environments using Features from Neural Network Detections
- Author
- Schöller, F.E.T., Blanke, M., Plenge-Feidenhans’, M.K., and Nalpantidis, L.
- Published
- 2020
- Full Text
- View/download PDF
16. Enhancing video pedestrian detection with tracking-information-aided framework and multi-scale feature optimization.
- Author
- Sang, Haifeng and Suo, Pengkai
- Abstract
To address high miss rates and low detection accuracy caused by object scale variations and occlusion in video pedestrian detection, this study proposes a General Tracking-Information-Aided Detection Framework (GTIADF). The framework integrates object detection and tracking, leveraging inter-frame temporal information to improve robustness against scale changes and occlusion. GTIADF consists of two main components: (1) an enhanced multi-object tracking algorithm, Deep SORT+, which incorporates advanced modules, camera motion compensation, and DEMA for improved tracking robustness; and (2) an aided detection module that reduces missed detections via validation, interpolation, and re-scoring. The framework is adaptable and can be seamlessly integrated with other detectors to enhance their performance. Based on GTIADF, we propose the D3F-YOLO algorithm, which enhances the feature extraction module by using deformable convolution for better detection at varying scales. Additionally, a focusing diffusion pyramid network is introduced to improve multiscale feature representation, and the loss function is optimized to boost accuracy. Experiments on the Caltech dataset show that the proposed method achieves a mean average precision (mAP) of 65.7% and a miss rate of 32.4%, confirming its effectiveness in challenging scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
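As a reader's aid, the interpolation step of GTIADF's aided-detection module can be sketched in a few lines. This is illustrative only: the abstract does not give the authors' code, so the function names and the linear-interpolation rule below are assumptions.

```python
def interpolate_box(box_a, box_b, t):
    """Linearly interpolate two (x, y, w, h) boxes; t in [0, 1]."""
    return tuple(a + t * (b - a) for a, b in zip(box_a, box_b))

def fill_missed_detections(track):
    """track: dict frame_index -> box, with gaps where the detector failed.

    Returns a copy in which each gap frame is filled by linearly
    interpolating between the nearest surrounding detections."""
    frames = sorted(track)
    filled = dict(track)
    for f0, f1 in zip(frames, frames[1:]):
        span = f1 - f0
        for f in range(f0 + 1, f1):
            filled[f] = interpolate_box(track[f0], track[f1], (f - f0) / span)
    return filled
```

A real aided-detection module would also validate and re-score the recovered boxes, as the abstract describes; this sketch covers only the interpolation.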
17. Deep Feature-Based Hyperspectral Object Tracking: An Experimental Survey and Outlook.
- Author
- Wang, Yuchao, Li, Xu, Yang, Xinyan, Ge, Fuyuan, Wei, Baoguo, Li, Lixin, and Yue, Shigang
- Subjects
- BAND gaps, SPEED, VIDEOS
- Abstract
With the rapid advancement of hyperspectral imaging technology, hyperspectral object tracking (HOT) has become a research hotspot in the field of remote sensing. Advanced HOT methods have been continuously proposed and validated on scarce datasets in recent years, and can be roughly divided into handcrafted feature-based methods and deep feature-based methods. Compared with methods using handcrafted features, deep feature-based methods can extract highly discriminative semantic features from hyperspectral images (HSIs) and achieve excellent tracking performance, making them more favored by the hyperspectral tracking community. However, deep feature-based HOT still faces challenges such as data hunger, the band gap, and low tracking efficiency. Therefore, it is necessary to conduct a thorough review of current trackers and unresolved problems in the HOT field. In this survey, we systematically classify and conduct a comprehensive analysis of 13 state-of-the-art deep feature-based hyperspectral trackers. First, we classify and analyze the trackers based on the framework and tracking process. Second, the trackers are compared and analyzed in terms of tracking accuracy and speed on two datasets for cross-validation. Finally, we design a specialized experiment for small object tracking (SOT) to further validate the tracking performance. Through in-depth investigation, the advantages and weaknesses of current HOT technology based on deep features are clearly demonstrated, which also points out directions for future development. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
18. Move Over Law Compliance Analysis Utilizing a Deep Learning Computer Vision Approach.
- Author
- Sekuła, Przemysław, Shayesteh, Narjes, He, Qinglian, Zahedian, Sara, Moscoso, Rodrigo, and Cholewa, Michał
- Subjects
- OBJECT recognition (Computer vision), COMPUTER vision, CONSCIOUSNESS raising, TRAFFIC safety, VIDEO recording
- Abstract
This paper presents the results of the Move Over law compliance study. This study was carried out for The Federal Highway Administration in cooperation with ten State Highway agencies that provided the data (video recordings). This paper describes an outline of the system that was invented, developed, and applied to determine Move Over law compliance, as well as the initial analysis of the impact of various factors on compliance. In order to carry out the analysis, we processed 68 videos that contained over 33,000 vehicles. The median compliance with the Move Over law was 42.5% and varied heavily depending on diverse factors. This study makes two key contributions: first, it introduces an automated deep learning-based system that detects and evaluates Move Over law compliance by leveraging object detection and tracking technologies. Second, it presents a large-scale, multi-state compliance assessment, providing new empirical insights into driver behavior across various incident conditions. These findings offer a data-driven foundation for refining Move Over laws, enhancing public awareness efforts, and improving enforcement strategies. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
19. Particle Filter Tracking System Based on Digital Zoom and Regional Image Measure.
- Author
- Zhao, Qisen, Dong, Liquan, Chu, Xuhong, Liu, Ming, Kong, Lingqin, and Zhao, Yuejin
- Subjects
- IMAGE processing, DETECTORS, ALGORITHMS, VIDEOS
- Abstract
To address the challenges of low accuracy and the difficulty of balancing a large field of view against long distance when tracking high-speed moving targets with a single sensor, an ROI adaptive digital zoom tracking method is proposed. In this paper, we discuss the impact of ROI on image processing and describe the design of the ROI adaptive digital zoom tracking system. Additionally, we construct an adaptive ROI update model based on normalized target information. To capture target changes effectively, we introduce a multi-scale regional measure and propose an improved particle filter algorithm, referred to as the improved multi-scale regional measure resampling particle filter (IMR-PF). This method maintains high temporal resolution within a high-resolution, wide field of view, which is particularly beneficial for high-resolution videos. Simulation results demonstrate that the improved target tracking method effectively improves tracking robustness to target motion changes and reduces the tracking center error by 20% compared to other state-of-the-art methods. The IMR-PF maintains good performance even when confronted with various interference factors and in real-world applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
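For readers unfamiliar with the resampling step that IMR-PF builds on, a generic systematic-resampling routine is sketched below. This is a textbook particle-filter step, not the paper's multi-scale regional measure; the function name and signature are assumptions.

```python
import random

def systematic_resample(particles, weights, rng=random):
    """One systematic-resampling step: a single uniform offset generates
    n evenly spaced pointers into the cumulative weight distribution,
    so high-weight particles are duplicated and low-weight ones dropped."""
    n = len(particles)
    total = sum(weights)
    u = rng.random()
    positions = [(i + u) / n for i in range(n)]
    cumulative, acc = [], 0.0
    for w in weights:
        acc += w / total
        cumulative.append(acc)
    resampled, j = [], 0
    for p in positions:
        # Advance to the first particle whose cumulative weight covers p.
        while j < n - 1 and cumulative[j] < p:
            j += 1
        resampled.append(particles[j])
    return resampled
```

Systematic resampling is popular in trackers because it runs in O(n) and has lower variance than drawing n independent samples.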
20. SiamSDT: a self-adaptive dynamic template Siamese network for airborne visual tracking of MAVs on heterogeneous FPGA-SoC
- Author
- Zhang, Yuxin, Wen, Jiazheng, Wu, Ran, Liu, Huanyu, and Li, Junbao
- Abstract
Airborne visual tracking is pivotal in enhancing the autonomy and intelligence of micro aerial vehicles (MAVs). However, MAVs frequently encounter challenges such as viewpoint changes and interference from similar objects in practice. Additionally, due to their small size and lightweight characteristics, MAVs have limited onboard computational resources, significantly constraining algorithm complexity and impacting tracking performance. To address these issues, we propose a robust and lightweight tracking model, the self-adaptive dynamic template Siamese network (SiamSDT). Leveraging two key designs, a temporal attention mechanism and a Self-adaptive Template Fusion module, SiamSDT is capable of adapting to appearance variations during the tracking process. Specifically, the temporal attention mechanism integrates historical information in a sequential manner, retaining pertinent information while reducing storage and computational complexity. Additionally, the Self-adaptive Template Fusion module dynamically adjusts the fusion ratio of each template through a similarity matrix, further enhancing the model’s adaptability and anti-interference capability. Furthermore, we propose a solution tailored for heterogeneous ZYNQ platforms to deal with the issue of limited onboard resources, and an FPGA-based accelerator is designed to accelerate the inference process through pipelining, data reuse, ping-pong operation, and array partitioning. The performance of SiamSDT was evaluated on the OTB and UAV123 datasets. On the UAV123 dataset, SiamSDT achieves a 4.8% increase in precision and a 1.2% increase in success rate compared to the baseline algorithm without any increase in parameters. The hardware simulation experiments demonstrate that our deployment scheme can significantly reduce inference latency with an acceptable decrease in tracking performance. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
21. A new multi-object tracking pipeline based on computer vision techniques for mussel farms.
- Author
- Zeng, Dylon, Liu, Ivy, Bi, Ying, Vennell, Ross, Briscoe, Dana, Xue, Bing, and Zhang, Mengjie
- Subjects
- OBJECT tracking (Computer vision), OBJECT recognition (Computer vision), MUSSEL culture, IMAGE processing, IMAGE registration, DEEP learning
- Abstract
Mussel farming is a thriving industry in New Zealand and is crucial to local communities. Currently, farmers keep track of their mussel floats by taking regular boat trips to the farm. This is a labour-intensive assignment. Integrating computer vision techniques into aquafarms will significantly alleviate the pressure on mussel farmers. However, tracking a large number of identical targets under various image conditions raises a considerable challenge. This paper proposes a new computer vision-based pipeline to automatically detect and track mussel floats in images. The proposed pipeline consists of three steps, i.e. float detection, float description, and float matching. In the first step, a new detector based on several image processing operators is used to detect mussel floats of all sizes in the images. Then a new descriptor is employed to provide unique identity markers to mussel floats based on the relative positions of their neighbours. Finally, float matching across adjacent frames is done by image registration. Experimental results on the images taken in Marlborough Sounds New Zealand have shown that the proposed pipeline achieves an 82.9% MOTA – 18% higher than current deep learning-based approaches – without the need for training. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
22. Enhancing automatic license plate recognition in Indian scenarios.
- Author
- Samaga, Abhinav, Lobo, Allen Joel, Nasreen, Azra, Pattar, Ramakanth Kumar, Trivedi, Neeta, Raj, Peehu, and Sreelakshmi, Koratagere
- Subjects
- OPTICAL character recognition, AUTOMOBILE license plates, GRAPHICS processing units, LAW enforcement agencies
- Abstract
Automatic license plate recognition (ALPR) technology has gained widespread use in many countries, including India. With the explosion in the number of vehicles on the roads in the past few years, automating the documentation of vehicle license plates for use by law enforcement agencies and traffic management authorities has great significance. There have been various advancements in the object detection, object tracking, and optical character recognition domains, but integrated pipelines for ALPR in Indian scenarios are a rare occurrence. This paper proposes an architecture that can track vehicles across multiple frames, detect number plates, and perform optical character recognition (OCR) on them. A dataset of Indian vehicles for the detection of oblique license plates is collected, and a framework to increase the accuracy of OCR using data across multiple frames is proposed. The proposed system can record license plate readings of vehicles averaging 527.99 ms and 2157.09 ms per frame using a graphics processing unit (GPU) and a central processing unit (CPU), respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
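The abstract above describes fusing OCR results across multiple frames but does not specify the mechanism, so the sketch below shows one plausible approach, a per-character majority vote over per-frame readings. The function name and the most-common-length heuristic are assumptions, not the authors' method.

```python
from collections import Counter

def fuse_plate_readings(readings):
    """Fuse noisy per-frame OCR strings by per-character majority vote.

    Heuristic: readings of the most common length are assumed to cover
    the full plate; shorter/longer misreads are discarded before voting."""
    if not readings:
        return ""
    length = Counter(len(r) for r in readings).most_common(1)[0][0]
    votes = [r for r in readings if len(r) == length]
    # zip(*votes) yields the i-th character of every kept reading.
    return "".join(
        Counter(chars).most_common(1)[0][0] for chars in zip(*votes)
    )
```

With per-frame misreads such as '8' for 'B' or 'I' for '1', the vote recovers the consensus plate as long as each position is read correctly in most frames.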
23. A smart vista-lite system for anomaly detection and motion prediction for video surveillance in vibrant urban settings.
- Author
- Alasiry, Areej and Qayyum, Mohammed
- Abstract
The Vista-Lite system addresses major challenges in video surveillance, including computational complexity, limited transferability across datasets, and the absence of an effective approach for examining data from multiple cameras. The system employs three methodologies to tackle these issues: UOAL, TempoNet, and BDSO. UOAL identifies abnormalities in video content via a segmentation approach, improving accuracy in complex environments. TempoNet focuses on forecasting motions and behaviors using modern neural network frameworks, improving response times when identifying potentially malicious situations. BDSO makes better use of computational resources by tuning system parameters, ensuring flexibility and decreasing false alarms. This fusion improves the system's resilience, sensitivity, and cost-efficiency, making the solution adaptable to a wide range of surveillance scenarios. Comprehensive experiments using the pedestrian, UCSD, and mall datasets demonstrated improved performance with 99% accuracy, indicating the system's capacity to handle real-time, multi-camera data. Vista-Lite provides a novel and flexible approach to video surveillance, combining anomaly detection, motion prediction, and resource optimization to advance the domain. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
24. Vehicle Flow Detection and Tracking Based on an Improved YOLOv8n and ByteTrack Framework.
- Author
- Liu, Jinjiang, Xie, Yonghua, Zhang, Yu, and Li, Haoming
- Subjects
- TRAFFIC flow, INTELLIGENT transportation systems, INFORMATION processing, ALGORITHMS
- Abstract
Vehicle flow detection and tracking are crucial components of intelligent transportation systems. However, traditional methods often struggle with challenges such as the poor detection of small objects and low efficiency when processing large-scale data. To address these issues, this paper proposes a vehicle flow detection and tracking method that integrates an improved YOLOv8n model with the ByteTrack algorithm. In the detection module, we introduce the innovative MSN-YOLO model, which combines the C2f_MLCA module, the Detect_SEAM module, and the NWD loss function to enhance feature fusion and improve cross-scale information processing. These enhancements significantly boost the model's ability to detect small objects and handle complex backgrounds. In the tracking module, we incorporate the ByteTrack algorithm and train unique vehicle re-identification (Re-ID) features, ensuring robust multi-object tracking in complex environments and improving the stability and accuracy of vehicle flow tracking. The experimental results demonstrate that the proposed method achieves a mean Average Precision (mAP) of 62.8% at IoU = 0.50 and a Multiple Object Tracking Accuracy (MOTA) of 72.16% in real-time tracking. These improvements represent increases of 2.7% and 3.16%, respectively, compared to baseline algorithms. This method provides effective technical support for intelligent traffic management, traffic flow monitoring, and congestion prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
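The core idea that ByteTrack contributes to pipelines like the one above is two-stage association: match high-confidence detections to tracks first, then try to rescue low-confidence detections against the still-unmatched tracks. The following is a deliberately simplified sketch (greedy IoU matching instead of the Hungarian algorithm, no Kalman prediction); the function names and thresholds are assumptions.

```python
def iou(a, b):
    """IoU of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0, ix2 - ix1) * max(0, iy2 - iy1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    union = area(a) + area(b) - inter
    return inter / union if union else 0.0

def byte_associate(tracks, detections, high=0.6, iou_min=0.3):
    """BYTE-style two-stage association, greedy and simplified.

    tracks: dict track_id -> last box; detections: list of (box, score).
    High-score detections get first pick of tracks; low-score detections
    are then matched against whatever tracks remain."""
    matches, unmatched = [], dict(tracks)
    first = [d for d in detections if d[1] >= high]
    second = [d for d in detections if d[1] < high]
    for box, score in first + second:
        best = max(unmatched, key=lambda t: iou(unmatched[t], box), default=None)
        if best is not None and iou(unmatched[best], box) >= iou_min:
            matches.append((best, box))
            del unmatched[best]
    return matches, list(unmatched)
```

Keeping the low-score pass is what lets the tracker survive partial occlusions, where a true object briefly drops below the detector's confidence threshold.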
25. LittleFaceNet: A Small-Sized Face Recognition Method Based on RetinaFace and AdaFace.
- Author
- Ren, Zhengwei, Liu, Xinyu, Xu, Jing, Zhang, Yongsheng, and Fang, Ming
- Subjects
- HUMAN facial recognition software, DETECTION algorithms, VIDEO surveillance, DEEP learning, LABORATORY management, TRACKING algorithms
- Abstract
For surveillance video management in university laboratories, issues such as occlusion and low-resolution face capture often arise. Traditional face recognition algorithms are typically static and rely heavily on clear images, resulting in inaccurate recognition for low-resolution, small-sized faces. To address the challenges of occlusion and low-resolution person identification, this paper proposes a new face recognition framework by reconstructing RetinaFace-ResNet and combining it with Quality-Adaptive Margin (AdaFace). Currently, although there are many target detection algorithms, they all require a large amount of data for training. However, datasets for low-resolution face detection are scarce, leading to poor detection performance of the models. This paper aims to solve RetinaFace's weak face recognition capability in low-resolution scenarios and its potential inaccuracies in face bounding box localization when faces are at extreme angles or partially occluded. To this end, Spatial Depth-wise Separable Convolutions are introduced. RetinaFace-ResNet is designed for face detection and localization, while AdaFace is employed to address low-resolution face recognition by using feature norm approximation to estimate image quality and applying an adaptive margin function. Additionally, a multi-object tracking algorithm is used to solve the problem of moving occlusion. Experimental results demonstrate significant improvements, achieving an accuracy of 96.12% on the WiderFace dataset and a recognition accuracy of 84.36% in practical laboratory applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
26. Innovative Road Object Detection and Distance Estimation Framework Using Monocular Cameras for Advanced Driver Assistance Systems.
- Author
- Bouazizi, Omar, Azroumahli, Chaimae, and El Mourabit, Aimad
- Subjects
- OBJECT recognition (Computer vision), LONG short-term memory, DRIVER assistance systems, CAMERA calibration, ROAD safety measures
- Abstract
Advanced Driver Assistance Systems (ADAS) rely on accurate road object detection, tracking, and distance estimation to enhance road safety. This paper presents ROD-YOLOv8, an innovative framework that leverages a monocular camera for these tasks within ADAS. By integrating the YOLOv8 model with transfer learning, the framework achieves high accuracy in road object detection. The model was retrained on the BDD100k dataset and further validated using the COCO, KITTI, and PASCAL VOC datasets for comparison with existing models, achieving a mean average precision (mAP) of 0.782 and an F1 score of 0.874 on the COCO dataset. Object tracking is maintained through the Bot-Sort algorithm, ensuring consistent tracking of detected objects and providing continuous monitoring and future location prediction. Advanced camera calibration techniques enable accurate distance estimation between the camera and the detected objects, resulting in a mean absolute distance error of just 2.4 meters. Operating at 83 frames per second (FPS), ROD-YOLOv8 showcases its real-time capabilities, contributing to safer autonomous and assisted driving experiences. The proposed object detection model outperforms existing methods such as MobileNet and earlier versions of YOLO in terms of precision, recall, and speed, making it highly suitable for integration into ADAS applications. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
27. A Dual-Stage Processing Architecture for Unmanned Aerial Vehicle Object Detection and Tracking Using Lightweight Onboard and Ground Server Computations.
- Author
- Ntousis, Odysseas, Makris, Evangelos, Tsanakas, Panayiotis, and Pavlatos, Christos
- Subjects
- OBJECT recognition (Computer vision), ARTIFICIAL neural networks, DRONE aircraft, CLIENT/SERVER computing, RASPBERRY Pi
- Abstract
UAVs are widely used for multiple tasks, which in many cases require autonomous processing and decision making. This autonomous function often requires significant computational capabilities that cannot be integrated into the UAV due to weight or cost limitations, making the distribution of the workload and the combination of the results produced necessary. In this paper, a dual-stage processing architecture for object detection and tracking in Unmanned Aerial Vehicles (UAVs) is presented, focusing on efficient resource utilization and real-time performance. The proposed system delegates lightweight detection tasks to onboard hardware while offloading computationally intensive processes to a ground server. The UAV is equipped with a Raspberry Pi for onboard data processing, utilizing an Intel Neural Compute Stick 2 (NCS2) for accelerated object detection. Specifically, YOLOv5n is selected as the onboard model. The UAV transmits selected frames to the ground server, which handles advanced tracking, trajectory prediction, and target repositioning using state-of-the-art deep learning models. Communication between the UAV and the server is maintained through a high-speed Wi-Fi link, with a fallback to a 4G connection when needed. The ground server, equipped with an NVIDIA A40 GPU, employs YOLOv8x for object detection and DeepSORT for multi-object tracking. The proposed architecture ensures real-time tracking with minimal latency, making it suitable for mission-critical UAV applications such as surveillance and search and rescue. The results demonstrate the system's robustness in various environments, highlighting its potential for effective object tracking under limited onboard computational resources. The system achieves recall and accuracy scores as high as 0.53 and 0.74, respectively, using the remote server, and is capable of re-identifying a significant portion of objects of interest lost by the onboard system, measured at approximately 70%. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
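The abstract above says the UAV "transmits selected frames to the ground server" but does not state the selection criterion, so the sketch below uses a hypothetical confidence-based rule: offload any frame where the best onboard detection is weak or missing. The function name and threshold are assumptions, not the paper's design.

```python
def select_frames_for_offload(frame_confidences, threshold=0.5):
    """Pick frames worth sending to the ground server.

    frame_confidences: per-frame list of onboard detection confidences.
    A frame is selected when it has no detections at all, or when its
    best detection falls below `threshold` (onboard model is unsure)."""
    selected = []
    for i, confs in enumerate(frame_confidences):
        if not confs or max(confs) < threshold:
            selected.append(i)
    return selected
```

A rule of this shape keeps the high-speed link mostly idle while the lightweight onboard model is confident, reserving server-side YOLOv8x/DeepSORT processing for ambiguous frames.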
28. ADSTrack: adaptive dynamic sampling for visual tracking.
- Author
- Wang, Zhenhai, Yuan, Lutao, Ren, Ying, Zhang, Sen, and Tian, Hongyu
- Abstract
The most common method for visual object tracking involves feeding an image pair comprising a template image and search region into a tracker. The tracker uses a backbone to process the information in the image pair. In pure Transformer-based frameworks, redundant information in image pairs exists throughout the tracking process, and the corresponding negative tokens consume the same computational resources as the positive tokens while degrading the performance of the tracker. Therefore, we propose to solve this problem using an adaptive dynamic sampling strategy in a pure Transformer-based tracker, known as ADSTrack. ADSTrack progressively reduces irrelevant, redundant negative tokens in the search region that are not related to the tracked object and the effect of noise generated by these tokens. The adaptive dynamic sampling strategy enhances the performance of the tracker by scoring and adaptively sampling important tokens, and the number of tokens sampled varies according to the input image. Moreover, the adaptive dynamic sampling strategy is a parameterless token sampling strategy that does not use additional parameters. We add several extra tokens as auxiliary tokens to the backbone to further optimize the feature map. We extensively evaluate ADSTrack, achieving satisfactory results on seven test sets, including UAV123 and LaSOT. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
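The abstract's key property, that the number of sampled tokens varies with the input, can be illustrated with a tiny sketch. This is not ADSTrack's scoring mechanism; the relative-score cutoff and the function name are assumptions chosen only to show how a parameterless, input-dependent sample size can arise.

```python
def adaptive_token_sample(tokens, scores, keep_ratio=0.5):
    """Keep tokens scoring at least `keep_ratio` of the best score.

    Unlike fixed top-k pruning, the number of survivors depends on the
    score distribution of the input: a peaky distribution keeps few
    tokens, a flat one keeps many."""
    if not tokens:
        return []
    cutoff = keep_ratio * max(scores)
    return [t for t, s in zip(tokens, scores) if s >= cutoff]
```

Contrast with top-k: a fixed k wastes computation on easy frames and may drop informative tokens on hard ones, which is the motivation for adaptive schemes.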
29. Advancing Underwater Vision: A Survey of Deep Learning Models for Underwater Object Recognition and Tracking
- Author
- Mahmoud Elmezain, Lyes Saad Saoud, Atif Sultan, Mohamed Heshmat, Lakmal Seneviratne, and Irfan Hussain
- Subjects
- Underwater computer vision, deep learning, underwater robotics, ocean research, underwater image enhancement, object tracking, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Underwater computer vision plays a vital role in ocean research, enabling autonomous navigation, infrastructure inspections, and marine life monitoring. However, the underwater environment presents unique challenges, including color distortion, limited visibility, and dynamic light conditions, which hinder the performance of traditional image processing methods. Recent advancements in deep learning (DL) have demonstrated remarkable success in overcoming these challenges by enabling robust feature extraction, image enhancement, and object recognition. This review provides a comprehensive analysis of cutting-edge deep learning architectures designed for underwater object detection, segmentation, and tracking. State-of-the-art (SOTA) models, including AGW-YOLOv8, Feature-Adaptive FPN, and Dual-SAM, have shown substantial improvements in addressing occlusions, camouflaging, and small underwater object detection. For tracking tasks, transformer-based models like SiamFCA and FishTrack leverage hierarchical attention mechanisms and convolutional neural networks (CNNs) to achieve high accuracy and robustness in dynamic underwater environments. Beyond optical imaging, this review explores alternative modalities such as sonar, hyperspectral imaging, and event-based vision, which provide complementary data to enhance underwater vision systems. These approaches improve performance under challenging conditions, enabling richer and more informative scene interpretation. Promising future directions are also discussed, emphasizing the need for domain adaptation techniques to improve generalizability, lightweight architectures for real-time performance, and multi-modal data fusion to enhance interpretability and robustness. By critically evaluating current methodologies and highlighting gaps, this review provides insights for advancing underwater computer vision systems to support ocean exploration, ecological conservation, and disaster management.
- Published
- 2025
- Full Text
- View/download PDF
30. Vehicle Turn Pattern Counting and Short Term Forecasting Using Deep Learning for Urban Traffic Management System
- Author
- Sundarakrishnan Narayanan, Sohan Varier, Tarun Bhupathi, Manaswini Simhadri Kavali, Mohana, P. Ramakanth Kumar, and K. Sreelakshmi
- Subjects
- Auto-ARIMA, deep learning, object tracking, time-series analysis, traffic forecasting, YOLOv8, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Urban traffic management has been facing increasing challenges due to the surge in the number of vehicles and traffic congestion. As cities expand and populations grow, efficient and accurate monitoring of vehicle counts is crucial for better traffic management. Hence, as part of the Bengaluru Mobility Challenge 2024, organized by the Bengaluru Traffic Police in collaboration with the Indian Institute of Science, we propose a solution to this issue by developing a predictive model to estimate vehicle counts by turning pattern from traffic video footage. The dataset consists of traffic video footage of 23 different junctions around Bengaluru, in which 7 vehicle classes had to be detected, namely Car, Truck, Bus, Two-Wheeler, Three-Wheeler, Light Commercial Vehicle, and Bicycle. The proposed work focuses on two key objectives: counting vehicle turns over 30-minute clips and forecasting future vehicle turn counts by class for the next 30 minutes. A You Only Look Once (YOLOv8) and Auto-ARIMA based pipeline was deployed to address the challenge, which demonstrated robust detection capabilities, with an overall precision of 92.59% for vehicle detection. Building on this, we designed a custom vehicle counting algorithm that integrated the BoT-SORT tracker with dynamic counting boxes, accurately capturing vehicle movements and turn patterns in real time; this integrated approach attained a best deviation of 20.79% for turn pattern counting and 28.41% for forecasting. Furthermore, the system is scalable to accommodate any number of cameras and is capable of forecasting traffic over extended time frames, allowing it to be applied to a variety of urban traffic monitoring scenarios. These results highlight the effectiveness of our custom-designed framework in real-world scenarios as a reliable model for applications needing high-precision detection and predictive analytics. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
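The "counting boxes" idea described above, registering a turn when a tracked vehicle passes through an entry zone and later an exit zone, can be sketched in a few lines. This is an illustrative reconstruction of the general technique, not the authors' algorithm; the function names and zone representation are assumptions.

```python
def point_in_box(pt, box):
    """box = (x1, y1, x2, y2); pt = (x, y)."""
    return box[0] <= pt[0] <= box[2] and box[1] <= pt[1] <= box[3]

def count_turns(tracks, entry_box, exit_box):
    """Count tracks whose trajectory enters entry_box and later
    passes through exit_box, i.e. one specific turn movement.

    tracks: dict track_id -> list of (x, y) centroids over time."""
    count = 0
    for traj in tracks.values():
        entered = False
        for pt in traj:
            if not entered and point_in_box(pt, entry_box):
                entered = True
            elif entered and point_in_box(pt, exit_box):
                count += 1
                break
    return count
```

One counter per (entry, exit) zone pair yields the full turning-movement matrix for a junction, which is the quantity the Auto-ARIMA stage then forecasts.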
31. Global-regional-local multilevel lightweight attention modeling for event-based efficient video reconstruction.
- Author
- Nie, Ziyu, Li, Yuhui, Teng, Dongdong, and Liu, Lilin
- Abstract
High-speed motion and complex lighting conditions pose great challenges to vision tasks based on traditional frame cameras. Event cameras have emerged to address them but encounter difficulties in applying computer vision algorithms to event streams, a novel imaging data paradigm. Existing state-of-the-art methods overly emphasize improving the quality of reconstructed videos while neglecting the subsequent application issues of model deployment and output videos. Due to their large number of parameters, they are too inefficient for edge embedded devices. Besides, manual methods for batch preprocessing data are clumsy and lack generalization ability across scenarios. This paper focuses on event-based efficient video reconstruction for further high-level vision tasks. A novel global-regional-local multilevel lightweight attention hierarchical architecture is proposed, termed Global-Regional-Local-E2VID (GRL-E2VID). This framework leverages chunked incoherent radiation attention and membrane potential tensor transformation to establish global, regional, and local dependencies on asynchronous and sparse events, thereby enhancing the quality and efficiency of video reconstruction. Experimental results demonstrate that, with a 60% reduction in parameter count, our approach maintains a video reconstruction quality comparable to the state-of-the-art method. Moreover, it aids in object classification, raising small object recognition confidence by 60% and strengthening model stability in complex scenarios. For object tracking, the multimodal algorithm based on GRL-E2VID’s reconstructed video rather than RGB video doubles tracking efficiency. Besides, its ability to generate high-frame-rate clear video under challenging environments such as complex illumination and high-speed motion promises its value in high-level visual tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
32. ADSTrack: adaptive dynamic sampling for visual tracking
- Author
-
Zhenhai Wang, Lutao Yuan, Ying Ren, Sen Zhang, and Hongyu Tian
- Subjects
Object tracking ,Adaptive transformer ,Dynamic token ,Auxiliary token ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
Abstract The most common method for visual object tracking involves feeding an image pair comprising a template image and search region into a tracker. The tracker uses a backbone to process the information in the image pair. In pure Transformer-based frameworks, redundant information in image pairs persists throughout the tracking process, and the corresponding negative tokens consume the same computational resources as positive tokens while degrading the tracker's performance. Therefore, we propose to solve this problem with an adaptive dynamic sampling strategy in a pure Transformer-based tracker, known as ADSTrack. ADSTrack progressively reduces the irrelevant, redundant negative tokens in the search region that are not related to the tracked object, and the effect of the noise generated by these tokens. The adaptive dynamic sampling strategy enhances the tracker's performance by scoring and adaptively sampling important tokens, with the number of sampled tokens varying according to the input image. Moreover, the adaptive dynamic sampling strategy is parameterless, using no additional parameters. We also add several extra tokens as auxiliary tokens to the backbone to further optimize the feature map. We extensively evaluate ADSTrack, achieving satisfactory results on seven test sets, including UAV123 and LaSOT.
- Published
- 2024
- Full Text
- View/download PDF
33. An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network
- Author
-
Ming Him Lui, Haixu Liu, Zhuochen Tang, Hang Yuan, David Williams, Dongjin Lee, K. C. Wong, and Zihao Wang
- Subjects
object detection ,object tracking ,data augmentation ,Stable Diffusion ,pan–tilt–zoom ,camera calibration ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
This article presents a cost-effective camera network system that employs neural network-based object detection and stereo vision to assist a pan–tilt–zoom camera in imaging fast, erratically moving small aerial targets. Compared to traditional radar systems, this approach offers advantages in supporting real-time target differentiation and ease of deployment. Based on the principle of knowledge distillation, a novel data augmentation method is proposed to coordinate the latest open-source pre-trained large models in semantic segmentation, text generation, and image generation tasks to train a BicycleGAN for image enhancement. The resulting dataset is tested on various model structures and backbone sizes of two mainstream object detection frameworks, Ultralytics’ YOLO and MMDetection. Additionally, the algorithm implements and compares two popular object trackers, Bot-SORT and ByteTrack. The experimental proof-of-concept deploys the YOLOv8n model, which achieves an average precision of 82.2% and an inference time of 0.6 ms. Alternatively, the YOLO11x model maximises average precision at 86.7% while maintaining an inference time of 9.3 ms without bottlenecking subsequent processes. Stereo vision achieves accuracy within a median error of 90 mm while following a drone flying at over 1 m/s in an 8 m × 4 m area of interest. Stable single-object tracking with the PTZ camera is successful at 15 fps with an accuracy of 92.58%.
- Published
- 2024
- Full Text
- View/download PDF
34. Intraoperative patient‐specific volumetric reconstruction and 3D visualization for laparoscopic liver surgery
- Author
-
Luca Boretto, Egidijus Pelanis, Alois Regensburger, Kaloian Petkov, Rafael Palomar, Åsmund Avdem Fretland, Bjørn Edwin, and Ole Jakob Elle
- Subjects
augmented reality ,biomedical imaging ,computer vision ,liver ,medical image processing ,object tracking ,Medical technology ,R855-855.5 - Abstract
Abstract Despite the benefits of minimally invasive surgery, interventions such as laparoscopic liver surgery present unique challenges, like the significant anatomical differences between preoperative images and intraoperative scenes due to pneumoperitoneum, patient pose, and organ manipulation by surgical instruments. To address these challenges, a method for intraoperative three‐dimensional reconstruction of the surgical scene, including vessels and tumors, without altering the surgical workflow, is proposed. The technique combines neural radiance field reconstructions from tracked laparoscopic videos with ultrasound three‐dimensional compounding. The accuracy of our reconstructions on a clinical laparoscopic liver ablation dataset, consisting of laparoscope and patient reference posed from optical tracking, laparoscopic and ultrasound videos, as well as preoperative and intraoperative computed tomographies, is evaluated. The authors propose a solution to compensate for liver deformations due to pressure applied during ultrasound acquisitions, improving the overall accuracy of the three‐dimensional reconstructions compared to the ground truth intraoperative computed tomography with pneumoperitoneum. A unified neural radiance field from the ultrasound and laparoscope data, which allows real‐time view synthesis providing surgeons with comprehensive intraoperative visual information for laparoscopic liver surgery, is trained.
- Published
- 2024
- Full Text
- View/download PDF
35. Efficient class‐agnostic obstacle detection for UAV‐assisted waterway inspection systems
- Author
-
Pablo Alonso, Jon Ander Íñiguez de Gordoa, Juan Diego Ortega, and Marcos Nieto
- Subjects
computer vision ,object detection ,object tracking ,real‐time systems ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computer software ,QA76.75-76.765 - Abstract
Abstract Ensuring the safety of water airport runways is essential for the correct operation of seaplane flights. Among other tasks, airport operators must identify and remove various objects that may have drifted into the runway area. In this paper, the authors propose a complete and embedded‐friendly waterway obstacle detection pipeline that runs on a camera‐equipped drone. This system uses a class‐agnostic version of the YOLOv7 detector, which is capable of detecting objects regardless of their class. Additionally, through the use of the drone's GPS data and camera parameters, the locations of the objects are pinpointed with a Distance Root Mean Square of 0.58 m. On our own annotated dataset, the system is capable of generating alerts for detected objects with a recall of 0.833 and a precision of 1.0.
- Published
- 2024
- Full Text
- View/download PDF
36. Efficient transformer tracking with adaptive attention
- Author
-
Dingkun Xiao, Zhenzhong Wei, and Guangjun Zhang
- Subjects
computer vision ,convolution ,convolutional neural nets ,object tracking ,target tracking ,tracking ,Computer applications to medicine. Medical informatics ,R858-859.7 ,Computer software ,QA76.75-76.765 - Abstract
Abstract Recently, several trackers utilising Transformer architecture have shown significant performance improvement. However, the high computational cost of multi‐head attention, a core component in the Transformer, has limited real‐time running speed, which is crucial for tracking tasks. Additionally, the global mechanism of multi‐head attention makes it susceptible to distractors with similar semantic information to the target. To address these issues, the authors propose a novel adaptive attention that enhances features through the spatial sparse attention mechanism with less than 1/4 of the computational complexity of multi‐head attention. Our adaptive attention sets a perception range around each element in the feature map based on the target scale in the previous tracking result and adaptively searches for the information of interest. This allows the module to focus on the target region rather than background distractors. Based on adaptive attention, the authors build an efficient transformer tracking framework. It can perform deep interaction between search and template features to activate target information and aggregate multi‐level interaction features to enhance the representation ability. The evaluation results on seven benchmarks show that the authors’ tracker achieves outstanding performance with a speed of 43 fps and significant advantages in hard circumstances.
- Published
- 2024
- Full Text
- View/download PDF
37. Fast moving table tennis ball tracking algorithm based on graph neural network
- Author
-
Tianjian Zou, Jiangning Wei, Bo Yu, Xinzhu Qiu, Hao Zhang, Xu Du, and Jun Liu
- Subjects
Table tennis ,Fast moving object ,Object tracking ,Object detection ,Graph neural network ,Sports analytics ,Medicine ,Science - Abstract
Abstract Tracking key objects in sports video scenarios is a pivotal challenge in the analysis of sports techniques and tactics. In table tennis, due to the small size and rapid motion of the ball, identifying and tracking it through video is a particularly arduous task, and the majority of existing detection and tracking algorithms struggle to meet the practical requirements of real-world scenarios. To address this issue, this paper proposes a combined technical approach integrating detection and discrimination, tailored to the unique motion characteristics of table tennis. For the detector, we utilize and refine a common video differential detector. For the discriminator, we introduce GMP (a Graph Max-message Pass Neural Network), designed specifically for tracking table tennis balls or similar objects. Furthermore, we enhance an existing dataset for table tennis tracking by enriching its scenarios. The results demonstrate that our proposed solution performs impressively on both the dataset and the intended real-world environments, showcasing the scalability of our algorithms and models as well as their potential for application in other scenarios.
- Published
- 2024
- Full Text
- View/download PDF
38. A Novel Approach for Privacy Preserving Object Re-Identification on Edge Devices
- Author
-
Robert Kathrein, Oliver Zeilerbauer, Johannes Georg Larcher, and Mario Döller
- Subjects
data privacy ,object location ,spatial encoding ,object tracking ,object re-identification ,Telecommunication ,TK5101-6720 - Abstract
Computer vision approaches have been widely used in mobility tasks such as visitor counting, traffic analysis, etc. The European General Data Protection Regulation (GDPR) enforces in-camera processing, as storing and transmitting such data violates this regulation. This paper introduces a novel approach for object Re-Identification (Re-ID) on edge devices using a color-based encoded virtual plane for location mapping. The method leverages the spatial coding capabilities of the RGB color space to simplify the localisation process. By assigning unique RGB values to spatial coordinates, it creates a multidimensional reference image that facilitates instant and accurate object localisation. This reduces computational complexity and allows global referencing across multiple cameras. We present an algorithmic framework for location mapping and demonstrate its capability through experimental validation. The technique's potential is further explored in applications such as object Re-ID, marking a significant advancement in computer vision and expanding the branch of spatial encoding methodologies. This approach represents a shift towards more privacy-oriented multi-camera object tracking and Re-ID solutions.
- Published
- 2024
- Full Text
- View/download PDF
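The colour-coded virtual plane in the abstract above can be illustrated with a toy encoding: packing each ground-plane grid coordinate into a unique 24-bit RGB value makes localisation a constant-time pixel lookup. The 4096-wide grid and bit layout below are illustrative assumptions, not the paper's actual scheme:

```python
# Toy version of a colour-coded virtual plane: each grid coordinate is
# packed into a unique 24-bit RGB value, so reading the pixel of a
# precomputed reference image localises an object in O(1).
# The 4096-wide grid is a hypothetical resolution, not the paper's.

def coord_to_rgb(x, y, width=4096):
    """Pack an (x, y) grid coordinate into 24 bits of RGB."""
    idx = y * width + x
    return ((idx >> 16) & 0xFF, (idx >> 8) & 0xFF, idx & 0xFF)

def rgb_to_coord(rgb, width=4096):
    """Invert coord_to_rgb: recover the grid coordinate from a colour."""
    r, g, b = rgb
    idx = (r << 16) | (g << 8) | b
    return (idx % width, idx // width)

print(rgb_to_coord(coord_to_rgb(123, 456)))  # → (123, 456)
```

Because 4096 × 4096 = 2²⁴, every coordinate in such a grid maps to a distinct colour, which is what would let multiple cameras share one global reference.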
39. Lightweight Siamese Network with Global Correlation for Single-Object Tracking.
- Author
-
Ding, Yuxuan and Miao, Kehua
- Subjects
- *
TRANSFORMER models , *FEATURE extraction , *RESEARCH personnel , *TRACK & field , *COST - Abstract
Recent advancements in the field of object tracking have been notably influenced by Siamese-based trackers, which have demonstrated considerable progress in their performance and application. Researchers frequently emphasize the precision of trackers, yet they tend to neglect the associated complexity. This oversight can restrict real-time performance, rendering these trackers inadequate for specific applications. This study presents a novel lightweight Siamese network tracker, termed SiamGCN, which incorporates global feature fusion alongside a lightweight network architecture to improve tracking performance on devices with limited resources. MobileNet-V3 was chosen as the backbone network for feature extraction, with modifications made to the stride of its final layer to enhance extraction efficiency. A global correlation module, which was founded on the Transformer architecture, was developed utilizing a multi-head cross-attention mechanism. This design enhances the integration of template and search region features, thereby facilitating more precise and resilient tracking capabilities. The model underwent evaluation across four prominent tracking benchmarks: VOT2018, VOT2019, LaSOT, and TrackingNet. The results indicate that SiamGCN achieves high tracking performance while simultaneously decreasing the number of parameters and computational costs. This results in significant benefits regarding processing speed and resource utilization. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
40. DSiam-CnK: A CBAM- and KCF-Enabled Deep Siamese Region Proposal Network for Human Tracking in Dynamic and Occluded Scenes.
- Author
-
Liu, Xiangpeng, Han, Jianjiao, Peng, Yulin, Liang, Qiao, An, Kang, He, Fengqin, and Cheng, Yuhua
- Subjects
- *
ARTIFICIAL neural networks , *TRACK & field , *ALGORITHMS , *VELOCITY - Abstract
Despite the accuracy and robustness attained in the field of object tracking, algorithms based on Siamese neural networks often over-rely on information from the initial frame, neglecting necessary updates to the template; furthermore, in prolonged tracking situations, such methodologies encounter challenges in efficiently addressing issues such as complete occlusion or instances where the target exits the frame. To tackle these issues, this study enhances the SiamRPN algorithm by integrating the convolutional block attention module (CBAM), which enhances spatial channel attention. Additionally, it integrates the kernelized correlation filters (KCFs) for enhanced feature template representation. Building on this, we present DSiam-CnK, a Siamese neural network with dynamic template updating capabilities, facilitating adaptive adjustments in tracking strategy. The proposed algorithm is tailored to elevate the Siamese neural network's accuracy and robustness for prolonged tracking, all the while preserving its tracking velocity. In our research, we assessed the performance on the OTB2015, VOT2018, and LaSOT datasets. Our method, when benchmarked against established trackers, including SiamRPN on OTB2015, achieved a success rate of 92.1% and a precision rate of 90.9%. On the VOT2018 dataset, it excelled, with a VOT-A (accuracy) of 46.7%, a VOT-R (robustness) of 135.3%, and a VOT-EAO (expected average overlap) of 26.4%, leading in all categories. On the LaSOT dataset, it achieved a precision of 35.3%, a normalized precision of 34.4%, and a success rate of 39%. The findings demonstrate enhanced precision in tracking performance and a notable increase in robustness with our method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
41. Confidence-Guided Frame Skipping to Enhance Object Tracking Speed.
- Author
-
Lee, Yun Gu
- Subjects
- *
COMPUTER vision , *SPEED , *ALGORITHMS , *CONFIDENCE , *OBJECT tracking (Computer vision) - Abstract
Object tracking is a challenging task in computer vision. While simple tracking methods offer fast speeds, they often fail to track targets; to address this, traditional methods typically rely on complex algorithms. This study presents a novel approach to enhance object tracking speed via confidence-guided frame skipping, strategically designed to complement existing methods. Initially, lightweight tracking is used to track a target; only when it fails is an existing, robust but complex algorithm invoked. The contribution of this study lies in the proposed confidence assessment of the lightweight tracker's results: the method determines the need for intervention by the robust algorithm based on the predicted confidence level. This two-tiered approach significantly enhances tracking speed by leveraging the lightweight method for straightforward situations and the robust algorithm for challenging scenarios. Experimental results demonstrate the effectiveness of the proposed approach in enhancing tracking speed. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
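The two-tiered dispatch described in the abstract above can be sketched in a few lines. The tracker interfaces, the toy stand-in trackers, and the 0.5 confidence threshold are hypothetical, not the paper's actual components:

```python
# Sketch of confidence-guided, two-tiered tracking: run the cheap tracker
# every frame and invoke the robust tracker only when the cheap tracker's
# self-reported confidence drops below a threshold.

def track_frame(frame, lightweight, robust, threshold=0.5):
    box, confidence = lightweight(frame)
    if confidence >= threshold:
        return box, "lightweight"
    return robust(frame), "robust"

# Toy stand-ins: a fast tracker that loses confidence on frame 2, and a
# slow but reliable fallback.
def fast_tracker(frame):
    return (10, 10, 50, 50), (0.9 if frame != 2 else 0.2)

def slow_tracker(frame):
    return (12, 11, 50, 50)

for f in range(4):
    _, used = track_frame(f, fast_tracker, slow_tracker)
    print(f, used)  # the robust tracker is used only on frame 2
```

The speedup comes from the dispatch itself: on easy frames the expensive tracker never runs, so its cost is paid only when the confidence signal says it is needed.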
42. Hybrid Online Visual Tracking of Non-rigid Objects.
- Author
-
Bagherzadeh, Mohammad Amin, Seyedarabi, Hadi, and Razavi, Seyed Naser
- Subjects
- *
COMPUTER vision , *VISUAL learning , *ONLINE education , *JOB performance , *DETECTORS - Abstract
Visual object tracking has been a fundamental topic of machine vision in recent years. Most trackers can hardly achieve top performance while running in real time. This paper presents a tracking framework based on the SiamFC network that can be trained online from the beginning of tracking and runs in real time. The SiamFC network has a high tracking speed but cannot be trained online, a limitation that prevents it from tracking a target over long durations. Hybrid-Siam can be trained online to distinguish target from background by switching between traditional tracking and deep learning methods. Using the traditional tracking method together with a target detector based on saliency detection enables long-term tracking. Our method runs at more than 60 frames per second at test time and achieves state-of-the-art performance on tracking benchmarks, with robust results for long-term tracking. Hybrid-Siam improves on SiamFC and achieves an AUC score of 81.7% on LaSOT, 72.3% on OTB100, and an average overlap of 66.2% on GOT-10k. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. SiamRhic: Improved Cross-Correlation and Ranking Head-Based Siamese Network for Object Tracking in Remote Sensing Videos.
- Author
-
Yang, Afeng, Yang, Zhuolin, and Feng, Wenqing
- Subjects
- *
ARTIFICIAL neural networks , *CONVOLUTIONAL neural networks , *COMPUTER vision , *REMOTE sensing , *TRACKING algorithms , *OBJECT tracking (Computer vision) - Abstract
Object tracking in remote sensing videos is a challenging task in computer vision. Recent advances in deep learning have sparked significant interest in tracking algorithms based on Siamese neural networks. However, many existing algorithms fail to deliver satisfactory performance in complex scenarios due to challenging conditions and limited computational resources. Thus, enhancing tracking efficiency and improving algorithm responsiveness in complex scenarios are crucial. To address tracking drift caused by similar objects and background interference in remote sensing image tracking, we propose an enhanced Siamese network based on the SiamRhic architecture, incorporating a cross-correlation and ranking head for improved object tracking. We first use convolutional neural networks for feature extraction and integrate the CBAM (Convolutional Block Attention Module) to enhance the tracker's representational capacity, allowing it to focus more effectively on the objects. Additionally, we replace the original depth-wise cross-correlation operation with asymmetric convolution, enhancing both speed and performance. We also introduce a ranking loss to reduce the classification confidence of interference objects, addressing the mismatch between classification and regression. We validate the proposed algorithm through experiments on the OTB100, UAV123, and OOTB remote sensing datasets. Specifically, SiamRhic achieves success, normalized precision, and precision rates of 0.533, 0.786, and 0.812, respectively, on the OOTB benchmark. The OTB100 benchmark achieves a success rate of 0.670 and a precision rate of 0.892. Similarly, in the UAV123 benchmark, SiamRhic achieves a success rate of 0.621 and a precision rate of 0.823. These results demonstrate the algorithm's high precision and success rates, highlighting its practical value. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Jointly modeling association and motion cues for robust infrared UAV tracking.
- Author
-
Xu, Boyue, Hou, Ruichao, Bei, Jia, Ren, Tongwei, and Wu, Gangshan
- Subjects
- *
DATA augmentation , *COMPUTER vision , *DEEP learning , *ALGORITHMS , *OBJECT tracking (Computer vision) - Abstract
UAV tracking plays a crucial role in computer vision by enabling real-time monitoring of UAVs, enhancing safety and operational capabilities while expanding the potential applications of drone technology. Off-the-shelf deep-learning-based trackers have not been able to effectively address challenges such as occlusion, complex motion, and background clutter for UAV objects in the infrared modality. To overcome these limitations, we propose a novel tracker for UAV object tracking, named MAMC. To be specific, the proposed method first employs a data augmentation strategy to enhance the training dataset. We then introduce a candidate target association matching method to deal with the interference caused by the presence of a large number of similar targets in the infrared modality. Next, it leverages a motion estimation algorithm with window jitter compensation to address the tracking instability due to background clutter and occlusion. In addition, a simple yet effective object research and update strategy is used to address the complex motion and localization problems of UAV objects. Experimental results demonstrate that the proposed tracker achieves state-of-the-art performance on the Anti-UAV and LSOTB-TIR datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Improved multi object tracking with locality sensitive hashing.
- Author
-
Chemmanam, Ajai John, Jose, Bijoy, and Moopan, Asif
- Abstract
Object tracking is one of the most advanced applications of computer vision algorithms. While various tracking approaches have been developed previously, they often rely on approximations and assumptions to enable real-time performance within resource constraints on memory, time, and computation. To address these limitations, we investigate the bottlenecks of existing tracking frameworks and propose a solution to enhance tracking efficiency. The proposed method uses Locality Sensitive Hashing (LSH) to efficiently store and retrieve nearest neighbours, and then applies bipartite cost matching based on predicted positions, size, aspect ratio, appearance description, and uncertainty in motion estimation. The LSH algorithm reduces the dimensionality of the data while preserving relative distances. LSH hashes the features in constant time and facilitates rapid nearest-neighbour retrieval by treating features that fall into the same hash buckets as similar. The effectiveness of the method was evaluated on the MOT benchmark dataset, achieving a Multiple Object Tracker Accuracy (MOTA) of 67.1% (train) and 62.7% (test). Furthermore, our framework exhibits the highest Multiple Object Tracker Precision (MOTP) and mostly tracked objects, and the lowest values for mostly lost objects and identity switches among state-of-the-art trackers. Incorporating LSH reduced identity switches by approximately 7% and fragmentation by around 13%. We used the framework for real-time tracking applications on edge devices for an industry partner and found that the LSH integration resulted in a notable reduction in track ID switching, with only a marginal increase in computation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
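The role LSH plays in a pipeline like the one above can be illustrated with a minimal random-hyperplane sketch: appearance features are hashed to short binary keys, and features sharing a bucket become candidate matches before exact bipartite cost matching. The feature dimension, bit count, and toy vectors below are assumptions for illustration, not the paper's configuration:

```python
import random

# Minimal random-hyperplane LSH: each hyperplane contributes one bit of the
# key (which side of the plane the feature vector falls on), so similar
# appearance features are likely to share a bucket and become candidates
# for the subsequent exact cost matching, without scanning every track.

def make_hyperplanes(dim, n_bits, seed=0):
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

def lsh_key(vec, planes):
    return tuple(int(sum(p * v for p, v in zip(plane, vec)) >= 0)
                 for plane in planes)

planes = make_hyperplanes(dim=4, n_bits=8)
buckets = {}
track_features = {
    "track_1": [1.0, 0.9, 0.1, 0.0],
    "track_2": [0.0, 0.1, 1.0, 0.9],
}
for tid, feat in track_features.items():
    buckets.setdefault(lsh_key(feat, planes), []).append(tid)

# A detection close to track_1 in appearance space will usually hash to the
# same bucket, shortlisting it in roughly constant time.
detection = [0.95, 0.92, 0.05, 0.02]
print(buckets.get(lsh_key(detection, planes), []))
```

Hashing is O(bits × dim) per feature regardless of the number of tracks, which is where the constant-time retrieval claimed in the abstract comes from; collisions of dissimilar features are possible, so the exact cost matching still makes the final decision.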
46. Efficient class‐agnostic obstacle detection for UAV‐assisted waterway inspection systems.
- Author
-
Alonso, Pablo, Íñiguez de Gordoa, Jon Ander, Ortega, Juan Diego, and Nieto, Marcos
- Subjects
OBJECT recognition (Computer vision) ,COMPUTER vision ,ROOT-mean-squares ,WATERWAYS ,AQUATIC sports safety measures ,RUNWAYS (Aeronautics) - Abstract
Ensuring the safety of water airport runways is essential for the correct operation of seaplane flights. Among other tasks, airport operators must identify and remove various objects that may have drifted into the runway area. In this paper, the authors propose a complete and embedded‐friendly waterway obstacle detection pipeline that runs on a camera‐equipped drone. This system uses a class‐agnostic version of the YOLOv7 detector, which is capable of detecting objects regardless of their class. Additionally, through the use of the drone's GPS data and camera parameters, the locations of the objects are pinpointed with a Distance Root Mean Square of 0.58 m. On our own annotated dataset, the system is capable of generating alerts for detected objects with a recall of 0.833 and a precision of 1.0. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Efficient transformer tracking with adaptive attention.
- Author
-
Xiao, Dingkun, Wei, Zhenzhong, and Zhang, Guangjun
- Subjects
TRANSFORMER models ,COMPUTER vision ,RUNNING speed ,COMPUTATIONAL complexity ,ATTENTION - Abstract
Recently, several trackers utilising Transformer architecture have shown significant performance improvement. However, the high computational cost of multi‐head attention, a core component in the Transformer, has limited real‐time running speed, which is crucial for tracking tasks. Additionally, the global mechanism of multi‐head attention makes it susceptible to distractors with similar semantic information to the target. To address these issues, the authors propose a novel adaptive attention that enhances features through the spatial sparse attention mechanism with less than 1/4 of the computational complexity of multi‐head attention. Our adaptive attention sets a perception range around each element in the feature map based on the target scale in the previous tracking result and adaptively searches for the information of interest. This allows the module to focus on the target region rather than background distractors. Based on adaptive attention, the authors build an efficient transformer tracking framework. It can perform deep interaction between search and template features to activate target information and aggregate multi‐level interaction features to enhance the representation ability. The evaluation results on seven benchmarks show that the authors' tracker achieves outstanding performance with a speed of 43 fps and significant advantages in hard circumstances. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. An Adaptive YOLO11 Framework for the Localisation, Tracking, and Imaging of Small Aerial Targets Using a Pan–Tilt–Zoom Camera Network.
- Author
-
Lui, Ming Him, Liu, Haixu, Tang, Zhuochen, Yuan, Hang, Williams, David, Lee, Dongjin, Wong, K. C., and Wang, Zihao
- Subjects
STABLE Diffusion ,CAMERA calibration ,DATA augmentation ,IMAGE intensifiers ,CAMERAS - Abstract
This article presents a cost-effective camera network system that employs neural network-based object detection and stereo vision to assist a pan–tilt–zoom camera in imaging fast, erratically moving small aerial targets. Compared to traditional radar systems, this approach offers advantages in supporting real-time target differentiation and ease of deployment. Based on the principle of knowledge distillation, a novel data augmentation method is proposed to coordinate the latest open-source pre-trained large models in semantic segmentation, text generation, and image generation tasks to train a BicycleGAN for image enhancement. The resulting dataset is tested on various model structures and backbone sizes of two mainstream object detection frameworks, Ultralytics' YOLO and MMDetection. Additionally, the algorithm implements and compares two popular object trackers, Bot-SORT and ByteTrack. The experimental proof-of-concept deploys the YOLOv8n model, which achieves an average precision of 82.2% and an inference time of 0.6 ms. Alternatively, the YOLO11x model maximises average precision at 86.7% while maintaining an inference time of 9.3 ms without bottlenecking subsequent processes. Stereo vision achieves accuracy within a median error of 90 mm while following a drone flying at over 1 m/s in an 8 m × 4 m area of interest. Stable single-object tracking with the PTZ camera is successful at 15 fps with an accuracy of 92.58%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Enhancing visual monitoring via multi-feature fusion and template update strategies.
- Author
-
Rafique, Fahad, Zheng, Liying, Benarab, Acheraf, and Javed, Muhammad Hafeez
- Abstract
Recent advancements in computer vision, particularly deep learning, have significantly influenced visual monitoring across varied scenes. However, traditional machine learning approaches, particularly those based on correlation filtering (CF), remain valuable due to their efficiency in data collection, lower computational needs, and improved explainability. While CF-based tracking methods have become popular for analyzing complex scenes, they often rely on single features, limiting their ability to capture dynamic target appearances and resulting in inaccurate target tracking. Traditional template update techniques can also result in low accuracy and inaccurate feature extraction. In contrast, this work introduces a location fusion mechanism incorporating multiple feature information streams to improve real-time monitoring in complex scenes. These strategies periodically extract four types of features and fuse their response maps, ensuring robust target tracking with high accuracy. Further innovations, such as dynamic spatial regularization and a multi-memory tracking framework, enable the filters to focus on more reliable regions and suppress response deviations across consecutive frames. On the basis of a confidence score, a novel template update, storage, and retrieval mechanism is implemented. Extensive testing on datasets such as OTB100, VOT2016, and VOT2018 confirms that these integrated approaches outperform 26 state-of-the-art algorithms by balancing tracking success and computational efficiency in complex scenes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
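The response-map fusion step the abstract above describes can be sketched as a weighted sum of per-feature correlation maps whose peak gives the predicted target location. The feature names, weights, and map sizes below are illustrative assumptions, not the paper's:

```python
# Weighted fusion of per-feature correlation response maps: the peak of the
# fused map is the predicted target location. Feature types and weights here
# are illustrative (e.g. a HOG map and a grayscale map).

def fuse_responses(maps, weights):
    h, w = len(maps[0]), len(maps[0][0])
    fused = [[sum(wt * m[i][j] for wt, m in zip(weights, maps))
              for j in range(w)] for i in range(h)]
    peak = max(((i, j) for i in range(h) for j in range(w)),
               key=lambda p: fused[p[0]][p[1]])
    return fused, peak

hog = [[0.1, 0.2, 0.1], [0.2, 0.9, 0.3], [0.1, 0.2, 0.1]]
gray = [[0.0, 0.1, 0.0], [0.1, 0.8, 0.2], [0.0, 0.1, 0.0]]
_, peak = fuse_responses([hog, gray], [0.6, 0.4])
print(peak)  # → (1, 1)
```

A confidence score for template updating, as the abstract mentions, could then be read off the fused peak value itself.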
50. UWB-Based Real-Time Indoor Positioning Systems: A Comprehensive Review.
- Author
-
Al-Okby, Mohammed Faeik Ruzaij, Junginger, Steffen, Roddelkopf, Thomas, and Thurow, Kerstin
- Subjects
INDOOR positioning systems ,MOBILE apps ,POWER transmission ,MOBILE robots ,INTERNET of things - Abstract
Currently, tracking moving objects and determining their indoor location is one of the most attractive applications seeing widespread use, especially after the adoption of this technology in some smartphone applications. Great developments in electronics and communications systems have provided the basis for tracking and location systems inside buildings, so-called indoor positioning systems (IPSs). Ultra-wideband (UWB) technology is one of the important emerging solutions for IPSs. This radio communications technology provides important characteristics that distinguish it from other solutions, such as secure and robust communications, wide bandwidth, high data rate, and low transmission power. In this paper, we review the implementation of the most important real-time indoor positioning and tracking systems that use ultra-wideband technology for tracking and localizing moving objects. The paper reviews the newest in-market UWB modules and solutions, discussing several types of algorithms used by real-time UWB-based systems to determine location with high accuracy, along with a detailed comparison that saves the reader time and effort in choosing the appropriate UWB module, method, and algorithm for real-time implementation. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
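As an illustration of the kind of localization algorithm the UWB systems surveyed above run, classic 2-D trilateration solves for a tag's position from anchor-to-tag range measurements by linearising the circle equations. This is a textbook sketch with a made-up anchor layout, not any specific product's implementation:

```python
# Textbook 2-D trilateration: subtracting the first range equation from the
# others linearises the three circles into a 2x2 linear system for (x, y).

def trilaterate(anchors, distances):
    (x1, y1), (x2, y2), (x3, y3) = anchors
    r1, r2, r3 = distances
    a11, a12 = 2 * (x2 - x1), 2 * (y2 - y1)
    a21, a22 = 2 * (x3 - x1), 2 * (y3 - y1)
    b1 = r1**2 - r2**2 + x2**2 - x1**2 + y2**2 - y1**2
    b2 = r1**2 - r3**2 + x3**2 - x1**2 + y3**2 - y1**2
    det = a11 * a22 - a12 * a21  # non-zero when anchors are not collinear
    return ((b1 * a22 - b2 * a12) / det, (a11 * b2 - a21 * b1) / det)

anchors = [(0.0, 0.0), (4.0, 0.0), (0.0, 3.0)]
dists = [2**0.5, 10**0.5, 5**0.5]  # ranges from a tag at (1, 1)
print(trilaterate(anchors, dists))  # ≈ (1.0, 1.0)
```

Real deployments add a fourth anchor and least-squares or filtering to absorb range noise, but the geometry is the same.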