16,077 results for "OBJECT recognition (Computer vision)"
Search Results
2. Traffic sign detection and recognition based on MMS data using YOLOv4-Tiny algorithm.
- Author
-
Gezgin, Hilal and Alkan, Reha Metin
- Subjects
- *
TRAFFIC monitoring , *IMAGE recognition (Computer vision) , *OBJECT recognition (Computer vision) , *TRAFFIC safety , *TRAFFIC signs & signals , *WEBCAMS - Abstract
Traffic signs are of great importance for driving safety. Recently emerging autonomous vehicles must be able to automatically detect and recognize all road inventories, such as traffic signs. Firstly, in this study, a method based on a mobile mapping system (MMS) is proposed for the detection of traffic signs in order to establish a Turkish traffic sign dataset. Obtaining images from real traffic scenes using the MMS method enhances the reliability of the model. The method is easy to apply in real life in terms of both cost and suitability for mobile and autonomous systems. In this frame, YOLOv4-Tiny, an object detection algorithm considered well suited to mobile vehicles, is used to detect and recognize traffic signs. Owing to its simple neural network structure, this algorithm has a low computational cost and is more suitable for embedded devices than other algorithms. It is also a better option for real-time detection than other approaches. The model was trained on a dataset consisting partly of images taken with the MMS based on realistic field measurements and partly of images obtained from open datasets. Training yielded a mean average precision (mAP) of 98.1%. The trained model was first tested on existing images and then tested in real time in a laboratory environment using a simple fixed web camera. The test results show that the suggested method can improve driving safety by detecting traffic signs quickly and accurately, especially for autonomous vehicles. Therefore, the proposed method is considered suitable for use in autonomous vehicles. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
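As an illustration of the real-time webcam test described in the entry above, a minimal YOLOv4-Tiny inference loop using OpenCV's DNN module might look as follows; the cfg/weights/class-list paths are placeholders, not the paper's trained Turkish-sign model.

```python
# Minimal YOLOv4-Tiny webcam inference sketch using OpenCV's DNN module.
# The cfg/weights/names paths are placeholders, not the paper's trained model.
import cv2

net = cv2.dnn.readNetFromDarknet("yolov4-tiny.cfg", "yolov4-tiny.weights")
model = cv2.dnn_DetectionModel(net)
model.setInputParams(size=(416, 416), scale=1 / 255.0, swapRB=True)

with open("signs.names") as f:
    classes = [line.strip() for line in f]

cap = cv2.VideoCapture(0)  # simple fixed webcam, as in the paper's lab test
while True:
    ok, frame = cap.read()
    if not ok:
        break
    class_ids, scores, boxes = model.detect(frame, confThreshold=0.5,
                                            nmsThreshold=0.4)
    for cid, score, box in zip(class_ids, scores, boxes):
        x, y, w, h = box
        cv2.rectangle(frame, (x, y), (x + w, y + h), (0, 255, 0), 2)
        cv2.putText(frame, f"{classes[int(cid)]} {score:.2f}", (x, y - 5),
                    cv2.FONT_HERSHEY_SIMPLEX, 0.5, (0, 255, 0), 1)
    cv2.imshow("signs", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
cv2.destroyAllWindows()
```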
3. Semantics-Fusion: Radar Semantic Information-Based Radar–Camera Fusion for 3D Object Detection.
- Author
-
Tian, Ziran, Huang, Xiaohong, Xu, Kunqiang, Sun, Xujie, and Deng, Zhenmiao
- Subjects
- *
OBJECT recognition (Computer vision) , *FEATURE extraction , *POINT cloud , *DEEP learning , *RADAR - Abstract
The fusion of millimeter-wave radar and camera for three-dimensional (3D) object detection represents a pivotal technology for autonomous driving, yet it is not without its inherent challenges. First, the radar point cloud contains clutter, which can result in the generation of impure radar features. Second, the radar point cloud is sparse, which presents a challenge in fully extracting the radar features. This can result in the loss of object information, leading to object misdetection, omission, and reduced robustness. To address these issues, a 3D object detection method based on the semantic information of radar features and camera fusion (Semantics-Fusion) is proposed. Initially, the image features are extracted through the centroid detection network, resulting in the generation of a preliminary 3D bounding box for the objects. Subsequently, the radar point cloud is clustered based on the objects' position and velocity, thereby eliminating irrelevant points and clutter. The clustered radar point cloud is projected onto the image plane, thereby forming a radar 2D pseudo-image. This is then input to the designed 2D convolution module, which enables the full extraction of the semantic information of the radar features. Ultimately, the radar features are fused with the image features, and secondary regression is employed to achieve robust 3D object detection. The performance of our method was evaluated on the nuScenes dataset, achieving a mean average precision (mAP) of 0.325 and a nuScenes detection score (NDS) of 0.462. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
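A minimal sketch of the radar-to-image projection step described in the entry above, rendering clustered radar points as a sparse 2D pseudo-image; the intrinsic matrix K and extrinsic transform T are assumed placeholders, not values from the paper.

```python
# Sketch: project clustered radar points onto the image plane to form a
# sparse 2D "pseudo-image" with per-pixel radar channels (depth, velocity).
# K (camera intrinsics) and T (radar-to-camera extrinsics) are placeholders.
import numpy as np

def radar_pseudo_image(points, K, T, hw=(900, 1600)):
    """points: (N, 4) array of [x, y, z, radial_velocity] in radar frame."""
    h, w = hw
    xyz1 = np.hstack([points[:, :3], np.ones((len(points), 1))])  # homogeneous
    cam = (T @ xyz1.T).T[:, :3]                 # radar frame -> camera frame
    front = cam[:, 2] > 0.1                     # keep points in front of camera
    cam, vel = cam[front], points[front, 3]
    uv = (K @ cam.T).T                          # pinhole projection
    u = (uv[:, 0] / uv[:, 2]).astype(int)
    v = (uv[:, 1] / uv[:, 2]).astype(int)
    ok = (u >= 0) & (u < w) & (v >= 0) & (v < h)
    img = np.zeros((h, w, 2), dtype=np.float32)  # channels: depth, velocity
    img[v[ok], u[ok], 0] = cam[ok, 2]
    img[v[ok], u[ok], 1] = vel[ok]
    return img
```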
4. Tl-depth: monocular depth estimation based on tower connections and Laplacian-filtering residual completion.
- Author
-
Zhang, Qi, Song, Yuqin, and Lou, Hui
- Subjects
- *
MACHINE learning , *OBJECT recognition (Computer vision) , *COMPUTER vision , *FEATURE extraction , *MONOCULARS - Abstract
Monocular depth estimation is essential in computer vision and robotics applications, including localization, mapping, and 3D object detection. In recent years, supervised learning algorithms that model large amounts of data have been successful in depth estimation. However, obtaining dense ground-truth depth labels remains a challenge in supervised training, so unsupervised methods trained using monocular image sequences have gained wider attention. The depth maps produced by most existing models, however, often have blurred edges. We therefore propose several effective improvement strategies to construct a depth estimation network, TL-Depth. (1) We propose a tower connection structure that utilizes convolutional processing to facilitate feature fusion, achieve precise semantic classification of pixels, and yield more accurate depth results. (2) We employ a Laplacian-filtering residual to focus on boundary information and enhance detailed results. (3) During the feature extraction stage, multiple pooling excitations are embedded in the convolutional layers. This reduces redundant information while enhancing the network's feature extraction capability. The experimental results on the KITTI dataset and the Make3D dataset demonstrate that this method compares favourably with current methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
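The paper's Laplacian-filtering residual module is not specified in the abstract; as context, here is a minimal sketch of Laplacian filtering to expose depth-boundary detail, assuming OpenCV and a hypothetical depth_pred.png file.

```python
# Sketch: Laplacian filtering to expose high-frequency (edge) detail in a
# predicted depth map, the kind of boundary signal a residual branch can
# focus on. Not the authors' exact module; the file path is a placeholder.
import cv2
import numpy as np

depth = cv2.imread("depth_pred.png", cv2.IMREAD_GRAYSCALE).astype(np.float32)
edges = cv2.Laplacian(depth, ddepth=cv2.CV_32F, ksize=3)  # 2nd-derivative filter
residual_target = np.abs(edges)  # large values mark depth discontinuities
```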
5. Detection and identification of centipedes based on deep learning.
- Author
-
Chen, Weitao, Yao, Zhaoli, Wang, Tao, Yang, Fu, Zu, Weiwei, Yao, Chong, and Jia, Liangquan
- Subjects
- *
OBJECT recognition (Computer vision) , *COMPUTER vision , *CENTIPEDES , *RURAL industries , *AGRICULTURE - Abstract
The quantification of centipede populations is one of the key measures in achieving intelligent management of edible centipedes and promoting the upgrade of the rural centipede industry chain. However, current centipede counting techniques still face several challenges, including low detection accuracy, large model size, and difficulty in deployment on mobile devices. These challenges have limited existing network models to the experimental stage, preventing their practical application. To tackle the identified challenges, this study introduces a lightweight centipede detection model (FCM-YOLO), which enhances detection performance while ensuring fast processing and broad applicability. Based on the YOLOv5s framework, this model incorporates the C3FS module, resulting in fewer parameters and increased detection speed. Additionally, it integrates an attention module (CBAM) to suppress irrelevant information and improve target focus, thus enhancing detection accuracy. Furthermore, to enhance the precision of bounding box positioning, this study proposes a new loss function, CMPDIOU, for bounding box loss. Experimental results show that FCM-YOLO, while reducing parameter size, achieves an improved detection accuracy of 97.4% (2.7% higher than YOLOv5s) and reduces floating-point operations (FLOPs) to 11.5G (4.3G lower than YOLOv5s). In summary, this paper provides novel insights into the detection and enumeration of centipedes, contributing to the advancement of intelligent agricultural practices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
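The CMPDIOU loss in the entry above is defined only in the paper itself; for context, here is a minimal PyTorch sketch of the plain IoU loss that such bounding-box losses extend.

```python
# Generic IoU loss for axis-aligned boxes in (x1, y1, x2, y2) format.
# The paper's CMPDIOU adds penalty terms beyond this; their exact form is
# defined in the paper, so only plain IoU is shown here.
import torch

def iou_loss(pred, target, eps=1e-7):
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    return (1 - iou).mean()
```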
6. Adaptive occlusion object detection algorithm based on OL-IoU.
- Author
-
Guo, Baicang, Zhang, Hongyu, Wang, Huanhuan, Li, Xinwei, and Jin, Lisheng
- Subjects
- *
OBJECT recognition (Computer vision) , *GEOGRAPHICAL perception , *FEATURE extraction , *SPACE perception , *AUTONOMOUS vehicles , *INTELLIGENT transportation systems - Abstract
The continuous advancement of autonomous driving technology imposes higher demands on the accuracy of target detection in complex environments, particularly when traffic targets are occluded. Existing algorithms still face significant challenges in detection accuracy and real-time performance under such conditions. To address this issue, this paper proposes an improved YOLOX algorithm based on adaptive deformable convolution, named OCC-YOLOX. This algorithm enhances the feature extraction network's ability to focus on occluded targets by incorporating a coordinate attention mechanism. Additionally, it introduces the Overlapping IoU (OL-IoU) loss function to optimize the overlap between predicted and ground truth bounding boxes, thereby improving detection accuracy. Furthermore, the adoption of Fast Spatial Pyramid Pooling (Fast SPP) reduces computational complexity while maintaining real-time performance. Experiments on fused public datasets demonstrate that OCC-YOLOX achieves improvements in accuracy, recall, and average precision by 2.76%, 1.25%, and 1.92%, respectively. In addition to testing on the KITTI, CityPersons, and BDD100K datasets, the effectiveness of the OCC-YOLOX algorithm is further validated through comparisons with self-collected occlusion scene data. The experimental results indicate that OCC-YOLOX outperforms existing mainstream detection algorithms, particularly in handling complex occlusion scenarios, significantly enhancing the accuracy and efficiency of object detection. This study provides new insights for addressing the challenges of occluded target detection in intelligent transportation systems. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
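The coordinate attention mechanism mentioned in the entry above is commonly implemented as in Hou et al. (CVPR 2021); a compact PyTorch sketch follows, with illustrative channel sizes, under the assumption that this standard variant is what the authors use.

```python
# Compact coordinate-attention block (after Hou et al., CVPR 2021): pooled
# height/width descriptors preserve positional cues for occluded targets.
# Channel sizes are illustrative.
import torch
import torch.nn as nn

class CoordAtt(nn.Module):
    def __init__(self, ch, reduction=32):
        super().__init__()
        mid = max(8, ch // reduction)
        self.conv1 = nn.Conv2d(ch, mid, 1)
        self.bn = nn.BatchNorm2d(mid)
        self.act = nn.Hardswish()
        self.conv_h = nn.Conv2d(mid, ch, 1)
        self.conv_w = nn.Conv2d(mid, ch, 1)

    def forward(self, x):
        n, c, h, w = x.shape
        pooled_h = x.mean(dim=3, keepdim=True)                      # (n, c, h, 1)
        pooled_w = x.mean(dim=2, keepdim=True).permute(0, 1, 3, 2)  # (n, c, w, 1)
        y = self.act(self.bn(self.conv1(torch.cat([pooled_h, pooled_w], dim=2))))
        y_h, y_w = torch.split(y, [h, w], dim=2)
        a_h = self.conv_h(y_h).sigmoid()                      # (n, c, h, 1)
        a_w = self.conv_w(y_w.permute(0, 1, 3, 2)).sigmoid()  # (n, c, 1, w)
        return x * a_h * a_w  # broadcast attention over both axes
```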
7. Validation of Vetscan Imagyst®, a diagnostic test utilizing an artificial intelligence deep learning algorithm, for detecting strongyles and Parascaris spp. in equine fecal samples.
- Author
-
Steuer, Ashley, Fritzler, Jason, Boggan, SaraBeth, Daniel, Ian, Cowles, Bobby, Penn, Cory, Goldstein, Richard, and Lin, Dan
- Subjects
- *
MACHINE learning , *FECAL egg count , *OBJECT recognition (Computer vision) , *ARTIFICIAL intelligence , *IMAGE recognition (Computer vision) , *FECAL contamination , *DEEP learning - Abstract
Background: Current methods for obtaining fecal egg counts in horses are often inaccurate and variable depending on the analyst's skill and experience. Automated digital scanning of fecal sample slides integrated with analysis by an artificial intelligence (AI) algorithm is a viable, emerging alternative that can mitigate operator variation compared to conventional methods in companion animal fecal parasite diagnostics. Vetscan Imagyst is a novel fecal parasite detection system that uploads the scanned image to the cloud where proprietary software analyzes captured images for diagnostic recognition by a deep learning, object detection AI algorithm. The study describes the use and validation of Vetscan Imagyst in equine parasitology. Methods: The primary objective of the study was to evaluate the performance of the Vetscan Imagyst system in terms of diagnostic sensitivity and specificity in testing equine fecal samples (n = 108) for ova from two parasites that commonly infect horses, strongyles and Parascaris spp., compared to reference assays performed by expert parasitologists using a Mini-FLOTAC technique. Two different fecal flotation solutions were used to prepare the sample slides, NaNO3 and Sheather's sugar solution. Results: Diagnostic sensitivity of the Vetscan Imagyst algorithm for strongyles versus the manual reference test was 99.2% for samples prepared with NaNO3 solution and 100.0% for samples prepared with Sheather's sugar solution. Sensitivity for Parascaris spp. was 88.9% and 99.9%, respectively, for samples prepared with NaNO3 and Sheather's sugar solutions. Diagnostic specificity for strongyles was 91.4% and 99.9%, respectively, for samples prepared with NaNO3 and Sheather's sugar solutions. Specificity for Parascaris spp. was 93.6% and 99.9%, respectively, for samples prepared with NaNO3 and Sheather's sugar solutions. Lin's concordance correlation coefficients for Vetscan Imagyst eggs-per-gram counts versus those determined by the expert parasitologist were 0.924–0.978 for strongyles and 0.944–0.955 for Parascaris spp., depending on the flotation solution. Conclusions: Sensitivity and specificity results for detecting strongyles and Parascaris spp. in equine fecal samples showed that Vetscan Imagyst can consistently provide diagnostic accuracy equivalent to manual evaluations by skilled parasitologists. As an automated method driven by a deep learning AI algorithm, Vetscan Imagyst has the potential to avoid variations in analyst characteristics, thus providing more consistent results in a timely manner, in either clinical or laboratory settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
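A minimal NumPy sketch of the validation metrics reported in the entry above (diagnostic sensitivity, specificity, and Lin's concordance correlation coefficient), computed from hypothetical paired counts.

```python
# Sketch: the validation metrics used in the study -- diagnostic sensitivity,
# specificity, and Lin's concordance correlation coefficient (CCC).
# Inputs are hypothetical, not the study's data.
import numpy as np

def sensitivity_specificity(tp, fn, tn, fp):
    return tp / (tp + fn), tn / (tn + fp)

def lins_ccc(x, y):
    """Lin's CCC between, e.g., AI egg counts x and expert counts y."""
    x, y = np.asarray(x, float), np.asarray(y, float)
    mx, my = x.mean(), y.mean()
    vx, vy = x.var(), y.var()              # population variances
    cov = ((x - mx) * (y - my)).mean()
    return 2 * cov / (vx + vy + (mx - my) ** 2)
```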
8. FROM PIXELS TO PREDICTIONS: ROLE OF BOOSTED DEEP LEARNING-ENABLED OBJECT DETECTION FOR AUTONOMOUS VEHICLES ON LARGE SCALE CONSUMER ELECTRONICS ENVIRONMENT.
- Author
-
ALKHONAINI, MIMOUNA ABDULLAH, MENGASH, HANAN ABDULLAH, NEMRI, NADHEM, EBAD, SHOUKI A., ALOTAIBI, FAIZ ABDULLAH, ALJABRI, JAWHARA, ALZAHRANI, YAZEED, and ALNFIAI, MRIM M.
- Subjects
- *
OBJECT recognition (Computer vision) , *SUSTAINABILITY , *CITY traffic , *SENSOR arrays , *SMART cities - Abstract
Consumer electronics (CE) companies have the potential to significantly contribute to the advancement of autonomous vehicles and their accompanying technology by providing security, connectivity, and efficiency. The Consumer Autonomous Vehicles market is set for significant growth, driven by growing awareness and implementation of sustainable practices using computing technologies for traffic flow optimization in smart cities. Businesses are concentrating more on eco-friendly solutions, using AI, communication networks, and sensors for autonomous city navigation, giving safer and more efficient mobility solutions in response to growing environmental concerns. Object detection is a crucial element of autonomous vehicles and complex systems, which enables them to observe and react to their surroundings in real time. Multiple autonomous vehicles employ deep learning (DL) for detection and deploy specific sensor arrays custom-made to their use case or environment. DL processes sensory data for autonomous vehicles, enabling data-driven decisions on environmental reactions and obstacle recognition. This paper presents a Galactical Swarm Fractals Optimizer with DL-Enabled Object Detection for Autonomous Vehicles (GSODL-OOAV) model for Smart Cities. The presented GSODL-OOAV model enables proper object identification for autonomous vehicles. To accomplish this, the GSODL-OOAV model initially employs a RetinaNet object detector to detect the objects effectively. Besides, the boosted long short-term memory ensemble (BLSTME) technique is exploited to assign proper classes to the detected objects. A hyperparameter tuning procedure utilizing the GSO model is employed to enhance the classification efficiency of the BLSTME approach. The experimental validation of the GSODL-OOAV technique is performed using the BDD100K database. The comparative study of the GSODL-OOAV approach illustrated a superior accuracy outcome of 99.06% over existing state-of-the-art approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Underwater organisms detection algorithm based on multi‐scale perception and representation enhancement.
- Author
-
Xu, Jiawei, Chen, Fen, Huang, Lian, Liu, Tingna, and Peng, Zongju
- Subjects
- *
OBJECT recognition (Computer vision) , *COMPUTER vision , *DEEP learning , *IMAGE intensifiers , *ALGORITHMS - Abstract
To address issues such as object‐background confusion and difficulties in multi‐scale object feature extraction in underwater scenarios, this article proposes an underwater organisms detection algorithm based on multi‐scale perception and representation enhancement. The key innovation of the proposed algorithm is the perception improvement of the deep learning model for underwater multi‐scale objects. First, for underwater large‐scale objects, omni‐dimensional dynamic convolution is embedded as an attention mechanism (AM) into the deep network to improve the network's sensitivity to large‐scale underwater objects. For underwater small‐scale objects, an information retention downsampling module is designed to reduce the effects of serious information loss. Then, a contextual transformer as an AM is introduced into shallow networks to strengthen the network's ability to extract features from small objects. The second innovation of the proposed algorithm is an underwater spatial pooling pyramid module which enhances the representation ability of the model. Furthermore, a lightweight decoupled head is designed to eliminate the conflict between classification and localization. The ablation experiment on the URPC dataset shows that the proposed models are effective for underwater object detection. The comparative experiments on the URPC and DUT‐USEG datasets demonstrate that the proposed algorithm achieves an advantage in detection performance compared with the mainstream detection algorithms and underwater detection algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
10. A lightweight object detection approach based on edge computing for mining industry.
- Author
-
Hanif, Muhammad Wahab, Li, Zhanli, Yu, Zhenhua, and Bashir, Rehmat
- Subjects
- *
EDGE detection (Image processing) , *OBJECT recognition (Computer vision) , *SAFETY hats , *SPATIAL filters , *EDGE computing - Abstract
Coal mining enterprises deploy numerous monitoring devices to ensure safe and efficient production using target detection technologies. However, deploying deep detection models on edge devices poses challenges due to high computational loads, impacting detection speed and accuracy. A mining target detection dataset has been created to address these issues, featuring key targets in coal mining scenes such as miners, safety helmets, and coal gangue. A model is proposed to improve real-time performance for edge mining detection tasks. Detection performance is enhanced by incorporating a Pixel-wise Normalization Spatial Attention Module (PN-SAM) into the MobileNet-v3 bneck structure and replacing the h-swish activation function with Mish, providing richer gradient information transfer. The proposed model, YOLO-v4-LSAM, shows a 3.2% mAP improvement on the VOC2012 dataset and a 2.4% improvement on the mining target dataset compared to YOLO-v4-Tiny, demonstrating its effectiveness in mining environments. These enhancements enable more accurate and efficient detection in resource-constrained edge environments, contributing to safer and more reliable monitoring in coal mining operations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
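The Mish activation that replaces h-swish in the entry above is a one-liner; a PyTorch sketch follows (recent PyTorch releases also ship it built in as torch.nn.Mish).

```python
# Mish activation: mish(x) = x * tanh(softplus(x)).
# Smooth and non-monotonic, which is the "richer gradient" property cited.
import torch
import torch.nn.functional as F

def mish(x: torch.Tensor) -> torch.Tensor:
    return x * torch.tanh(F.softplus(x))
```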
11. Swin‐fisheye: Object detection for fisheye images.
- Author
-
Zhang, Dawei, Yang, Tingting, and Zhao, Bokai
- Subjects
- *
OBJECT recognition (Computer vision) , *TRANSFORMER models , *COMPUTER vision , *CAMERAS , *ALGORITHMS - Abstract
Fisheye cameras have been widely used in autonomous navigation, visual surveillance, and automatic driving. Due to severe geometric distortion, fisheye images cannot be processed effectively by conventional methods. Existing object detection algorithms struggle to detect small targets and heavily distorted objects in fisheye images, and the size and scene diversity of available fisheye datasets (such as WoodScape and VOC-360) are insufficient for training robust network models. Herein, the authors propose Swin-Fisheye, an end-to-end object detection algorithm based on Swin Transformer. A feature pyramid module based on deformable convolution (DFPM) is designed to obtain richer contextual information from the multi-scale feature maps. In addition, a projection transformation algorithm (PTA) is proposed, which can convert rectilinear images into fisheye images more accurately, and is then used to create a fisheye image dataset (COCO-Fish). The results of extensive experiments conducted on VOC-360, WoodScape, and COCO-Fish demonstrate that the proposed algorithm can achieve satisfactory results compared with state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
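The paper's projection transformation algorithm (PTA) is not specified in the abstract; as context, here is a minimal sketch of a rectilinear-to-fisheye warp under the simple equidistant model, with illustrative focal lengths.

```python
# Sketch: warp a rectilinear image into an equidistant-model fisheye image
# with cv2.remap. The paper's PTA is more elaborate; focal lengths here are
# illustrative placeholders.
import cv2
import numpy as np

def rectilinear_to_fisheye(img, f_rect=500.0, f_fish=300.0):
    h, w = img.shape[:2]
    cx, cy = w / 2.0, h / 2.0
    u, v = np.meshgrid(np.arange(w, dtype=np.float32),
                       np.arange(h, dtype=np.float32))
    r = np.hypot(u - cx, v - cy) + 1e-9         # fisheye radius per pixel
    theta = r / f_fish                          # equidistant model: r = f * theta
    r_rect = f_rect * np.tan(np.clip(theta, 0, 1.45))  # back to rectilinear radius
    map_x = (cx + (u - cx) * r_rect / r).astype(np.float32)
    map_y = (cy + (v - cy) * r_rect / r).astype(np.float32)
    return cv2.remap(img, map_x, map_y, cv2.INTER_LINEAR,
                     borderMode=cv2.BORDER_CONSTANT)
```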
12. Improving YOLOv8 with parallel frequency channel attention for taxi passengers.
- Author
-
Gao, Qi, He, Di, and Xu, Guilin
- Subjects
- *
OBJECT recognition (Computer vision) , *COMPUTER vision , *TAXICAB industry , *DEEP learning , *ALGORITHMS - Abstract
Detecting taxi passengers is crucial for assessing taxi driver behavior, which plays a significant role in regulating the taxi industry. Despite the advancements in deep learning, object detection algorithms have not been extensively applied to this domain. In this article, an innovative taxi passenger detection algorithm is introduced based on YOLOv8, a lightweight and highly accurate method designed to automatically monitor driver behavior and regulate the taxi industry. To address the challenge of deploying complex object detection models on mobile devices, the ghost module is incorporated in place of standard convolutions within the C2f module, thereby making the model more lightweight. Furthermore, the model's performance is enhanced by integrating an improved version of Frequency Channel Attention (FCA), termed Parallel Frequency Channel Attention (PFCA), which boosts detection accuracy with minimal additional parameters and computational overhead. Experimental results on a specific taxi passenger dataset demonstrate that the proposed method significantly outperforms the baseline YOLOv8n model. Specifically, the model reduces the number of parameters and floating point operations by 12.96% and 8.18%, respectively, while achieving increases in mAP50 and mAP50‐95 by 0.27 and 0.73 percentage points, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
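The ghost module that replaces standard convolutions in the C2f block of the entry above follows GhostNet (Han et al., CVPR 2020); a minimal PyTorch sketch with illustrative channel settings.

```python
# Sketch of a Ghost module (Han et al., CVPR 2020): a slim primary conv plus
# "cheap" depthwise ops generate the remaining feature maps, cutting
# parameters and FLOPs. Channel settings are illustrative.
import torch
import torch.nn as nn

class GhostModule(nn.Module):
    def __init__(self, c_in, c_out, ratio=2, dw_kernel=3):
        super().__init__()
        c_primary = c_out // ratio
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_primary, 1, bias=False),
            nn.BatchNorm2d(c_primary), nn.ReLU(inplace=True))
        self.cheap = nn.Sequential(  # depthwise conv = cheap linear ops
            nn.Conv2d(c_primary, c_out - c_primary, dw_kernel,
                      padding=dw_kernel // 2, groups=c_primary, bias=False),
            nn.BatchNorm2d(c_out - c_primary), nn.ReLU(inplace=True))

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)
```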
13. Wheat leaf localization and segmentation for yellow rust disease detection in complex natural backgrounds.
- Author
-
Hassan, Amna, Mumtaz, Rafia, Mahmood, Zahid, Fayyaz, Muhammad, and Naeem, Muhammad Kashif
- Subjects
CONVOLUTIONAL neural networks ,OBJECT recognition (Computer vision) ,TRANSFORMER models ,STRIPE rust ,RUST diseases - Abstract
Wheat yellow rust disease poses a significant threat to global wheat yield and grain quality. Early detection of this disease will help to minimize the losses it causes. Existing models work well on images taken in a controlled environment, where a uniform background is placed behind the leaf, but they fail to produce good results in natural settings. Previous research also involves manual interventions in the pipeline, such as cropping the images or using uniform backgrounds, to achieve good classification results. Such systems are not practical in natural environments, where images contain substantial background noise and manual cropping becomes an extra step for the farmer. Moreover, the unavailability of a dataset of leaf images taken in natural settings posed another challenge. In this research, a dataset is curated and leaves are annotated for object detection and object segmentation; the leaves are further classified into three classes, i.e., healthy, resistant, and susceptible. A novel unsupervised image rotation algorithm is proposed that takes input from YOLOv8 to align the leaf so that the maximum background can be removed by a rectangular bounding box. A comparison between multiple state-of-the-art segmentation models, i.e., UNET, Segment-Anything (SAM), SegNet, LinkNet, PSPNet, FPN, DeepLabv3+ (Xception), and DeepLabv3+ (MobileNet), has shown that UNET outperformed all the other segmentation models with an IoU score of 0.9563. Lastly, for classification, the performance of multiple convolutional neural networks, i.e., VGG16, ResNet101(v2), Xception, and MobileNetV2, and Transformer-based models, i.e., Swin Transformer and MobileViT, has been compared. Swin Transformer outperformed the state-of-the-art CNN models with an accuracy of 95.8%. This paper proposes a complete, robust pipeline that can be deployed in natural environments and does not need any manual intervention to produce good results. This research shows that good localization of leaves and removal of unwanted background noise at the earliest stage of the pipeline assist the segmentation model in effectively segmenting the leaf from the background, which enables classification models to achieve high classification accuracy, even when dealing with very small datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. YOLO-Fusion and Internet of Things: Advancing object detection in smart transportation.
- Author
-
Tang, Jun, Ye, Caixian, Zhou, Xianlai, and Xu, Lijun
- Subjects
REAL-time computing ,OBJECT recognition (Computer vision) ,SMART cities ,URBAN transportation ,INTERNET of things ,INTELLIGENT transportation systems - Abstract
In intelligent transportation systems, traditional object detection algorithms struggle to handle complex environments and varying lighting conditions, particularly when detecting small targets and processing multimodal data. Furthermore, existing IoT frameworks are limited in their efficiency for real-time data collection and processing, leading to data transmission delays and increased resource consumption, which constrains the overall performance of intelligent transportation systems. To address these issues, this paper proposes a novel deep learning model, YOLO-Fusion. Based on the YOLOv8 architecture, this model innovatively integrates infrared and visible-light images, utilizing FusionAttention and Dynamic Fusion modules to optimize the fusion of multimodal information. To further enhance detection performance, this paper designs a Fusion-Dynamic Loss, improving the model's performance in complex intelligent transportation scenarios. To support the efficient operation of YOLO-Fusion, this paper also introduces an IoT framework that uses intelligent sensors and edge computing technology to achieve real-time collection, transmission and processing of traffic data, significantly improving data timeliness and accuracy. Experimental results demonstrate that YOLO-Fusion significantly outperforms traditional methods on the DroneVehicle and FLIR datasets, showcasing its broad application potential in intelligent traffic monitoring and management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Deep learning based identification and tracking of railway bogie parts.
- Author
-
Shaikh, Muhammad Zakir, Ahmed, Zeeshan, Baro, Enrique Nava, Hussain, Samreen, and Milanova, Mariofanna
- Subjects
OBJECT recognition (Computer vision) ,COMPUTER vision ,RAILROAD safety measures ,DEEP learning ,ROLLING stock ,BOGIES (Vehicles) - Abstract
The Train Rolling-Stock Examination (TRSE) is a safety examination process that physically examines the bogie parts of a moving train, typically at speeds over 30 km/h. Currently, this inspection process is done manually by railway personnel in many countries to ensure safety and prevent interruptions to rail services. Although many earlier attempts have been made to semi-automate this process through computer-vision models, these models are iterative and still require manual intervention. Consequently, these attempts were unsuitable for real-time implementation. In this work, we propose a detection model utilizing a deep-learning-based classifier that can precisely identify bogie parts in real time without manual intervention, increasing the deployability of these inspection systems. We implemented the Anchor-Free Yolov8 (AFYv8) model, which has a decoupled-head module for recognizing bogie parts. Additionally, we incorporated bogie-parts tracking with the AFYv8 model to gather information about any missing parts. To test the effectiveness of the AFYv8 model, bogie videos were captured at three different timestamps, and the results show a 10% increase in the recognition accuracy of TRSE compared to previously developed classifiers. This research has the potential to enhance railway safety and minimize operational interruptions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. A semantic visual SLAM towards object selection and tracking optimization.
- Author
-
Sun, Tian, Cheng, Lei, Hu, Yaqi, Yuan, Xiaoping, and Liu, Yong
- Subjects
OBJECT recognition (Computer vision) ,OPTIMIZATION algorithms ,IMAGE segmentation ,NAVIGATION - Abstract
Simultaneous localization and mapping (SLAM) technology has garnered considerable attention as a pivotal component for the autonomous navigation of intelligent mobile vehicles. Integrating target detection and target tracking technology into SLAM enhances scene perception, resulting in a more resilient SLAM system. Consequently, this article presents a pose optimization algorithm based on image segmentation, coupled with object detection technology, to achieve superior multi-frame association feature matching. Subsequently, this paper proposes a method for selecting the most stable targets to better conduct pose optimization. Finally, experimental validation was conducted on five sequences from the TUM dataset. We conducted tracking performance experiments to demonstrate the necessity of selecting stable targets for pose optimization. Afterwards, we carried out a comprehensive comparison with the current state-of-the-art SLAM implementations in terms of accuracy and robustness. The average absolute trajectory error of our method in the dynamic benchmark datasets is ∼ 94.14% lower than that of ORB-SLAM2, ∼ 61.90% lower than that of RS-SLAM, and ∼ 80.89% lower than that of DS-SLAM. At the end of the experiment, the process performance of the proposed method is demonstrated. The experiments collectively showcase the system's capability to deliver outstanding results. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
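A minimal NumPy sketch of the absolute trajectory error (ATE RMSE) metric behind the percentage comparisons in the entry above, assuming the estimated and ground-truth trajectories have already been aligned (e.g., by a Umeyama fit) and time-associated.

```python
# Sketch: absolute trajectory error (ATE RMSE) between an estimated and a
# ground-truth trajectory. Alignment/association is assumed already done.
import numpy as np

def ate_rmse(est_xyz, gt_xyz):
    """est_xyz, gt_xyz: (N, 3) arrays of associated positions."""
    err = np.linalg.norm(est_xyz - gt_xyz, axis=1)  # per-pose position error
    return np.sqrt((err ** 2).mean())
```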
17. Simulation and real-life implementation of UAV autonomous landing system based on object recognition and tracking for safe landing in uncertain environments.
- Author
-
Baidya, Ranjai and Jeong, Heon
- Subjects
OBJECT recognition (Computer vision) ,PID controllers ,DRONE aircraft ,EUCLIDEAN distance ,VISUAL training - Abstract
The use of autonomous Unmanned Aerial Vehicles (UAVs) has been increasing, and the autonomy of these systems and their capabilities in dealing with uncertainties is crucial. Autonomous landing is pivotal for the success of an autonomous mission of UAVs. This paper presents an autonomous landing system for quadrotor UAVs with the ability to perform smooth landing even in undesirable conditions like obstruction by obstacles in and around the designated landing area and inability to identify or the absence of a visual marker establishing the designated landing area. We have integrated algorithms like version 5 of You Only Look Once (YOLOv5), DeepSORT, Euclidean distance transform, and Proportional-Integral-Derivative (PID) controller to strengthen the robustness of the overall system. While the YOLOv5 model is trained to identify the visual marker of the landing area and some common obstacles like people, cars, and trees, the DeepSORT algorithm keeps track of the identified objects. Similarly, using the detection of the identified objects and Euclidean distance transform, an open space without any obstacles to land could be identified if necessary. Finally, the PID controller generates appropriate movement values for the UAV using the visual cues of the target landing area and the obstacles. To warrant the validity of the overall system without risking the safety of the involved people, initial tests are performed, and a software-based simulation is performed before executing the tests in real life. A full-blown hardware system with an autonomous landing system is then built and tested in real life. The designed system is tested in various scenarios to verify the effectiveness of the system. The code is available at this repository: https://github.com/rnjbdya/Vision-based-UAV-autonomous-landing. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
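A minimal sketch of the open-space selection step described in the entry above: a Euclidean distance transform over a binary obstacle mask (built from the YOLOv5/DeepSORT detections) picks the pixel farthest from all obstacles. Uses SciPy; the mask source is assumed, not the authors' exact pipeline.

```python
# Sketch: pick the most open landing spot from a binary obstacle mask using a
# Euclidean distance transform, as done when the landing marker is blocked.
import numpy as np
from scipy.ndimage import distance_transform_edt

def safest_landing_pixel(obstacle_mask):
    """obstacle_mask: 2D bool array, True where an obstacle was detected."""
    dist = distance_transform_edt(~obstacle_mask)  # distance to nearest obstacle
    y, x = np.unravel_index(np.argmax(dist), dist.shape)
    return (x, y), dist[y, x]  # farthest pixel from all obstacles, and margin
```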
18. Deep learning-assisted object recognition with hybrid triboelectric-capacitive tactile sensor.
- Author
-
Xie, Yating, Cheng, Hongyu, Yuan, Chaocheng, Zheng, Limin, Peng, Zhengchun, and Meng, Bo
- Subjects
TACTILE sensors ,RECOGNITION (Psychology) ,MATERIALS texture ,SURFACE texture ,DEEP learning ,OBJECT recognition (Computer vision) - Abstract
Tactile sensors play a critical role in robotic intelligence and human-machine interaction. In this manuscript, we propose a hybrid tactile sensor by integrating a triboelectric sensing unit and a capacitive sensing unit based on porous PDMS. The triboelectric sensing unit is sensitive to the surface material and texture of the grasped objects, while the capacitive sensing unit responds to the object's hardness. By combining signals from the two sensing units, tactile object recognition can be achieved among not only different objects but also the same object in different states. In addition, both the triboelectric layer and the capacitor dielectric layer were fabricated through the same manufacturing process. Furthermore, deep learning was employed to assist the tactile sensor in accurate object recognition. As a demonstration, the identification of 12 samples was implemented using this hybrid tactile sensor, and a recognition accuracy of 98.46% was achieved. Overall, the proposed hybrid tactile sensor has shown great potential in robotic perception and tactile intelligence. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
19. Deep learning networks-based tomato disease and pest detection: a first review of research studies using real field datasets.
- Author
-
Jelali, Mohieddine
- Subjects
ARTIFICIAL neural networks ,TOMATO diseases & pests ,CONVOLUTIONAL neural networks ,OBJECT recognition (Computer vision) ,AGRICULTURE ,DEEP learning - Abstract
Recent advances in deep neural networks in terms of convolutional neural networks (CNNs) have enabled researchers to significantly improve the accuracy and speed of object recognition systems and their application to plant disease and pest detection and diagnosis. This paper presents the first comprehensive review and analysis of deep learning approaches for disease and pest detection in tomato plants, using self-collected field-based and benchmarking datasets extracted from real agricultural scenarios. The review shows that only a few studies available in the literature used data from real agricultural fields such as the PlantDoc dataset. The paper also reveals overoptimistic results of the huge number of studies in the literature that used the PlantVillage dataset collected under (controlled) laboratory conditions. This finding is consistent with the characteristics of the dataset, which consists of leaf images with a uniform background. The uniformity of the background images facilitates object detection and classification, resulting in higher performance-metric values for the models. However, such models are not very useful in agricultural practice, and it remains desirable to establish large datasets of plant diseases under real conditions. With some of the self-generated datasets from real agricultural fields reviewed in this paper, high performance values above 90% can be achieved by applying different (improved) CNN architectures such as Faster R-CNN and YOLO. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. Design of apple recognition model based on improved deep learning object detection framework Faster-RCNN.
- Author
-
Zhao, Qinghua and Liu, Yaqiu
- Subjects
- *
CONVOLUTIONAL neural networks , *OBJECT recognition (Computer vision) , *FORESTS & forestry , *HIGH technology industries , *DEEP learning ,ECONOMIC conditions in China - Abstract
In the current stage of digital economy development in China, intelligent recognition technology is used in agriculture, forestry, and planting industries. This paper improves and optimizes apple recognition based on Faster-RCNN, a deep learning object detection framework, and analyzes the advantages and disadvantages of the improved detection and recognition model. This study compares the conventional Faster-RCNN apple recognition model with the enhanced Faster-RCNN model designed on the deep learning object detection framework. Analysis of the comprehensive performance, sensitivity, and coupling of the two groups of recognition models under different design schemes shows that the improved Faster-RCNN apple recognition model achieves higher accuracy and more precise localization for apple recognition. The results show that the improved Faster-RCNN apple recognition model can more effectively improve apple recognition model design at the present stage, reduce labor costs, and significantly improve work efficiency. It also promotes the development of the domestic digital economy and provides research support and value for future intelligence and deep learning development paths. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. WCAY object detection of fractures for X-ray images of multiple sites.
- Author
-
Chen, Peng, Liu, Songyan, Lu, Wenbin, Lu, Fangpeng, and Ding, Boyang
- Subjects
- *
OBJECT recognition (Computer vision) , *X-ray imaging , *DEEP learning , *X-ray detection , *MULTIPLE comparisons (Statistics) - Abstract
The WCAY (weighted channel attention YOLO) model, which is meticulously crafted to identify fracture features across diverse X-ray image sites, is presented herein. This model integrates novel core operators and an innovative attention mechanism to enhance its efficacy. Initially, leveraging the benefits of dynamic snake convolution (DSConv), which is adept at capturing elongated tubular structural features, we introduce the DSC-C2f module to augment the model's fracture detection performance by replacing a portion of C2f. Subsequently, we integrate the newly proposed weighted channel attention (WCA) mechanism into the architecture to bolster feature fusion and improve fracture detection across various sites. Comparative experiments were conducted to evaluate the performances of several attention mechanisms. These enhancement strategies were validated through experimentation on public X-ray image datasets (FracAtlas and GRAZPEDWRI-DX). Multiple experimental comparisons substantiated the model's efficacy, demonstrating its superior accuracy and real-time detection capabilities. According to the experimental findings, on the FracAtlas dataset, our WCAY model exhibits a notable 8.8% improvement in mean average precision (mAP) over the original model. On the GRAZPEDWRI-DX dataset, the mAP reaches 64.4%, with a detection accuracy of 93.9% for the "fracture" category alone. Compared with other state-of-the-art object detection models, the proposed model represents a substantial improvement over the original algorithm. The code is publicly available at https://github.com/cccp421/Fracture-Detection-WCAY. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
22. Documenting customary land boundaries using unmanned aerial vehicle imagery and artificial intelligence.
- Author
-
Abeho, Dianah Rose, Shoko, Moreblessings, and Odera, Patroba Achola
- Subjects
- *
CONVOLUTIONAL neural networks , *MACHINE learning , *OBJECT recognition (Computer vision) , *CADASTRAL maps , *COMPUTER vision , *DEEP learning - Abstract
The use of computer vision and deep learning in boundary documentation for land registration stems from the ongoing demand for appropriate mapping approaches for unregistered land rights to eradicate the global challenge of tenure insecurity. Previous research has yielded promising results towards automated extraction of photo-visible cadastral boundaries from high-resolution imagery. Nonetheless, the extraction of invisible cadastral boundaries is still a challenge. This study investigates the role of sensors on board unmanned aerial vehicles and deep learning algorithms in detecting cadastral boundaries. It develops a participatory boundary marking procedure using low-cost markers to place physical monuments on previously invisible and ill-defined cadastral boundaries. After that, the researchers trained and tested the accuracy of a convolutional neural network, namely a single-shot multi-box detector (SSD) with Residual Neural Network (ResNet) and Visual Geometry Group (VGG) backbone networks, to automatically detect cadastral boundary markers from unmanned aerial vehicle imagery. SSD based on ResNet34 performed best with 0.88 precision, 0.92 recall and 0.91 F-measure (F1) score. VGG19-based SSD yielded a precision of 0.47, recall of 0.53 and F1 score of 0.50. The horizontal accuracy of the cadastral map generated varied from 0.089 to 0.496 m per parcel, with a standard deviation of 0.120 m. Results show that this approach is practical for cadastral mapping in rural areas. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. An in-air synthetic aperture sonar dataset of target scattering in environments of varying complexity.
- Author
-
Blanford, Thomas E., Williams, David P., Park, J. Daniel, Reinhardt, Brian T., Dalton, Kyle S., Johnson, Shawn F., and Brown, Daniel C.
- Subjects
MACHINE learning ,SYNTHETIC apertures ,OBJECT recognition (Computer vision) ,SONAR ,TIME series analysis - Abstract
This paper describes a synthetic aperture sonar (SAS) dataset collected in-air consisting of four types of targets in four environments of different complexity. The in-air laboratory based experiments produced data with a level of fidelity and ground truth accuracy that is not easily attainable in data collected underwater. The range of complexity, high level of data fidelity, and accurate ground truth provides a rich dataset with acoustic features on multiple scales. It can be used to develop new signal-processing and image reconstruction algorithms, as well as machine learning models for object detection and classification. It may also find application in model verification and validation for acoustic simulators. The dataset consists of raw acoustic time series returns, associated environmental conditions, hardware configuration, array motion, as well as the reconstructed imagery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
24. Evaluasi Trade-off Akurasi dan Kecepatan YOLOv5 dalam Deteksi Kebakaran pada Edge Devices [Evaluation of the Accuracy and Speed Trade-off of YOLOv5 for Fire Detection on Edge Devices].
- Author
-
Setiawan, Rahmad Arif and Setyanto, Arief
- Subjects
- *
OBJECT recognition (Computer vision) , *FIRE detectors , *ALGORITHMS , *QUANTIZATION (Physics) , *COMPUTER vision - Abstract
Real-time object detection using the YOLO (You Only Look Once) algorithm has shown promising performance in various computer vision applications. However, its application on devices with limited resources is still a challenge due to its high computational requirements. This study aims to optimize the YOLOv5 model for fire and smoke detection on Orange Pi Zero 3 devices using quantization techniques. Using a dataset of 2247 fire and smoke images, this study applies static quantization techniques to improve model efficiency. The methodology includes training of standard YOLOv5 models, conversion to ONNX format, and application of static quantization. Results show a significant improvement in computational efficiency, with a 42.2% reduction in model size and a 65.21% increase in inference speed. Despite a decrease in the mAP value by 25.6%, the optimized model was still able to perform object detection at a significantly higher speed. In conclusion, the quantization technique is effective in optimizing the YOLOv5 model for deployment on edge computing devices, despite the trade-off between speed and accuracy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
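A minimal sketch of the static-quantization pipeline the study above describes, using onnxruntime's post-training quantization on an exported YOLOv5 ONNX model; the file paths, input tensor name, and calibration image set are placeholders.

```python
# Sketch: post-training static (INT8) quantization of an exported YOLOv5
# ONNX model with onnxruntime. Paths, input name, and calibration images
# are placeholders, not the study's artifacts.
import glob
import cv2
import numpy as np
from onnxruntime.quantization import (CalibrationDataReader, QuantType,
                                      quantize_static)

class YoloCalibReader(CalibrationDataReader):
    """Feeds preprocessed calibration images to the quantizer."""
    def __init__(self, image_dir, input_name="images", size=640):
        self.samples = iter(
            {input_name: cv2.resize(cv2.imread(p), (size, size))
                 .transpose(2, 0, 1)[None].astype(np.float32) / 255.0}
            for p in glob.glob(f"{image_dir}/*.jpg")[:100])  # small calib set

    def get_next(self):
        return next(self.samples, None)

quantize_static("yolov5s.onnx", "yolov5s_int8.onnx",
                calibration_data_reader=YoloCalibReader("calib_images"),
                weight_type=QuantType.QInt8)
```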
25. Lightweight ViT with Multiscale Feature Fusion for Driving Risk Rating Warning System.
- Author
-
Tang, Hao, Xu, Xixi, Xu, Haiyang, Liu, Shuang, Ji, Jie, Qiu, Chengqun, and Shen, Yujie
- Subjects
- *
OBJECT recognition (Computer vision) , *COMPUTER vision , *AUTONOMOUS vehicles , *ALGORITHMS , *NECK - Abstract
Addressing the issue of inadequate dynamic object detection accuracy in current road driving warning systems, this study proposes the RepBF‐YOLOv8 detection algorithm aimed at efficient risk identification. The backbone network of YOLOv8n is replaced with the lightweight RepViT architecture, which is more suitable for visual tasks. This replacement simplifies the traditional structure, reduces the complexity of the backbone network, maximizes performance enhancement, and minimizes latency. Additionally, the FPN in the neck section is upgraded to Bi‐FPN, which reduces nodes and span connections and incorporates rapid normalization to achieve fast multi‐scale feature fusion. For risk grading, the algorithm infers distances and collision times, categorizing detected objects into high, medium, and low‐risk levels, and uses different colors to warn the driver. Comparative experimental results show that the optimized algorithm improves Precision by 1.7%, Recall by 2.3%, mAP@0.5 by 1.53%, and mAP@0.5:0.95 by 2.91%. In road tests, the risk warning system achieves a frame detection rate ranging from a minimum of 38.4 fps to a maximum of 59.0 fps. The detection confidence for various objects remains above 0.71, reaching as high as 0.98. Specifically, the "Car" confidence ranges from 0.81 to 0.98, demonstrating the accuracy and robustness of vehicle risk detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
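A minimal sketch of the risk-grading step described in the entry above, bucketing detected objects by estimated time-to-collision; the thresholds and color mapping are illustrative, not the paper's calibrated values.

```python
# Sketch: grade a detected object by time-to-collision (TTC) computed from
# inferred distance and closing speed. Thresholds/colors are illustrative.
def risk_level(distance_m: float, closing_speed_mps: float) -> str:
    if closing_speed_mps <= 0:          # object holding distance or receding
        return "low"
    ttc = distance_m / closing_speed_mps
    if ttc < 1.5:
        return "high"    # e.g., drawn in red
    if ttc < 3.0:
        return "medium"  # e.g., drawn in yellow
    return "low"         # e.g., drawn in green
```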
26. Positioning of mango picking point using an improved YOLOv8 architecture with object detection and instance segmentation.
- Author
-
Li, Hongwei, Huang, Jianzhi, Gu, Zenan, He, Deqiang, Huang, Junduan, and Wang, Chenglin
- Subjects
- *
OBJECT recognition (Computer vision) , *IMAGE segmentation , *FRUIT harvesting , *FRUIT , *SKELETON - Abstract
Positioning of mango picking points is a crucial technology for the realisation of automated robotic mango harvesting. Herein, this study reports a visualised end-to-end system for mango picking point positioning using an improved YOLOv8 architecture with object detection and instance segmentation, as well as a picking point positioning algorithm. First, the improved YOLOv8n model, incorporating the BiFPN structure and the SPD-Conv module, was utilised to enhance the detection performance of mango fruits and stems. This model achieved a detection precision of 98.9% in fruits and 97.1% in stems, with recalls of 99.5% and 94.6%, respectively. Then, the YOLOv8n-seg model was used to segment the stem ROI (region of interest), achieving 81.85% MIoU and 88.69% mPA. Finally, a skeleton line of the stem region was obtained from the segmentation image, and a picking point positioning algorithm was developed to determine the coordinates of the optimal picking point. Subsequently, the positioning success rate of coordinates, absolute errors, and relative errors were calculated by comparing the automatically positioned coordinates with the manually positioned stem region. Experimental results indicated that this study achieved an average positioning success rate of 92.01%, with an average absolute error of 4.93 pixels and an average relative error of 13.11%. Additionally, the average processing time for processing 640 images using the picking point positioning system is 72.75 ms. This study demonstrates the reliability and effectiveness of positioning mango picking points, laying the technological basis for the automated harvesting of mango fruits. • Simultaneous detection of mango fruits and fruiting stems. • A picking point positioning algorithm is proposed based on instance segmentation. • Development of an end-to-end mango picking point positioning system. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
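A minimal sketch of the skeleton-line step described in the entry above, using scikit-image to skeletonize the segmented stem mask; the authors' full positioning algorithm is more involved, and the mid-skeleton choice here is only illustrative.

```python
# Sketch: skeletonize the segmented stem mask and take a point on the
# skeleton line as a picking-point candidate. Only the skeleton step of the
# paper's pipeline is shown; the mid-skeleton pick is a crude stand-in.
import numpy as np
from skimage.morphology import skeletonize

def picking_point_from_stem_mask(stem_mask):
    """stem_mask: 2D bool array, e.g. from a YOLOv8n-seg stem ROI."""
    skeleton = skeletonize(stem_mask)
    ys, xs = np.nonzero(skeleton)
    if len(xs) == 0:
        return None
    order = np.argsort(ys)         # order pixels roughly along the stem
    mid = len(xs) // 2             # take the middle skeleton pixel
    return int(xs[order[mid]]), int(ys[order[mid]])
```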
27. Deep-Learning-Based Automated Building Construction Progress Monitoring for Prefabricated Prefinished Volumetric Construction.
- Author
-
Chua, Wei Png and Cheah, Chien Chern
- Subjects
- *
OBJECT recognition (Computer vision) , *BUILDING sites , *BUILDING design & construction , *CONSTRUCTION projects , *COMPUTER vision - Abstract
Prefabricated prefinished volumetric construction (PPVC) is a relatively new technique that has recently gained popularity for its ability to improve flexibility in scheduling and resource management. Given the modular nature of PPVC assembly and the large amounts of visual data amassed throughout a construction project today, PPVC building construction progress monitoring can be conducted by quantifying assembled PPVC modules within images or videos. As manually processing high volumes of visual data can be extremely time consuming and tedious, building construction progress monitoring can be automated to be more efficient and reliable. However, the complex nature of construction sites and the presence of nearby infrastructure could occlude or distort visual data. Furthermore, imaging constraints can also result in incomplete visual data. Therefore, it is hard to apply existing purely data-driven object detectors to automate building progress monitoring at construction sites. In this paper, we propose a novel 2D window-based automated visual building construction progress monitoring (WAVBCPM) system to overcome these issues by mimicking human decision making during manual progress monitoring with a primary focus on PPVC building construction. WAVBCPM is segregated into three modules. A detection module first conducts detection of windows on the target building. This is achieved by detecting windows within the input image at two scales by using YOLOv5 as a backbone network for object detection before using a window detection filtering process to omit irrelevant detections from the surrounding areas. Next, a rectification module is developed to account for missing windows in the mid-section and near-ground regions of the constructed building that may be caused by occlusion and poor detection. Lastly, a progress estimation module checks the processed detections for missing or excess information before performing building construction progress estimation. The proposed method is tested on images from actual construction sites, and the experimental results demonstrate that WAVBCPM effectively addresses real-world challenges. By mimicking human inference, it overcomes imperfections in visual data, achieving higher accuracy in progress monitoring compared to purely data-driven object detectors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Efficient Small Object Detection You Only Look Once: A Small Object Detection Algorithm for Aerial Images.
- Author
-
Luo, Jie, Liu, Zhicheng, Wang, Yibo, Tang, Ao, Zuo, Huahong, and Han, Ping
- Subjects
- *
OBJECT recognition (Computer vision) , *DRONE aircraft , *DATA mining , *INFORMATION networks , *ALGORITHMS - Abstract
Aerial images have distinct characteristics, such as varying target scales, complex backgrounds, severe occlusion, small targets, and dense distribution. As a result, object detection in aerial images faces challenges such as difficulty in extracting small-target information and poor integration of spatial and semantic data. Moreover, existing object detection algorithms have a large number of parameters, posing a challenge for deployment on drones with limited hardware resources. We propose an efficient small-object YOLO detection model (ESOD-YOLO) based on YOLOv8n for Unmanned Aerial Vehicle (UAV) object detection. Firstly, a Reparameterized Multi-scale Inverted Blocks (RepNIBMS) module is implemented to replace the C2f module of the YOLOv8n backbone network to enhance the information extraction capability for small objects. Secondly, a cross-level multi-scale feature fusion structure, the wave feature pyramid network (WFPN), is designed to enhance the model's capacity to integrate spatial and semantic information. Meanwhile, a small-object detection head is incorporated to augment the model's ability to identify small objects. Finally, a tri-focal loss function is proposed to address the issue of imbalanced samples in aerial images in a straightforward and effective manner. On the VisDrone2019 test set, with a uniform input size of 640 × 640 pixels, ESOD-YOLO has 4.46 M parameters, and its mean average precision reaches 29.3%, which is 3.6% higher than the baseline method YOLOv8n. Compared with other detection methods, it also achieves higher detection accuracy with fewer parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
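The tri-focal loss in the entry above is defined only in the paper; for context, here is a minimal PyTorch sketch of the standard binary focal loss (Lin et al., 2017) that addresses the same sample-imbalance problem.

```python
# Standard binary focal loss (Lin et al., 2017): down-weights easy examples
# so rare/hard ones dominate training. The paper's tri-focal variant is not
# reproduced here.
import torch
import torch.nn.functional as F

def focal_loss(logits, targets, alpha=0.25, gamma=2.0):
    p = torch.sigmoid(logits)
    ce = F.binary_cross_entropy_with_logits(logits, targets, reduction="none")
    p_t = p * targets + (1 - p) * (1 - targets)        # prob of the true class
    alpha_t = alpha * targets + (1 - alpha) * (1 - targets)
    return (alpha_t * (1 - p_t) ** gamma * ce).mean()
```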
29. DeployFusion: A Deployable Monocular 3D Object Detection with Multi-Sensor Information Fusion in BEV for Edge Devices.
- Author
-
Huang, Fei, Liu, Shengshu, Zhang, Guangqian, Hao, Bingsen, Xiang, Yangkai, and Yuan, Kun
- Subjects
- *
OBJECT recognition (Computer vision) , *MULTISENSOR data fusion , *FEATURE extraction , *TRANSFORMER models , *POINT cloud - Abstract
To address the challenges of suboptimal detection of distant objects and significant computational burden in existing multi-sensor information fusion 3D object detection methods, a novel approach based on Bird's-Eye View (BEV) is proposed. This method utilizes an enhanced lightweight EdgeNeXt feature extraction network, incorporating residual branches to address network degradation caused by the excessive depth of STDA encoding blocks. Meanwhile, deformable convolution is used to expand the receptive field and reduce computational complexity. The feature fusion module constructs a two-stage fusion network to optimize the fusion and alignment of multi-sensor features. This network aligns image features to supplement environmental information with point cloud features, thereby obtaining the final BEV features. Additionally, a Transformer decoder that emphasizes global spatial cues is employed to process the BEV feature sequence, enabling precise detection of distant small objects. Experimental results demonstrate that this method surpasses the baseline network, with improvements of 4.5% in the nuScenes detection score and 5.5% in average precision for detected objects. Finally, the model is converted and accelerated using TensorRT tools for deployment on mobile devices, achieving an inference time of 138 ms per frame on the Jetson Orin NX embedded platform, thus enabling real-time 3D object detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. PE-MCAT: Leveraging Image Sensor Fusion and Adaptive Thresholds for Semi-Supervised 3D Object Detection.
- Author
-
Li, Bohao, Song, Shaojing, and Ai, Luxia
- Subjects
- *
OBJECT recognition (Computer vision) , *IMAGE fusion , *ARTIFICIAL intelligence , *IMAGE sensors , *POINT cloud - Abstract
Existing 3D object detection frameworks in sensor-based applications heavily rely on large-scale annotated data to achieve optimal performance. However, obtaining such annotations from sensor data—like LiDAR or image sensors—is both time-consuming and costly. Semi-supervised learning offers an efficient solution to this challenge and holds significant potential for sensor-driven artificial intelligence (AI) applications. While it reduces the need for labeled data, semi-supervised learning still depends on a small number of labeled samples for training. In the initial stages, relying on such limited samples can adversely affect the effective training of student–teacher networks. In this paper, we propose PE-MCAT, a semi-supervised 3D object detection method that generates high-precision pseudo-labels. First, to address the challenges of insufficient local feature capture and poor robustness in point cloud data, we introduce a point enrichment module. This module incorporates information from image sensors and combines multiple feature fusion methods of local and self-features to directly enhance the quality of point clouds and pseudo-labels, compensating for the limitations posed by using only a few labeled samples. Second, we explore the relationship between the teacher network and the pseudo-labels it generates. We propose a multi-class adaptive threshold strategy to initially filter and create a high-quality pseudo-label set. Furthermore, a joint variable threshold strategy is introduced to refine this set further, enhancing the selection of superior pseudo-labels. Extensive experiments demonstrate that PE-MCAT consistently outperforms recent state-of-the-art methods across different datasets. Specifically, on the KITTI dataset and using only 2% of labeled samples, our method improved the mean Average Precision (mAP) by 0.7% for cars, 3.7% for pedestrians, and 3.0% for cyclists. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
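As a rough illustration of the multi-class adaptive threshold strategy described in the record above, the sketch below derives one confidence threshold per class from the teacher's score distribution and filters pseudo-labels with it; the percentile rule, floor value, and function names are assumptions, not the paper's exact formulation.

```python
import numpy as np

def adaptive_thresholds(scores, labels, num_classes, floor=0.5, pct=80):
    """One threshold per class: the pct-th percentile of that class's
    teacher confidences, never below a global floor."""
    thr = np.full(num_classes, floor)
    for c in range(num_classes):
        c_scores = scores[labels == c]
        if len(c_scores):
            thr[c] = max(floor, np.percentile(c_scores, pct))
    return thr

def filter_pseudo_labels(scores, labels, thr):
    keep = scores >= thr[labels]  # compare each prediction to its class threshold
    return scores[keep], labels[keep]

# Toy teacher predictions: confidences and predicted classes (0=car, 1=ped, 2=cyc).
scores = np.array([0.95, 0.55, 0.88, 0.42, 0.71, 0.93, 0.60])
labels = np.array([0, 0, 1, 1, 2, 2, 2])
thr = adaptive_thresholds(scores, labels, num_classes=3)
print(filter_pseudo_labels(scores, labels, thr))  # keeps only high-confidence picks
```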
31. Research on Deep Learning Detection Model for Pedestrian Objects in Complex Scenes Based on Improved YOLOv7.
- Author
-
Hu, Jun, Zhou, Yongqi, Wang, Hao, Qiao, Peng, and Wan, Wenwei
- Subjects
- *
OBJECT recognition (Computer vision) , *AUTONOMOUS robots , *PEDESTRIANS , *AUTONOMOUS vehicles , *DETECTORS , *FEATURE extraction - Abstract
Objective: Pedestrian detection is essential for the environmental perception and safe operation of intelligent robots and autonomous or assisted driving systems. Methods: In response to the characteristics of pedestrian objects, which occupy a small image area, exhibit diverse poses, appear in complex scenes, and suffer severe occlusion, this paper proposes an improved pedestrian object detection method based on the YOLOv7 model, which adopts the Convolutional Block Attention Module (CBAM) attention mechanism and Deformable ConvNets v2 (DCNv2) in the two Efficient Layer Aggregation Network (ELAN) modules of the backbone feature extraction network. In addition, the detection head is replaced with a Dynamic Head (DyHead) detector head with an attention mechanism; unnecessary background information around the pedestrian object is thereby effectively excluded, making the model learn more concentrated feature representations. Results: Compared with the original model, the log-average miss rate of the improved YOLOv7 model is significantly reduced on both the Citypersons dataset and the INRIA dataset. Conclusions: The improved YOLOv7 model proposed in this paper achieves good performance improvements across different pedestrian detection problems. This research provides an important reference for pedestrian detection in complex scenes with small, occluded, and overlapping objects. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
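The CBAM mechanism adopted in the record above has a compact standard form (channel attention followed by spatial attention, after Woo et al.). The sketch below shows that generic module, not its specific placement inside the ELAN blocks, which this sketch does not attempt to reproduce.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        # Shared MLP for channel attention (1x1 convs act on pooled descriptors).
        self.mlp = nn.Sequential(
            nn.Conv2d(channels, channels // reduction, 1),
            nn.ReLU(inplace=True),
            nn.Conv2d(channels // reduction, channels, 1),
        )
        self.spatial = nn.Conv2d(2, 1, kernel_size=7, padding=3)

    def forward(self, x):
        # Channel attention: pool over space, excite channels.
        avg = self.mlp(torch.mean(x, dim=(2, 3), keepdim=True))
        mx = self.mlp(torch.amax(x, dim=(2, 3), keepdim=True))
        x = x * torch.sigmoid(avg + mx)
        # Spatial attention: pool over channels, weight locations.
        s = torch.cat([x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)
        return x * torch.sigmoid(self.spatial(s))

print(CBAM(64)(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```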
32. An Energy-Efficient Dynamic Feedback Image Signal Processor for Three-Dimensional Time-of-Flight Sensors.
- Author
-
Kim, Yongsoo, So, Jaehyeon, Hwang, Chanwook, Cheng, Wencan, and Ko, Jong Hwan
- Subjects
- *
OBJECT recognition (Computer vision) , *ARTIFICIAL intelligence , *ENERGY consumption , *AUGMENTED reality , *THREE-dimensional imaging - Abstract
With the recent prominence of artificial intelligence (AI) technology, various research outcomes and applications in the field of image recognition and processing utilizing AI have been continuously emerging. In particular, the domain of object recognition using 3D time-of-flight (ToF) sensors has been actively researched, often in conjunction with augmented reality (AR) and virtual reality (VR). However, more precise analysis requires high-quality images, necessitating significantly larger parameter counts and more computation. These requirements can pose challenges, especially in developing AR and VR technologies for low-power portable devices. Therefore, we propose a dynamic feedback configuration image signal processor (ISP) for 3D ToF sensors. The ISP achieves both accuracy and energy efficiency through dynamic feedback. The proposed ISP employs dynamic area extraction to perform computations and post-processing only for pixels within the valid area used by the application in each frame. Additionally, it uses dynamic resolution to determine and apply the appropriate resolution for each frame. This approach enhances energy efficiency by avoiding the processing of all sensor data while maintaining or surpassing accuracy levels. Furthermore, these functionalities are designed for hardware-efficient implementation, improving processing speed and minimizing power consumption. The results show a maximum performance of 178 fps and a high energy efficiency of up to 123.15 fps/W. When connected to the hand pose estimation (HPE) accelerator, the system demonstrates an average mean squared error (MSE) of 10.03 mm, surpassing the baseline ISP value of 20.25 mm. Therefore, the proposed ISP can be effectively utilized in low-power, small form-factor devices. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
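The two dynamic-feedback ideas above, processing only the valid area and adapting resolution per frame, can be illustrated on a toy depth frame; the depth range, pixel budget, and striding rule here are assumptions for demonstration only, not the ISP's hardware logic.

```python
import numpy as np

def valid_area(depth, near=0.2, far=1.2):
    """Bounding box of pixels whose depth (in meters) falls in the app's range."""
    ys, xs = np.nonzero((depth > near) & (depth < far))
    if len(ys) == 0:
        return None
    return ys.min(), ys.max() + 1, xs.min(), xs.max() + 1

def dynamic_resolution(roi, budget_px=64 * 64):
    """Downscale the ROI by striding until it fits a per-frame pixel budget."""
    step = 1
    while (roi.shape[0] // step) * (roi.shape[1] // step) > budget_px:
        step += 1
    return roi[::step, ::step]

depth = np.full((240, 320), 5.0)  # background far away
depth[60:180, 80:260] = 0.6       # a hand-sized object within range
y0, y1, x0, x1 = valid_area(depth)
roi = dynamic_resolution(depth[y0:y1, x0:x1])
print(roi.shape)                  # (40, 60): far fewer pixels to process per frame
```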
33. Facial Expression Recognition-You Only Look Once-Neighborhood Coordinate Attention Mamba: Facial Expression Detection and Classification Based on Neighbor and Coordinates Attention Mechanism.
- Author
-
Peng, Cheng, Sun, Mingqi, Zou, Kun, Zhang, Bowen, Dai, Genan, and Tsoi, Ah Chung
- Subjects
- *
OBJECT recognition (Computer vision) , *RECOGNITION (Psychology) , *FACIAL expression , *IMAGE processing , *NEIGHBORHOODS - Abstract
In studying the joint object detection and classification problem for facial expression recognition (FER) within the YOLOX framework, we introduce a novel feature extractor, called neighborhood coordinate attention Mamba (NCAMamba), to substitute for the original feature extractor in the Feature Pyramid Network (FPN). NCAMamba combines the background information reduction capabilities of Mamba, the local neighborhood relationship understanding of neighborhood attention, and the directional relationship understanding of coordinate attention. The resulting FER-YOLO-NCAMamba model, when applied to two unaligned FER benchmark datasets, RAF-DB and SFEW, obtains significantly improved mean average precision (mAP) scores compared with those obtained by other state-of-the-art methods. Moreover, ablation studies find that the NCA module is relatively more important than the Visual State Space (VSS), a version of Mamba for image processing, and visualization studies using the Grad-CAM method reveal that the region around the nose tip is critical to recognizing the expression: if the attended region is too large, it may lead to erroneous predictions, while a small, focused region leads to correct recognition. This may explain why FER on unaligned faces is such a challenging problem. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. BGF-YOLOv10: Small Object Detection Algorithm from Unmanned Aerial Vehicle Perspective Based on Improved YOLOv10.
- Author
-
Mei, Junhui and Zhu, Wenqiu
- Subjects
- *
OBJECT recognition (Computer vision) , *DRONE aircraft , *DEEP learning , *SPATIAL resolution , *ALGORITHMS - Abstract
With the rapid development of deep learning, unmanned aerial vehicles (UAVs) have acquired intelligent perception capabilities, demonstrating efficient data collection across various fields. In UAV perspective scenarios, captured images are typically high-resolution and often contain small, unevenly distributed objects, which makes object detection in UAV imagery more challenging than conventional detection tasks. To address this issue, we propose a lightweight object detection algorithm, BGF-YOLOv10, specifically designed for small object detection and based on an improved version of YOLOv10n. First, we introduce a novel YOLOv10 architecture tailored to small objects, incorporating BoTNet and variants of C2f and C3 in the backbone, along with an additional small object detection head, to enhance detection performance. Second, we embed GhostConv into both the backbone and head, effectively reducing the number of parameters by nearly half. Finally, we insert a Patch Expanding Layer module in the neck to restore the feature spatial resolution. Experimental results on the VisDrone-DET2019 and UAVDT datasets demonstrate that our method significantly improves detection accuracy compared to YOLO series networks. Moreover, compared to other state-of-the-art networks, our approach achieves a substantial reduction in the number of parameters. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
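GhostConv, which the record above embeds to roughly halve the parameter count, generates half of the output channels with a regular convolution and the rest with a cheap depthwise operation. A generic sketch of that module follows; the kernel sizes and activation are assumptions, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class GhostConv(nn.Module):
    def __init__(self, c_in: int, c_out: int, k: int = 1):
        super().__init__()
        c_hidden = c_out // 2
        # Primary path: a dense convolution produces half the output channels.
        self.primary = nn.Sequential(
            nn.Conv2d(c_in, c_hidden, k, padding=k // 2, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU(),
        )
        # Cheap operation: a depthwise 5x5 conv derives "ghost" features.
        self.cheap = nn.Sequential(
            nn.Conv2d(c_hidden, c_hidden, 5, padding=2, groups=c_hidden, bias=False),
            nn.BatchNorm2d(c_hidden), nn.SiLU(),
        )

    def forward(self, x):
        y = self.primary(x)
        return torch.cat([y, self.cheap(y)], dim=1)

m = GhostConv(64, 128)
print(sum(p.numel() for p in m.parameters()))  # far fewer than a dense 64->128 conv
print(m(torch.randn(1, 64, 80, 80)).shape)     # torch.Size([1, 128, 80, 80])
```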
35. Dexterous Manipulation Based on Object Recognition and Accurate Pose Estimation Using RGB-D Data.
- Author
-
Manawadu, Udaka A. and Keitaro, Naruse
- Subjects
- *
OBJECT recognition (Computer vision) , *OBJECT manipulation , *RELIABILITY in engineering , *POINT cloud , *STATISTICAL sampling - Abstract
This study presents an integrated system for object recognition, six-degrees-of-freedom pose estimation, and dexterous manipulation using a JACO robotic arm with an Intel RealSense D435 camera. This system is designed to automate the manipulation of industrial valves by capturing point clouds (PCs) from multiple perspectives to improve the accuracy of pose estimation. The object recognition module includes scene segmentation, geometric primitives recognition, model recognition, and a color-based clustering and integration approach enhanced by a dynamic cluster merging algorithm. Pose estimation is achieved using the random sample consensus (RANSAC) algorithm, which predicts position and orientation. The system was tested within a 60° field of view, which extended in all directions in front of the object. The experimental results show that the system performs reliably within acceptable error thresholds for both position and orientation when the objects are within a ±15° range of the camera's direct view. However, errors increased with more extreme object orientations and distances, particularly when estimating the orientation of ball valves. A zone-based dexterous manipulation strategy was developed to overcome these challenges, in which the system adjusts the camera position for optimal conditions. This approach mitigates larger errors in difficult scenarios, enhancing overall system reliability. The key contributions of this research include a novel method for improving object recognition and pose estimation, a technique for increasing the accuracy of pose estimation, and the development of a robot motion model for dexterous manipulation in industrial settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
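As a self-contained illustration of the RANSAC step used for geometric reasoning in pipelines like the one above, the sketch below fits the dominant plane in a toy point cloud; the iteration count and inlier tolerance are illustrative, and this is not the authors' implementation.

```python
import numpy as np

def ransac_plane(points, iters=200, tol=0.01, rng=np.random.default_rng(0)):
    """Fit a plane to a (N, 3) point cloud by repeated minimal sampling."""
    best_inliers, best_model = np.array([], dtype=int), None
    for _ in range(iters):
        sample = points[rng.choice(len(points), 3, replace=False)]
        normal = np.cross(sample[1] - sample[0], sample[2] - sample[0])
        norm = np.linalg.norm(normal)
        if norm < 1e-9:                     # degenerate (collinear) sample
            continue
        normal /= norm
        d = -normal @ sample[0]
        dist = np.abs(points @ normal + d)  # point-to-plane distances
        inliers = np.nonzero(dist < tol)[0]
        if len(inliers) > len(best_inliers):
            best_inliers, best_model = inliers, (normal, d)
    return best_model, best_inliers

rng = np.random.default_rng(1)
plane = np.c_[rng.uniform(-1, 1, (500, 2)), 0.002 * rng.standard_normal(500)]
noise = rng.uniform(-1, 1, (100, 3))
model, inliers = ransac_plane(np.vstack([plane, noise]))
print(len(inliers), model[0].round(2))      # ~500 inliers, normal close to [0, 0, 1]
```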
36. Road Defect Identification and Location Method Based on an Improved ML-YOLO Algorithm.
- Author
-
Li, Tianwen and Li, Gongquan
- Subjects
- *
OBJECT recognition (Computer vision) , *TRAFFIC accidents , *FEATURE extraction , *SERVICE life , *PAVEMENTS - Abstract
The conventional method for detecting road defects relies heavily on manual inspections, which are often inefficient and struggle with precise defect localization. This paper introduces a novel approach for identifying and locating road defects based on an enhanced ML-YOLO algorithm. By refining the YOLOv8 object detection framework, we optimize both the traditional convolutional layers and the spatial pyramid pooling network. Additionally, we incorporate the Convolutional Block Attention Module (CBAM) to effectively capture channel and spatial features, along with Selective Kernel Networks (SKNet) that dynamically adapt feature extraction across varying scales. An optimized target localization algorithm is proposed to achieve high-precision identification and accurate positioning of road defects. Experimental results indicate that the detection accuracy of the improved ML-YOLO algorithm reaches 0.841, with a recall rate of 0.745 and an average precision of 0.817. Compared to the baseline YOLOv8 model, this represents an increase in accuracy of 0.13, a rise in recall rate of 0.117, and an enhancement in average precision of 0.116. After this high detection accuracy was confirmed, generalization experiments were carried out on the improved ML-YOLO model using a public dataset. The experimental results showed that, compared with the original YOLOv8n, the precision, recall rate, and average precision of ML-YOLO across all defect types increased by 0.075, 0.121, and 0.035, respectively, indicating robust generalization capabilities. When applied to real-time road monitoring scenarios, this algorithm facilitates precise detection and localization of defects while significantly mitigating traffic accident risks and extending roadway service life. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. Effect of Architecture and Inference Parameters of Artificial Neural Network Models in the Detection Task on Energy Demand.
- Author
-
Tomiło, Paweł, Oleszczuk, Piotr, Laskowska, Agnieszka, Wilczewska, Weronika, and Gnapowski, Ernest
- Subjects
- *
ARTIFICIAL neural networks , *OBJECT recognition (Computer vision) , *ENERGY consumption , *BIG data , *GRAPHICS processing units - Abstract
Artificial neural network models for the detection task are used in many fields and find various applications. Models of this kind require adequate computational resources and thus adequate energy expenditure. The increase in the number of parameters, the complexity of architectures, and the need to process large datasets significantly increase energy consumption, which is becoming a key sustainability challenge. Optimization of computing and the development of energy-efficient hardware technologies are essential to reduce the energy footprint of these models. This article examines the effect of the type of model, as well as its parameters, on energy consumption during inference. For this purpose, the sensors built into the graphics card were used, and software was developed to measure the energy demand of the graphics card for different architectures of YOLO models (v8, v9, v10), as well as for different batch and model sizes. This study showed that energy demand does not depend linearly on batch size: beyond a certain batch size, the energy demand begins to decrease, with the n/t model sizes being the only exception to this behavior. For the studied models, optimal utilization of computing power, in terms of the number of images processed, occurs at the maximum batch size studied. In addition, tests were conducted on an embedded device. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
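The measurement idea above, reading the graphics card's built-in power sensor while sweeping batch size, can be sketched with NVML as below. It assumes an NVIDIA GPU with the pynvml package installed, and uses a stand-in convolutional model rather than the YOLO variants from the study.

```python
import time
import pynvml
import torch

pynvml.nvmlInit()
handle = pynvml.nvmlDeviceGetHandleByIndex(0)
model = torch.nn.Conv2d(3, 64, 3, padding=1).cuda().eval()  # stand-in workload

for batch in (1, 4, 16, 64):
    x = torch.randn(batch, 3, 640, 640, device="cuda")
    samples, n_iters = [], 20
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    with torch.no_grad():
        for _ in range(n_iters):
            model(x)
            # NVML reports instantaneous board power in milliwatts.
            samples.append(pynvml.nvmlDeviceGetPowerUsage(handle) / 1000)
    torch.cuda.synchronize()
    dt = time.perf_counter() - t0
    fps = batch * n_iters / dt
    watts = sum(samples) / len(samples)
    # fps / W has units of images per joule, the efficiency figure of interest.
    print(f"batch={batch:3d}  {fps:8.1f} img/s  {watts:6.1f} W  {fps / watts:6.2f} img/J")
```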
38. Energy Optimization in Ultrasound Tomography Through Sensor Reduction Supported by Machine Learning Algorithms.
- Author
-
Baran, Bartłomiej, Rymarczyk, Tomasz, Majerek, Dariusz, Szyszka, Piotr, Wójcik, Dariusz, Cieplak, Tomasz, Gąsior, Marcin, Marczuk, Marcin, Wąsik, Edmund, and Gauda, Konrad
- Subjects
- *
MACHINE learning , *OBJECT recognition (Computer vision) , *RESOURCE-limited settings , *ENERGY consumption , *ENERGY research - Abstract
This paper focuses on reducing energy consumption in ultrasound tomography by utilizing machine learning techniques. The core idea is to investigate the feasibility of minimizing the number of measurement sensors without sacrificing prediction accuracy. This article evaluates the quality of reconstructions derived from data collected through two or three measurement channels. In subsequent steps, machine learning models are developed to predict the number, location, and size of the objects. A reliable object detection method is introduced, requiring less information than traditional signal analysis from multiple channels. Various machine learning models were tested and compared to validate the approach, with most demonstrating high accuracy or R² scores in their respective tasks. By reducing the number of sensors, the goal is to lower energy usage while maintaining high precision in localization. This study contributes to the ongoing research on energy efficiency in sensing and localization, especially in environments where resource optimization is crucial, such as remote or resource-limited settings. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. What Is Beyond Hyperbola Detection and Characterization in Ground-Penetrating Radar Data?—Implications from the Archaeological Site of Goting, Germany.
- Author
-
Wunderlich, Tina, Majchczack, Bente S., Wilken, Dennis, Segschneider, Martin, and Rabbel, Wolfgang
- Subjects
- *
OBJECT recognition (Computer vision) , *GROUND penetrating radar , *SOIL moisture , *DEEP learning , *ARCHAEOLOGICAL excavations - Abstract
Hyperbolae in radargrams are caused by a variety of small subsurface objects. The analysis of their curvature enables the determination of propagation velocity in the subsurface, which is important for exact time-to-depth conversion and migration and also yields information on the water content of the soil. Using deep learning methods and fitting (DLF) algorithms, it is possible to automatically detect and analyze large numbers of hyperbolae in 3D Ground-Penetrating Radar (GPR) datasets. As a result, a 3D velocity model can be established. Combining the hyperbola locations and the 3D velocity model with reflection depth sections and timeslices leads to improved archaeological interpretation due to (1) correct time-to-depth conversion through migration with the 3D velocity model, (2) creation of depthslices following the topography, (3) evaluation of the spatial distribution of hyperbolae, and (4) derivation of a 3D water content model of the site. In an exemplary study, we applied DLF to a 3D GPR dataset from the multi-phased (2nd to 12th century CE) archaeological site of Goting on the island of Föhr, Northern Germany. Using RetinaNet, we detected 38,490 hyperbolae in an area of 1.76 ha and created a 3D velocity model. The velocities ranged from approximately 0.12 m/ns at the surface to 0.07 m/ns at approx. 3 m depth in the vertical direction; in the lateral direction, the maximum velocity variation was ±0.048 m/ns. The 2D-migrated radargrams and subsequently created depthslices revealed the remains of a longhouse, which was not known beforehand and had not been visible in the unmigrated timeslices. We found hyperbola apex points aligned along linear strong reflections. They can be interpreted as stones contained in ditch fills. The hyperbola points help to differentiate between ditches and processing artifacts that have a similar appearance to the ditches in time-/depthslices. From the derived 3D water content model, we could identify the thickness of the archaeologically relevant layer across the whole site. The layer contains a lot of humus and has a high water retention capability, leading to a higher water content compared to the underlying glacial moraine sand, which is well-drained. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
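The velocity determination from hyperbola curvature mentioned above follows from the diffraction traveltime relation t(x) = (2/v) * sqrt(d^2 + (x - x0)^2) for a point scatterer at depth d below position x0. The sketch below fits that model to synthetic picks; in a real workflow the picks would come from the DLF-detected hyperbolae, and the parameter values here are illustrative.

```python
import numpy as np
from scipy.optimize import curve_fit

def hyperbola(x, v, x0, d):
    """Two-way traveltime (ns) along a diffraction hyperbola."""
    return 2.0 * np.sqrt(d**2 + (x - x0) ** 2) / v

# Simulate noisy picks for v = 0.10 m/ns, apex at x0 = 5.0 m, depth d = 1.2 m.
x = np.linspace(3.0, 7.0, 41)
t = hyperbola(x, 0.10, 5.0, 1.2) + np.random.default_rng(0).normal(0, 0.2, x.size)

# Fitting the curvature recovers the propagation velocity (and apex/depth).
(v, x0, d), _ = curve_fit(hyperbola, x, t, p0=(0.08, x.mean(), 1.0))
print(f"v = {v:.3f} m/ns, apex x0 = {x0:.2f} m, depth d = {d:.2f} m")
```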
40. A Heatmap-Supplemented R-CNN Trained Using an Inflated IoU for Small Object Detection.
- Author
-
Butler, Justin and Leung, Henry
- Subjects
- *
OBJECT recognition (Computer vision) , *CONVOLUTIONAL neural networks , *DRONE aircraft , *REMOTE sensing , *CYCLING training - Abstract
Object detection architectures struggle to detect small objects across applications including remote sensing and autonomous vehicles. Specifically, for unmanned aerial vehicles, poor detection of small objects directly limits this technology's applicability. In aerial imagery, objects both occupy a small fraction of the large-scale captured images and are represented by reduced information at high altitude. This paper presents a new architecture, CR-CNN, which predicts independent regions of interest from two unique prediction branches within the first stage of the network: a conventional R-CNN convolutional backbone and an hourglass backbone. Utilizing two independent sources within the first stage, our approach leads to an increase in successful predictions of regions that contain smaller objects. Anchor-based methods such as R-CNNs also utilize less than half the number of small objects compared to larger ones during training, due to the poor intersection over union (IoU) scores between the generated anchors and the ground truth, further reducing their performance on small objects. Therefore, we also propose artificially inflating the IoU of smaller objects during training using a simple, size-based Gaussian multiplier, which increases the number of small objects seen per training cycle through an increase in the number of anchor–object pairs. This architecture and training strategy led to improved detection overall on two challenging aerial-based datasets heavily composed of small objects, while predicting fewer false positives compared to Mask R-CNN. These results suggest that while new and unique architectures will continue to play a part in advancing the field of object detection, the training methodologies and strategies used will also play a valuable role. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
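The size-based Gaussian IoU multiplier can be sketched as below: each anchor-object IoU is scaled by a factor that grows as the object shrinks, so small ground truths clear the matching threshold more often. The scale constant, maximum gain, and threshold are assumptions, not the paper's values.

```python
import numpy as np

def inflated_iou(iou, box_area, sigma=32.0**2, max_gain=2.0):
    """Boost IoU for small boxes: gain -> max_gain as area -> 0, -> 1 for large."""
    gain = 1.0 + (max_gain - 1.0) * np.exp(-box_area / sigma)
    return np.clip(iou * gain, 0.0, 1.0)

ious = np.array([0.18, 0.18, 0.18])
areas = np.array([12 * 12, 32 * 32, 128 * 128])  # object areas in pixels^2
print(inflated_iou(ious, areas).round(3))        # [0.336, 0.246, 0.18]
# The 12x12 object now clears an assumed 0.3 matching threshold and gains an
# anchor-object pair; the large object's IoU is left essentially unchanged.
```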
41. Exploring Topological Information Beyond Persistent Homology to Detect Geospatial Objects.
- Author
-
Syzdykbayev, Meirman and Karimi, Hassan A.
- Subjects
- *
OBJECT recognition (Computer vision) , *LANDSLIDE hazard analysis , *GEOSPATIAL data , *LANDSLIDES , *POLYGONS - Abstract
Accurate detection of geospatial objects, particularly landslides, is a critical challenge in geospatial data analysis due to the complex nature of the data and the significant consequences of these events. This paper introduces an innovative topological knowledge-based (Topological KB) method that leverages the integration of topological, geometrical, and contextual information to enhance the precision of landslide detection. Topology, a fundamental branch of mathematics, explores the properties of space that are preserved under continuous transformations and focuses on the qualitative aspects of space, studying features like connectivity and the existence of loops/holes. We employed persistent homology (PH) to derive candidate polygons and applied three distinct strategies for landslide detection: without any filters, with geometrical and contextual filters, and with a combination of topological, geometrical, and contextual filters. Our method was rigorously tested across five different study areas. The experimental results revealed that geometrical and contextual filters significantly improved detection accuracy, with the highest F1 scores achieved when employing these filters on candidate polygons derived from PH. Contrary to our initial hypothesis, the addition of topological information to the detection process did not yield a notable increase in accuracy, suggesting that the initial topological features extracted through PH suffice for accurate landslide characterization. This study advances the field of geospatial object detection by demonstrating the effectiveness of combining geometrical and contextual information and provides a robust framework for accurately mapping landslide susceptibility. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
42. CLOUDSPAM: Contrastive Learning On Unlabeled Data for Segmentation and Pre-Training Using Aggregated Point Clouds and MoCo.
- Author
-
Mahmoudi Kouhi, Reza, Stocker, Olivier, Giguère, Philippe, and Daniel, Sylvie
- Subjects
- *
OBJECT recognition (Computer vision) , *POINT cloud , *DIGITAL twins , *DATA augmentation , *DIGITAL mapping - Abstract
SegContrast first paved the way for contrastive learning on outdoor point clouds. Its original formulation targeted individual scans in applications like autonomous driving and object detection. However, mobile mapping purposes such as digital twin cities and urban planning require large-scale dense datasets to capture the full complexity and diversity present in outdoor environments. In this paper, the SegContrast method is revisited and adapted to overcome its limitations associated with mobile mapping datasets, namely the scarcity of contrastive pairs and memory constraints. To overcome the scarcity of contrastive pairs, we propose the merging of heterogeneous datasets. However, this merging is not a straightforward procedure due to the variety of size and number of points in the point clouds of these datasets. Therefore, a data augmentation approach is designed to create a vast number of segments while optimizing the size of the point cloud samples to the allocated memory. This methodology, called CLOUDSPAM, guarantees the performance of the self-supervised model for both small- and large-scale mobile mapping point clouds. Overall, the results demonstrate the benefits of utilizing datasets with a wide range of densities and class diversity. CLOUDSPAM matched the state of the art on the KITTI-360 dataset, with a 63.6% mIoU, and came in second place on the Toronto-3D dataset. Finally, CLOUDSPAM achieved competitive results against its fully supervised counterpart with only 10% of labeled data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
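At the core of SegContrast-style pre-training described above is a contrastive objective over segment embeddings. A minimal InfoNCE sketch follows, with two augmented views of each segment as positives and the rest of the batch as negatives; this is the generic loss, not the CLOUDSPAM training loop or its data augmentation.

```python
import torch
import torch.nn.functional as F

def info_nce(q, k, temperature=0.07):
    """q, k: (N, D) L2-normalized embeddings of two views of N segments."""
    logits = q @ k.t() / temperature   # (N, N) pairwise similarity matrix
    targets = torch.arange(q.size(0))  # positives sit on the diagonal
    return F.cross_entropy(logits, targets)

n_segments, dim = 8, 128
q = F.normalize(torch.randn(n_segments, dim), dim=1)
k = F.normalize(q + 0.1 * torch.randn(n_segments, dim), dim=1)  # perturbed view
print(info_nce(q, k).item())  # low loss when the two views agree per segment
```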
43. FS-3DSSN: an efficient few-shot learning for single-stage 3D object detection on point clouds.
- Author
-
Tiwari, Alok Kumar and Sharma, G. K.
- Subjects
- *
OBJECT recognition (Computer vision) , *POLICE vehicles , *POINT cloud , *DETECTORS , *LEARNING modules - Abstract
The current 3D object detection methods have achieved promising results for conventional tasks that detect frequently occurring objects like cars, pedestrians and cyclists. However, they require many annotated bounding boxes and class labels for training, which are very expensive and hard to obtain. Nevertheless, detecting infrequently occurring objects, such as police vehicles, is also essential for autonomous driving to be successful. Therefore, we explore the potential of few-shot learning to handle the challenge of detecting infrequent categories. The current 3D object detectors do not have the necessary architecture to support this type of learning. Thus, this paper presents a new method termed few-shot single-stage network for 3D object detection (FS-3DSSN) to predict infrequent categories of objects. FS-3DSSN uses a class-incremental few-shot learning approach to detect infrequent categories without compromising the detection accuracy of frequent categories. It consists of two modules: (i) a single-stage network architecture for 3D object detection (3DSSN) using deformable convolutions to detect small objects and (ii) a class-incremental-based meta-learning module to learn and predict infrequent class categories. 3DSSN obtained a 3D mAP of 84.53 on the KITTI car category and an NDS of 73.4 on the nuScenes dataset, outperforming the previous state of the art. Further, the results of FS-3DSSN on nuScenes are also encouraging for detecting infrequent categories while maintaining accuracy on frequent classes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
44. Decouple and align classification and regression in one-stage object detection.
- Author
-
Fang, Zhaoyan, Chen, Niannian, Jiang, Yong, and Fan, Yong
- Subjects
- *
OBJECT recognition (Computer vision) , *DEEP learning , *PROBLEM solving , *DETECTORS , *CLASSIFICATION - Abstract
Current one-stage object detection methods use dense prediction to generate classification and regression results at the same point on the feature map. Due to their different task attributes, classification and regression are typically trained using separate detection heads, which may result in different feature areas being focused on. However, they ultimately act on the same object, and especially in the post-processing stage we expect them to behave consistently. This inherent contradiction can seriously affect the performance of the detector. To solve this problem, we propose a flexible and effective decouple-and-align classification and regression one-stage object detector (DAOD), which decouples and aligns the two subtasks from different aspects. Specifically, we first propose a regression-subtask spatial decouple module to address the spatial sensitivity of regression by efficiently sampling the information of the regression result map to strengthen localization. Then, we propose a dynamic aligned label assignment strategy for sample selection, guiding the network to focus on more aligned features during training. Finally, we introduce harmonic supervision to align results while ensuring the independence of each task. With negligible additional overhead, extensive experiments on the COCO dataset demonstrate the effectiveness of our DAOD. Notably, DAOD with a ResNeXt-101-64×4d-DCN backbone achieves 50.0 AP in single-model, single-scale testing on MS-COCO test-dev. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. LENet: Lightweight and Effective Detector for Aerial Object.
- Author
-
Zhou, Xunkuai, Li, Li, and Chen, Ben M.
- Subjects
- *
ARTIFICIAL intelligence , *OBJECT recognition (Computer vision) , *PATTERN recognition systems , *CONVOLUTIONAL neural networks , *FEATURE extraction , *DEEP learning , *OBJECT tracking (Computer vision) , *TRACKING algorithms - Published
- 2024
- Full Text
- View/download PDF
46. The Design and Implementation of the Vision-based Water Surface Garbage Cleaning Vehicle.
- Author
-
Jiang, Zeyu and Wang, Qing
- Subjects
- *
REFUSE collection , *PATTERN recognition systems , *OBJECT recognition (Computer vision) , *SLIDING mode control , *WASTE paper , *DEEP learning , *ARTIFICIAL satellite attitude control systems - Published
- 2024
- Full Text
- View/download PDF
47. Coffee Green Bean Defect Detection Method Based on an Improved YOLOv8 Model.
- Author
-
Ji, Yuanhao, Xu, Jinpu, Yan, Beibei, and Sridhar, Kandi
- Subjects
- *
OBJECT recognition (Computer vision) , *GREEN bean , *FEATURE extraction , *MARKET value , *COFFEE beans , *COFFEE - Abstract
This research is aimed at addressing the significant challenges of detecting and classifying green coffee beans, with a particular focus on identifying defective coffee beans, an important task for improving coffee quality and market value. The main challenge is to accurately detect small and visually subtle defects in coffee beans in real-world production environments with a large number of beans, varying lighting conditions, and complex backgrounds. To address these challenges, we propose a YOLOv8n-based object detection model that employs several innovative strategies aimed at improving detection performance and robustness. Our research includes the introduction of WIoUv3 and the development of the Atn-C3Ghost module, which integrates the ECA mechanism with the C3Ghost module to refine feature extraction and improve the accuracy of the model. To validate the effectiveness of our proposed method, we conducted comprehensive comparison and ablation experiments. In addition, we compared the C3Ghost structure in combination with various attention mechanisms to determine their impact on the model's detection ability. We also conducted ablation studies to evaluate the respective contributions of WIoUv3, ECA, and C3Ghost to overall model performance. The experimental results show that the YOLOv8n-based model enhanced with WIoUv3, ECA, and C3Ghost achieves an accuracy of 99.0% in detecting green coffee beans, significantly better than other YOLO models. This study not only provides a practical solution for green coffee bean detection but also offers a valuable framework for addressing similar challenges in other small object detection tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
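The ECA mechanism integrated into the C3Ghost module above is small enough to sketch in full: global average pooling followed by a 1D convolution across channels. Fixing the kernel size at 3 here is an assumption; ECA normally derives it adaptively from the channel count.

```python
import torch
import torch.nn as nn

class ECA(nn.Module):
    """Efficient Channel Attention: no dimensionality reduction, just a 1D
    conv capturing local cross-channel interaction."""
    def __init__(self, k: int = 3):
        super().__init__()
        self.conv = nn.Conv1d(1, 1, kernel_size=k, padding=k // 2, bias=False)

    def forward(self, x):                   # x: (N, C, H, W)
        w = x.mean(dim=(2, 3))               # squeeze: (N, C)
        w = self.conv(w.unsqueeze(1))        # cross-channel interaction: (N, 1, C)
        w = torch.sigmoid(w).unsqueeze(-1)   # (N, 1, C, 1)
        return x * w.transpose(1, 2)         # (N, C, 1, 1) scales each channel

print(ECA()(torch.randn(2, 64, 40, 40)).shape)  # torch.Size([2, 64, 40, 40])
```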
48. Crack detection and dimensional assessment using smartphone sensors and deep learning.
- Author
-
Tello-Gil, Carlos, Jabari, Shabnam, Waugh, Lloyd, Masry, Mark, and McGinn, Jared
- Subjects
- *
OBJECT recognition (Computer vision) , *INFRASTRUCTURE (Economics) , *SENSOR placement , *CRACKING of concrete , *POSITION sensors , *DEEP learning - Abstract
This paper addresses the crucial need for effective crack detection and dimensional assessment in civil infrastructure materials to ensure safety and functionality. It proposes a cost-effective crack detection and dimensional assessment solution by applying state-of-the-art deep learning to smartphone sensor imagery and positioning data. The proposed methodology integrates three-dimensional (3D) data from LiDAR sensors with the Mask R-CNN (region-based convolutional neural network) and YOLOv8 object detection networks for automated crack detection in concrete structures, allowing for accurate measurement of crack dimensions, including length, width, and area. The study finds that YOLOv8 produces superior precision and recall results in crack detection compared to Mask R-CNN. Furthermore, the calculated crack straight-length closely aligns with the ground-truth straight-length, with an average error of 1.5%. The research contributions include developing a multi-modal solution combining LiDAR observations with image masks for precise 3D crack measurements, establishing a dimensional assessment pipeline to convert segmented cracks into measurements, and comparing state-of-the-art CNN-based networks for crack detection in real-life images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
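The dimensional-assessment step above, converting a segmented crack into length, width, and area, can be sketched as follows: skeletonize the binary mask for the centerline length and divide area by length for mean width. The mm-per-pixel scale is an assumed placeholder for the value such a pipeline would derive from LiDAR and positioning data, and this is not the paper's exact pipeline.

```python
import numpy as np
from skimage.morphology import skeletonize

def crack_dimensions(mask: np.ndarray, mm_per_px: float):
    """Estimate crack area, length, and mean width from a boolean mask."""
    area_px = mask.sum()
    length_px = skeletonize(mask).sum()       # centerline length in pixels
    mean_width_px = area_px / max(length_px, 1)
    return {
        "area_mm2": area_px * mm_per_px**2,
        "length_mm": length_px * mm_per_px,
        "mean_width_mm": mean_width_px * mm_per_px,
    }

mask = np.zeros((100, 100), dtype=bool)
mask[48:52, 10:90] = True                     # a synthetic 4 px x 80 px crack
print(crack_dimensions(mask, mm_per_px=0.5))  # ~40 mm long, ~2 mm mean width
```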
49. OV-DAR: Open-Vocabulary Object Detection and Attributes Recognition.
- Author
-
Chen, Keyan, Jiang, Xiaolong, Wang, Haochen, Yan, Cilin, Gao, Yan, Tang, Xu, Hu, Yao, and Xie, Weidi
- Subjects
- *
OBJECT recognition (Computer vision) , *GENERALIZATION , *CLASSIFICATION , *FORECASTING , *ANCHORS - Abstract
In this paper, we endeavor to localize all potential objects in an image and infer their visual categories, attributes, and shapes, even in instances where certain objects were not encompassed in the model's supervised training. This is the challenge posed by open-vocabulary object detection and recognition. The proposed OV-DAR framework, in contrast to previous object detection and recognition frameworks, offers superior advantages and performance in terms of generalization, universality, and granularity of expression. Specifically, OV-DAR disentangles the open-vocabulary object detection and recognition problem into two components: class-agnostic object proposal and open-vocabulary classification. It employs co-training to maintain a balance between the performance of these two components. For the former, we construct class-agnostic object proposal networks based on the anchor/query with the SAM foundation model, which demonstrates robust generalization in object proposing and masking. For the latter, we merge available object-centered category classification and attribute prediction data, adopt co-learning for efficient fine-tuning of CLIP, and subsequently augment the open-vocabulary capability on object-centered category/attribute prediction tasks using freely accessible online image–text pairs. To ensure the efficiency and accuracy of open-vocabulary classification, we devise a structure akin to Faster R-CNN and fully exploit the knowledge of object-centered CLIP for end-to-end multi-object open-vocabulary category and attribute prediction via knowledge distillation. We conduct comprehensive experiments on the VAW, MS-COCO, LSA, and OVAD datasets. The results not only illustrate the complementarity of semantic category and attribute recognition for visual scene understanding but also underscore the generalization capability of OV-DAR in localizing, categorizing, attributing, and masking tasks and in open-world scene perception. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
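The open-vocabulary classification component above can be shown schematically: score each region embedding against text embeddings of category prompts with a scaled cosine similarity, CLIP-style. Random vectors stand in for real CLIP features so the sketch runs offline; the categories, embedding size, and scale factor are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Text embeddings of category prompts (random stand-ins for CLIP text features).
categories = ["a photo of a dog", "a photo of a bicycle", "a photo of a chair"]
text_emb = F.normalize(torch.randn(len(categories), 512), dim=1)

# A region embedding that happens to resemble the "bicycle" prompt.
region_emb = F.normalize(text_emb[1] + 0.3 * torch.randn(512), dim=0)

logits = 100.0 * region_emb @ text_emb.t()  # CLIP-style scaled cosine similarity
probs = logits.softmax(dim=-1)
print(categories[probs.argmax().item()], probs.max().item())
```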
50. ViDSOD-100: A New Dataset and a Baseline Model for RGB-D Video Salient Object Detection.
- Author
-
Lin, Junhao, Zhu, Lei, Shen, Jiaxing, Fu, Huazhu, Zhang, Qing, and Wang, Liansheng
- Subjects
- *
OBJECT recognition (Computer vision) , *DATA release , *VIDEOS , *DETECTORS , *ANNOTATIONS - Abstract
With the rapid development of depth sensors, more and more RGB-D videos can be obtained. Identifying the foreground in RGB-D videos is a fundamental and important task. However, existing salient object detection (SOD) works focus only on either static RGB-D images or RGB videos, ignoring the collaboration of RGB-D and video information. In this paper, we first collect a new annotated RGB-D video SOD (ViDSOD-100) dataset, which contains 100 videos with a total of 9362 frames, acquired from diverse natural scenes. All the frames in each video are manually annotated with high-quality saliency annotations. Moreover, we propose a new baseline model, named attentive triple-fusion network (ATF-Net), for RGB-D video salient object detection. Our method aggregates the appearance information from an input RGB image, spatio-temporal information from an estimated motion map, and geometry information from the depth map by devising three modality-specific branches and a multi-modality integration branch. The modality-specific branches extract the representations of the different inputs, while the multi-modality integration branch combines the multi-level modality-specific features by introducing encoder feature aggregation (MEA) modules and decoder feature aggregation (MDA) modules. The experimental findings on both our newly introduced ViDSOD-100 dataset and the well-established DAVSOD dataset highlight the superior performance of the proposed ATF-Net. This performance enhancement is demonstrated both quantitatively and qualitatively, surpassing the capabilities of current state-of-the-art techniques across various domains, including RGB-D saliency detection, video saliency detection, and video object segmentation. We shall release our data, our results, and our code upon the publication of this work. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF