Descriptor: "YOLO" / Language: english - Searchworks@Jio Institute Digital Library Search Results

1. Deep learning assisted real-time object recognition and depth estimation for enhancing emergency response in adaptive environment

Author: Faseeh, Muhammad, Bibi, Misbah, Khan, Murad Ali, and Kim, Do-Hyeun
Published: 2024
Full Text: View/download PDF

2. Rapid adaptation in photovoltaic defect detection: Integrating CLIP with YOLOv8n for efficient learning

Author: Saeed, Fahman, Aldera, Sultan, Al-Shamma’a, Abdullrahman A., and Hussein Farh, Hassan M.
Published: 2024
Full Text: View/download PDF

3. Detection of pine wilt disease infected pine trees using YOLOv5 optimized by attention mechanisms and loss functions

Author: Dong, Xiaotong, Zhang, Li, Xu, Chang, Miao, Qing, Yao, Junsheng, Liu, Fangchao, Liu, Huiwen, Lu, Ying-Bo, Kang, Ran, and Song, Bin
Published: 2024
Full Text: View/download PDF

4. YOLO-RCS: A method for detecting phenological period of 'Yuluxiang' pear in unstructured environment

Author: Ren, Rui, Zhang, Shujuan, Sun, Haixia, Wang, Ning, Yang, Sheng, Zhao, Huamin, and Xin, Mingming
Published: 2025
Full Text: View/download PDF

5. Selective fruit harvesting prediction and 6D pose estimation based on YOLOv7 multi-parameter recognition

Author: Zhao, Guorui, Dong, Shi, Wen, Jian, Ban, Yichen, and Zhang, Xiaowei
Published: 2025
Full Text: View/download PDF

6. Circle-YOLO: An anchor-free lung nodule detection algorithm using bounding circle representation

Author: Tang, Chaosheng, Zhou, Feifei, Sun, Junding, and Zhang, Yudong
Published: 2025
Full Text: View/download PDF

8. A Real-Time Posture Detection Algorithm Based on Deep Learning.

Author: Jiang, Yujie, Hang, Rongzhi, Huang, Weipeng, Wu, Yanhao, Pan, Xiaoping, and Tao, Zhi
Abstract: With the development of machine vision and multimedia technology, posture detection and related algorithms have become widely used in the field of human posture recognition. Traditional video surveillance methods have the disadvantages of slow detection speed, low accuracy, interference from occlusions, and poor real-time performance. This paper proposes a real-time pose detection algorithm based on deep learning, which can effectively perform real-time tracking and detection of single and multiple individuals in different indoor and outdoor environments and at different distances. First, a corresponding pose recognition dataset for complex scenes was created based on the YOLO network. Then, the OpenPose method was used to detect key points of the human body. Finally, the Kalman filter multi-object tracking method was used to predict the state of human targets within the occluded area. Real-time detection of human postures (sitting, stand up, standing, sit down, walking, fall down, and lying down) is achieved with corresponding alarms to ensure the timely detection and processing of emergencies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. Comparative analysis of stomatal pore instance segmentation: Mask R-CNN vs. YOLOv8 on Phenomics Stomatal dataset.

Author: Thai, Thanh Tuan, Ku, Ki-Bon, Le, Anh Tuan, Oh, San Su Min, Phan, Ngo Hoang, Kim, In-Jung, and Chung, Yong Suk
Abstract: This study conducts a rigorous comparative analysis between two cutting-edge instance segmentation methods, Mask R-CNN and YOLOv8, focusing on stomata pore analysis. A novel dataset specifically tailored for stomata pore instance segmentation, named PhenomicsStomata, was introduced. This dataset posed challenges such as low resolution and image imperfections, prompting the application of advanced preprocessing techniques, including image enhancement using the Lucy-Richardson Algorithm. The models underwent comprehensive evaluation, considering accuracy, precision, and recall as key parameters. Notably, YOLOv8 demonstrated superior performance over Mask R-CNN, particularly in accurately calculating stomata pore dimensions. Beyond this comparative study, the implications of our findings extend across diverse biological research, providing a robust foundation for advancing our understanding of plant physiology. Furthermore, the preprocessing enhancements offer valuable insights for refining image analysis techniques, showcasing the potential for broader applications in scientific domains. This research marks a significant stride in unraveling the complexities of plant structures, offering both theoretical insights and practical applications in scientific research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Real-time Vehicle Recognition System Using You Only Look Once Model with Uniform Experimental Design.

Author: Chun-Hui Lin, We-Ling Lin, Cheng-Jian Lin, and Kang-Wei Lee
Abstract: As urban areas develop and technology advances, artificial intelligence technologies offer numerous applications to alleviate a pressing issue: traffic congestion. The importance of traffic management and safety monitoring underscores the crucial role of integrating vehicle recognition technology with the Internet of Things within intelligent transportation systems. In this study, You Only Look Once with Uniform Experimental Design (U-YOLOv4) is proposed to enhance the performance of vehicle recognition. The approach aims to optimize hyperparameters within YOLOv4, resulting in a high recognition rate in the model. Furthermore, two datasets were utilized: Vehicle from Beijing Institute of Technology (BIT) and Computational Intelligence Application Laboratory from National Chin-Yi University of Technology (CIA-NCUT). The experimental results revealed significant improvements when comparing U-YOLOv4 to YOLOv4. In the BIT-Vehicle dataset, U-YOLOv4 achieved a mean average precision of 97.84%, whereas in the CIA-NCUT dataset, it reached 89.19%, highlighting its superior performance over YOLOv4. The U-YOLOv4 model has overall demonstrated significant improvement in vehicle recognition, revealing its adaptability across different datasets and various scenarios. Its application is expected to play a crucial role in intelligent transportation systems, enhancing traffic management efficiency and road safety. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. Mobile-YOLO-SDD: A Lightweight YOLO for Real-time Steel Defect Detection.

Author: Luo, Shen, Xu, Yuanping, Zhu, Ming, Zhang, Chaolong, Kong, Chao, Jin, Jin, Li, Tukun, Jiang, Xiangqian, and Guo, Benjun
Abstract: Defect detection is essential in the steel production process. Recent years have seen significant advancements in steel surface defect detection based on deep learning methods, notably exemplified by the YOLO series models capable of precise and rapid detection. However, challenges arise due to the high complexity of surface textures on steel and the low recognition rates for minor defects, making real-time and accurate detection difficult. This study introduces Mobile-YOLO-SDD (Steel Defect Detection), a lightweight YOLO-based model designed with high accuracy for real-time steel defect detection. Firstly, based on the effective YOLOv5 algorithm for steel defect detection, the backbone network is replaced with MobileNetV2 to reduce the model size and computational complexity. Then, the ECA (Efficient Channel Attention) module was integrated into the C3 module to reduce the number of parameters further while maintaining the defect detection rate in complex backgrounds. Finally, the K-Means++ algorithm regenerates anchor boxes and determines optimal sizes, enhancing their adaptability to actual targets. Experimental results on NEU-DET data demonstrate that the improved algorithm achieves a 60.6% reduction in model size, a 60.8% reduction in FLOPs, and a 1.8% improvement in mAP compared to YOLOv5s. These results confirm the effectiveness of Mobile-YOLO-SDD and lay the foundation for subsequent lightweight deployment of steel defect detection models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Edge detective weights initialization on Darknet-19 model for YOLOv2-based facemask detection.

Author: Ningthoujam, Richard, Pritamdas, Keisham, and Singh, Loitongbam Surajkumar
Subjects: *OBJECT recognition (Computer vision), *DEEP learning, *ALGORITHMS, *DETECTIVES
Abstract: The object detection model based on the transfer learning approach comprises feature extraction and detection layers. YOLOv2 is among the fastest detection algorithms, which can utilize various pretrained classifier networks for feature extraction. However, reducing the number of network layers and increasing the mean average precision (mAP) together have challenges. Darknet-19-based YOLOv2 model achieved an mAP of 76.78% by having a smaller number of layers than other existing models. This work proposes modification by adding layers that help enhance feature extraction for further increasing the mAP of the model. Above that, the initial weights of the new layers can be random or deterministic, fine-tuned during training. In our work, we introduce a block of layers initialized with deterministic weights derived from several edge detection filter weights. Integrating such a block to the darknet-19-based object detection model improves the mAP to 85.94%, outperforming the other existing model in terms of mAP and number of layers. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Research on Object Detection in Complex Scenarios Based on ASA‐YOLOv5.

Author: Lin, Shoujun, Deng, Lixia, Zhang, Hongyu, Bi, Lingyun, Dong, Jinshun, Wan, Dapeng, Liu, Haiying, and Liu, Lida
Subjects: *PUBLIC transit, *HAZARDOUS substances, *PUBLIC safety, *ELECTRICAL engineers, *PROBLEM solving
Abstract: The applications of target detection in complex scenarios cover a wide range of fields, such as pedestrian and vehicle detection in self‐driving cars, face recognition and abnormal behavior detection in security monitoring systems, hazardous materials safety detection in public transportation, and so on. These applications demonstrate the importance and the prospect of wide application of target detection techniques in solving practical problems in complex scenarios. However, in these real scenes, there are often problems such as mutual occlusion and scale change. Therefore, how to accurately identify the target in the real complex scenarios has become a big problem to be solved. In order to solve the above problem, the paper proposes a novel algorithm, Adaptive Self‐Attention‐YOLOv5 (ASA‐YOLOv5), which is built upon the YOLOv5s algorithm and demonstrates effectiveness for target identification in complex scenarios. First, the paper implements a fusion mechanism between the trunk and neck networks, enabling the fusion of features across different levels through upsampling and downsampling. This fusion process mitigates detection errors caused by feature loss. Second, the Shuffle Attention mechanism is introduced before upsampling and downsampling to suppress noise and amplify essential semantic information, further enhancing target identification accuracy. Lastly, the Adaptively Spatial Feature Fusion (ASFF) module and Receptive Field Blocks (RFBs) module are added in the head network, and it can improve feature scale invariance and expand the receptive field. The ability of the model to detect the target in the complex scene is improved effectively. Experimental results indicate a notable improvement in the model's mean Average Precision (mAP) by 2.1% on the COCO dataset and 0.7% on the SIXray dataset. The proposed ASA‐YOLOv5 algorithm can enhance the effectiveness for target detection in complex scenarios, and it can be widely used in real‐world settings. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. In-depth review of AI-enabled unmanned aerial vehicles: trends, vision, and challenges.

Author: Pal, Osim Kumar, Shovon, MD Sakib Hossain, Mridha, M. F., and Shin, Jungpil
Subjects: GENERATIVE artificial intelligence, AGRICULTURAL drones, OBJECT recognition (Computer vision), DRONE aircraft, WILDLIFE monitoring
Abstract: In recent times, AI and UAV have progressed significantly in several applications. This article analyzes applications of UAV with modern green computing in various sectors. It addresses cutting-edge technologies such as green computing, generative AI, future scope, and related concerns in UAV. The research investigates the role of green computing and generative AI in combination with UAVs for navigation, object recognition and tracking, wildlife monitoring, precision agriculture, rescue operations, surveillance, and UAV communication. This study examines how modern computing technologies and UAVs are being applied in agriculture, surveillance, disaster management, and other areas. The ethics of UAV and AI applications, including safety, legal frameworks, and other issues, are thoroughly investigated. This research examines AI-based UAV applications across different disciplines, using open-source data and current advancements for future growth in this domain. This investigation will aid future researchers in their exploration of UAVs using cutting-edge computing technologies. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

15. Deep caries detection using deep learning: from dataset acquisition to detection.

Author: Kaur, Amandeep, Jyoti, Divya, Sharma, Ankit, Yelam, Dhiraj, Goyal, Rajni, and Nath, Amar
Abstract: Objectives: The study aims to address the global burden of dental caries, a highly prevalent disease affecting billions of individuals, including both children and adults. Recognizing the significant health challenges posed by untreated dental caries, particularly in low- and middle-income countries, our goal is to improve early-stage detection. Though effective, traditional diagnostic methods, such as bitewing radiography, have limitations in detecting early lesions. By leveraging Artificial Intelligence (AI), we aim to enhance the accuracy and efficiency of caries detection, offering a transformative approach to dental diagnostics. Materials and Methods: This study proposes a novel deep learning-based approach for dental caries detection using the latest models, i.e., YOLOv7, YOLOv8, and YOLOv9. Trained on a dataset of over 3,200 images, the models address the shortcomings of existing detection methods and provide an automated solution to improve diagnostic accuracy. Results: The YOLOv7 model achieved a mean Average Precision (mAP) at 0.5 Intersection over Union (IoU) of 0.721, while YOLOv9 attained a mAP@50 IoU of 0.832. Notably, YOLOv8 outperformed both, with a mAP@0.5 of 0.982. This demonstrates robust detection capabilities across multiple categories, including “caries,” “Deep Caries,” and “Exclusion.” Conclusions: This high level of accuracy and efficiency highlights the potential of integrating AI-driven systems into clinical workflows, improving diagnostic capabilities, reducing healthcare costs, and contributing to better patient outcomes, especially in resource-constrained environments. Clinical Relevance: Integrating these latest YOLO advanced AI models into dental diagnostics could transform the landscape of caries detection. Enhancing early-stage diagnosis accuracy can lead to more precise and cost-effective treatment strategies, with significant implications for improving patient outcomes, particularly in low-resource settings where traditional diagnostic capabilities are often limited. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

16. Enhanced deep leaning model for detection and grading of lumbar disc herniation from MRI.

Author: Duan, Xianyin, Xiong, Hanlin, Liu, Rong, Duan, Xianbao, and Yu, Haotian
Subjects: *MAGNETIC resonance imaging, *RANGE of motion of joints, *DEEP learning, *FEATURE extraction, *JOB qualifications
Abstract: Lumbar disc herniation is one of the most prevalent orthopedic issues in clinical practice. The lumbar spine is a crucial joint for movement and weight-bearing, so back pain can significantly impact the everyday lives of patients and is prone to recurring. The pathogenesis of lumbar disc herniation is complex and diverse, making it difficult to identify and assess after it has occurred. Magnetic resonance imaging (MRI) is the most effective method for detecting injuries, requiring continuous examination by medical experts to determine the extent of the injury. However, the continuous examination process is time-consuming and susceptible to errors. This study proposes an enhanced model, BE-YOLOv5, for hierarchical detection of lumbar disc herniation from MRI images. To tailor the training of the model to the job requirements, a specialized dataset was created. The data was cleaned and improved before the final calibration. A final training set of 2083 data points and a test set of 100 data points were obtained. The YOLOv5 model was enhanced by integrating the attention mechanism module, ECAnet, with a 3 × 3 convolutional kernel size, substituting its feature extraction network with a BiFPN, and implementing structural system pruning. The model achieved an 89.7% mean average precision (mAP) and 48.7 frames per second (FPS) on the test set. In comparison to Faster R-CNN, original YOLOv5, and the latest YOLOv8, this model performs better in terms of both accuracy and speed for the detection and grading of lumbar disc herniation from MRI, validating the effectiveness of multiple enhancement methods. The proposed model is expected to be used for diagnosing lumbar disc herniation from MRI images and to demonstrate efficient and high-precision performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. A computer vision system for apple fruit sizing by means of low-cost depth camera and neural network application.

Author: Bortolotti, G., Piani, M., Gullino, M., Mengoli, D., Franceschini, C., Grappadelli, L. Corelli, and Manfrini, L.
Subjects: *ORCHARD management, *APPLE orchards, *COMPUTER vision, *IMAGE analysis, *FARMERS, *ORCHARDS
Abstract: Fruit size is crucial for growers as it influences consumer willingness to buy and the price of the fruit. Fruit size and growth along the seasons are two parameters that can lead to more precise orchard management favoring production sustainability. In this study, a Python-based computer vision system (CVS) for sizing apples directly on the tree was developed to ease fruit sizing tasks. The system is made of a consumer-grade depth camera and was tested at two distances among 17 timings throughout the season, in a Fuji apple orchard. The CVS exploited a specifically trained YOLOv5 detection algorithm, a circle detection algorithm, and a trigonometric approach based on depth information to size the fruits. Comparisons with standard-trained YOLOv5 models and with spherical objects were carried out. The algorithm showed good fruit detection and circle detection performance, with a sizing rate of 92%. Good correlations (r > 0.8) between estimated and actual fruit size were found. The sizing performance showed an overall mean error (mE) and RMSE of + 5.7 mm (9%) and 10 mm (15%). The best results of mE were always found at 1.0 m, compared to 1.5 m. Key factors for the presented methodology were: the fruit detectors customization; the HoughCircle parameters adaptability to object size, camera distance, and color; and the issue of field natural illumination. The study also highlighted the uncertainty of human operators in the reference data collection (5–6%) and the effect of random subsampling on the statistical analysis of fruit size estimation. Despite the high error values, the CVS shows potential for fruit sizing at the orchard scale. Future research will focus on improving and testing the CVS on a large scale, as well as investigating other image analysis methods and the ability to estimate fruit growth. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
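
Note on record 17: the abstract mentions a trigonometric approach that sizes fruit from depth information but does not give the formula. A minimal sketch of one common way to do this, assuming a simple pinhole-camera (similar triangles) model, is shown below. The function name, the focal length, and the sample values are hypothetical and are not taken from the paper.

```python
def pixel_diameter_to_mm(diameter_px: float, depth_mm: float, focal_length_px: float) -> float:
    """Estimate a real-world diameter from an image-plane diameter.

    Assumes a pinhole camera: real_size / depth == pixel_size / focal_length.
    This is an illustrative model, not the method described in the paper.
    """
    if diameter_px < 0 or depth_mm <= 0 or focal_length_px <= 0:
        raise ValueError("diameter must be >= 0; depth and focal length must be > 0")
    return diameter_px * depth_mm / focal_length_px


# Hypothetical example: a detected circle 48 px wide, 1.0 m from the camera,
# seen through a lens with a 600 px focal length.
estimated_mm = pixel_diameter_to_mm(diameter_px=48.0, depth_mm=1000.0, focal_length_px=600.0)
print(f"estimated diameter: {estimated_mm:.1f} mm")
```

Under this model the estimate scales linearly with depth, so any error in the depth reading carries over proportionally to the size estimate.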

18. Comparison of YOLO and transformer based tumor detection in cystoscopy.

Author: Eixelberger, Thomas, Maisch, Philipp, Belle, Sebastian, Kriegmaier, Maximilian, Bolenz, Christian, and Wittenberg, Thomas
Subjects: BLADDER cancer treatment, BLADDER cancer diagnosis, TUMOR diagnosis, CYSTOSCOPY, GENITOURINARY disease diagnosis
Abstract: Background: Bladder cancer (BCa) is the second most common type of cancer in the genitourinary system and causes approximately 165,000 deaths each year. The diagnosis of BCa is primarily done through cystoscopy, which involves visually examining the bladder using an endoscope. Currently, white light cystoscopy is considered the most reliable method for diagnosis. However, it can be challenging to detect and diagnose flat, small, or poorly textured lesions. The study explores the performance of deep learning systems (YOLOv7-tiny, RT-DETR18), originally designed for detecting adenomas in colonoscopy images, when retrained and tested with cystoscopy images. The deep neural network used in the study was pre-trained on 35,699 colonoscopy images (some from Mannheim) and both architectures achieved an F1 score of 0.91 on publicly available colonoscopy datasets. Results: When the adenoma-detection network was tested with cystoscopy images from two sources (Ulm and Erlangen), F1 scores ranging from 0.58 to 0.81 were achieved. Subsequently, the networks were retrained using 12,066 cystoscopy images from Mannheim, resulting in improved F1 scores ranging from 0.77 to 0.85. Conclusion: It could be shown that transformer based networks perform slightly better than YOLOv7-tiny networks, but both network types are feasible for lesion detection in the human bladder. The retraining of the network with additional cystoscopy data led to an improvement in the performance of urinary lesion detection. This suggests that it is possible to achieve a domain-shift with the inclusion of appropriate additional data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Quantification of Empty Lacunae in Tissue Sections of Osteonecrosis of the Femoral Head Using YOLOv8 Artificial Intelligence Model.

Author: Shinohara, Issei, Inui, Atsuyuki, Murayama, Masatoshi, Susuki, Yosuke, Gao, Qi, Chow, Simon Kwoon‐Ho, Mifune, Yutaka, Matsumoto, Tomoyuki, Kuroda, Ryosuke, and Goodman, Stuart B.
Abstract: Histomorphometry is an important technique in the evaluation of non‐traumatic osteonecrosis of the femoral head (ONFH). Quantification of empty lacunae and pyknotic cells on histological images is the most reliable measure of ONFH pathology, yet it is time and manpower consuming. This study focused on the application of artificial intelligence (AI) technology to tissue image evaluation. The aim of this study is to establish an automated cell counting platform using YOLOv8 as an object detection model on ONFH tissue images and to evaluate and validate its accuracy. From 30 ONFH model rabbits, 270 tissue images were prepared; based on evaluations by three researchers, ground truth labels were created to classify each cell in the image into two classes (osteocytes and empty lacunae) or three classes (osteocytes, pyknotic cells, and empty lacunae). Two and three classes were then annotated on each image. Transfer learning based on annotated data (80% for training and 20% for validation) was performed using YOLOv8n and YOLOv8x with different parameters. To evaluate the detection accuracy of the training model, the mean average precision (mAP (50)) and precision‐recall curve were identified. In addition, the reliability of cell counting by YOLOv8 relative to manual cell counting was evaluated by linear regression analysis using five histological images unused in previous experiments. The mAP (50) for the detection of empty lacunae was 0.868 for the YOLOv8n and 0.883 for the YOLOv8x. The mAP (50) for the three classes was 0.735 for the YOLOv8n model and 0.750 for the YOLOv8x model. The quantification of empty lacunae by automated cell counting obtained in the learning was highly correlated with the manual counting data. The development of an AI‐applied automated cell counting platform will significantly reduce the time and effort of manual cell counting in histological analysis. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

20. An Evaluation of Image Slicing and YOLO Architectures for Object Detection in UAV Images.

Author: Telçeken, Muhammed, Akgun, Devrim, and Kacar, Sezgin
Abstract: Object detection in aerial images poses significant challenges due to the high dimensions of the images, requiring efficient handling and resizing to fit object detection models. The image-slicing approach for object detection in aerial images can increase detection accuracy by eliminating pixel loss in high-resolution image data. However, determining the proper dimensions to slice is essential for the integrity of the objects and their learning by the model. This study presents an evaluation of the image-slicing approach for alternative sizes of images to optimize efficiency. For this purpose, a dataset of high-resolution images collected with Unmanned Aerial Vehicles (UAV) has been used. The experiments evaluated using alternative YOLO architectures like YOLOv7, YOLOv8, and YOLOv9 show that the image dimensions significantly change the performance results. According to the experiments, the best mAP@0.5 accuracy was obtained by slicing 1280 × 1280 for YOLOv7 producing 88.2. Results show that edge-related objects are better preserved as the overlap and slicing sizes increase, resulting in improved model performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
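
Note on record 20: the abstract describes slicing high-resolution images into overlapping tiles before detection. A minimal sketch of how such tile coordinates could be generated follows. The 1280-pixel tile size comes from the abstract; the 256-pixel overlap, the 4000 × 3000 image size, and the function names are assumptions chosen for illustration.

```python
def slice_starts(length: int, tile: int, overlap: int) -> list[int]:
    """Start offsets of tiles that cover [0, length) with `overlap` shared pixels."""
    if tile <= 0 or not 0 <= overlap < tile:
        raise ValueError("need tile > 0 and 0 <= overlap < tile")
    if length <= tile:
        return [0]  # the image fits in a single tile along this axis
    stride = tile - overlap
    starts = list(range(0, length - tile, stride))
    starts.append(length - tile)  # final tile sits flush with the image border
    return starts


def slice_boxes(width: int, height: int, tile: int = 1280, overlap: int = 256):
    """Return (x1, y1, x2, y2) tile boxes in row-major order."""
    return [
        (x, y, min(x + tile, width), min(y + tile, height))
        for y in slice_starts(height, tile, overlap)
        for x in slice_starts(width, tile, overlap)
    ]


# Hypothetical 4000 x 3000 UAV frame.
boxes = slice_boxes(4000, 3000)
print(len(boxes), boxes[0], boxes[-1])
```

Placing the last tile flush with the border keeps every tile the same size, at the cost of a larger overlap on the final row and column.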

21. An Improved Real-Time Detection Transformer Model for the Intelligent Survey of Traffic Safety Facilities.

Author: Wan, Yan, Wang, Hui, Lu, Lingxin, Lan, Xin, Xu, Feifei, and Li, Shenglin
Abstract: The undertaking of traffic safety facility (TSF) surveys represents a significant labor-intensive endeavor, which is not sustainable in the long term. The subject of traffic safety facility recognition (TSFR) is beset with numerous challenges, including those associated with background misclassification, the diminutive dimensions of the targets, the spatial overlap of detection targets, and the failure to identify specific targets. In this study, transformer-based and YOLO (You Only Look Once) series target detection algorithms were employed to construct TSFR models to ensure both recognition accuracy and efficiency. The TSF image dataset, comprising six categories of TSFs in urban areas of three cities, was utilized for this research. The dimensions and intricacies of the Detection Transformer (DETR) family of models are considerably more substantial than those of the YOLO family. YOLO-World and Real-Time Detection Transformer (RT-DETR) models were optimal and comparable for the TSFR task, with the former exhibiting a higher detection efficiency and the latter a higher detection accuracy. The RT-DETR model exhibited a notable reduction in model complexity by 57% in comparison to the DINO (DETR with improved denoising anchor boxes for end-to-end object detection) model while also demonstrating a slight enhancement in recognition accuracy. The incorporation of the RepGFPN (Reparameterized Generalized Feature Pyramid Network) module has markedly enhanced the multi-target detection accuracy of RT-DETR, with a mean average precision (mAP) of 82.3%. The introduction of RepGFPN significantly enhanced the detection rate of traffic rods, traffic sign boards, and water surround barriers and somewhat ameliorated the problem of duplicate detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. YOLOSeaShip: a lightweight model for real-time ship detection.

Author: Jiang, Xiaoliang, Cai, Jianchen, and Wang, Ban
Abstract: With the rapid advancements in computer vision, ship detection models based on deep learning have been more and more prevalent. However, most network methods use expensive costs with high hardware equipment needed to increase detection accuracy. In response to this challenge, a lightweight real-time detection approach called YOLOSeaShip is proposed. Firstly, derived from the YOLOv7-tiny model, the partial convolution was utilized to replace the original 3×1 convolution in the ELAN module to further reduce parameters and improve the operation speed. Secondly, the parameter-free average attention module was integrated to improve the locating capacity for the hull of a ship in an image. Finally, the accuracy changes of the Focal EIoU hybrid loss function under different parameter changes were studied. The practical results trained on the SeaShips (7000) dataset demonstrate that the suggested method can detect and classify the ship position from the image more efficiently, with mAP of 0.976 and FPS of 119.84, which is ideal for real-time ship detection applications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. A small object detection architecture with concatenated detection heads and multi-head mixed self-attention mechanism.

Author: Mu, Jianhong, Su, Qinghua, Wang, Xiyu, Liang, Wenhui, Xu, Sheng, and Wan, Kaizheng
Abstract: A novel detection method is proposed to address the challenge of detecting small objects in object detection. This method augments the YOLOv8n architecture with a small object detection layer and innovatively designs a Concat-detection head to effectively extract features. Simultaneously, a new attention mechanism—Multi-Head Mixed Self-Attention (MMSA) mechanism—is introduced to enhance the feature-extraction capability of the backbone. To improve the detection sensitivity for small objects, a combination of Normalized Wasserstein Distance (NWD) and Intersection over Union (IoU) is used to calculate the localization loss, optimizing the bounding-box regression. Experimental results on the TT100K dataset show that the mean average precision (mAP@0.5) reaches 88.1%, which is a 13.5% improvement over YOLOv8n. The method’s versatility is also validated through experiments on the BDD100K dataset, where it is compared with various object-detection algorithms. The results demonstrate that this method yields significant improvements and practical value in the field of small-object detection. Detailed code can be found at . [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
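
Note on record 23: the abstract states that Normalized Wasserstein Distance (NWD) and IoU are combined to calculate the localization loss, without giving the weighting. A minimal sketch of one possible combination follows, using the commonly cited NWD definition in which each box (cx, cy, w, h) is modeled as a 2D Gaussian. The equal weighting `alpha=0.5`, the normalizing constant `c=12.8`, and the function names are assumptions, not values confirmed by the paper.

```python
import math

Box = tuple[float, float, float, float]  # (cx, cy, w, h)


def iou(a: Box, b: Box) -> float:
    """Intersection over Union of two center-format boxes."""
    ax1, ay1, ax2, ay2 = a[0] - a[2] / 2, a[1] - a[3] / 2, a[0] + a[2] / 2, a[1] + a[3] / 2
    bx1, by1, bx2, by2 = b[0] - b[2] / 2, b[1] - b[3] / 2, b[0] + b[2] / 2, b[1] + b[3] / 2
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = a[2] * a[3] + b[2] * b[3] - inter
    return inter / union if union > 0 else 0.0


def nwd(a: Box, b: Box, c: float = 12.8) -> float:
    """exp(-W2 / c), where W2 is the 2-Wasserstein distance between box Gaussians."""
    w2 = math.sqrt(
        (a[0] - b[0]) ** 2
        + (a[1] - b[1]) ** 2
        + ((a[2] - b[2]) / 2) ** 2
        + ((a[3] - b[3]) / 2) ** 2
    )
    return math.exp(-w2 / c)


def localization_loss(pred: Box, target: Box, alpha: float = 0.5, c: float = 12.8) -> float:
    """Weighted sum of (1 - IoU) and (1 - NWD); the weighting is an assumption."""
    return alpha * (1.0 - iou(pred, target)) + (1.0 - alpha) * (1.0 - nwd(pred, target, c))


target = (10.0, 10.0, 4.0, 4.0)
near_miss = localization_loss((20.0, 10.0, 4.0, 4.0), target)
far_miss = localization_loss((30.0, 10.0, 4.0, 4.0), target)
print(f"near: {near_miss:.3f}, far: {far_miss:.3f}")
```

Both predictions have zero IoU with the target, yet the NWD term still separates the near miss from the far one, which is the usual motivation for adding it when objects are small.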

24. Analyzing Real-Time Object Detection with YOLO Algorithm in Automotive Applications: A Review.

Author: Gheorghe, Carmen, Duguleana, Mihai, Boboc, Razvan Gabriel, and Postelnicu, Cristian Cezar
Subjects: OBJECT recognition (Computer vision), BIBLIOMETRICS, IMAGE processing, MOTOR vehicles, AUTONOMOUS vehicles
Abstract: Identifying objects in real-time is a technology that is developing rapidly and has a huge potential for expansion in many technical fields. Currently, systems that use image processing to detect objects are based on the information from a single frame. A video camera positioned in the analyzed area captures the image, monitoring in detail the changes that occur between frames. The You Only Look Once (YOLO) algorithm is a model for detecting objects in images, that is currently known for the accuracy of the data obtained and the fast-working speed. This study proposes a comprehensive literature review of YOLO research, as well as a bibliometric analysis to map the trends in the automotive field from 2020 to 2024. Object detection applications using YOLO were categorized into three primary domains: road traffic, autonomous vehicle development, and industrial settings. A detailed analysis was conducted for each domain, providing quantitative insights into existing implementations. Among the various YOLO architectures evaluated (v2–v8, H, X, R, C), YOLO v8 demonstrated superior performance with a mean Average Precision (mAP) of 0.99. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Comparison of algorithms for the detection of marine vessels with machine vision.

Author: Rodríguez-Gonzales, José, Niquin-Jaimes, Junior, and Paiva-Peredo, Ernesto
Subjects: CONVOLUTIONAL neural networks, OBJECT recognition (Computer vision), COMPUTER vision, DIGITAL image processing, MACHINE learning
Abstract: The detection of marine vessels for revenue control has many tracking deficiencies, which has resulted in losses of logistical resources, time, and money. However, digital cameras are not fully exploited since they capture images to recognize the vessels and give immediate notice to the control center. The analyzed images go through an incredibly detailed process, which, thanks to neural training, allows us to recognize vessels without false positives. To do this, we must understand the behavior of object detection; we must know critical issues such as neural training, image digitization, types of filters, and machine learning, among others. We present results by comparing two development environments with their corresponding algorithms, making the recognition of ships immediately under neural training. In conclusion, it is analyzed based on 100 images to measure the boat detection capability between both algorithms, the response time, and the effectiveness of an image obtained by a digital camera. The result obtained by YOLOv7 was 100% effective under the application of processing techniques based on neural networks in convolutional neural network (CNN) regions compared to MATLAB, which applies processing metrics based on morphological images, obtaining low results. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Integrating YOLO and WordNet for automated image object summarization.

Author: Saqib, Sheikh Muhammad, Aftab, Aamir, Mazhar, Tehseen, Iqbal, Muhammad, Shahazad, Tariq, Almogren, Ahmad, and Hamam, Habib
Abstract: The demand for methods that automatically create text summaries from images containing many things has recently grown. Our research introduces a fresh and creative way to achieve this. We bring together the WordNet dictionary and the YOLO model to make this happen. YOLO helps us find where the things are in the images, while WordNet provides their meanings. Our process then crafts a summary for each object found. This new technique can have a big impact on computer vision and natural language processing. It can make understanding complicated images, filled with lots of things, much simpler. To test our approach, we used 1381 pictures from the Google Image search engine. Our results showed high accuracy, with 72% for object detection. The precision was 85%, the recall was 72%, and the F1-score was 74%. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. YOLO-Mamba: object detection method for infrared aerial images.

Author: Zhao, Zhihong and He, Peng
Abstract: At present, unmanned aerial vehicles (UAVs) are widely used in various application fields. Through the detection of nighttime infrared images taken by UAVs, it is convenient to analyze the ground situation in real time. Due to the problems of image blur and image noise in infrared images, higher requirements are put forward for object detection algorithms. Aiming at the problems of long-distance dependence and computational complexity of current object detection algorithms based on CNNs and self-attention mechanisms, a new infrared aerial object detection method named YOLO-Mamba was proposed. The method combines Mamba with the attention mechanism in a new Mamba-based attention module. It uses Mamba to scan the features of the feature dimension and the spatial dimension of the image, and fully extracts the global context information, which further improves the algorithm's attention to the key area of the image and reduces the influence of redundant information. In the experiment, through the public infrared aerial image data set, the effectiveness of the improved attention module is verified from both quantitative and qualitative perspectives. The experimental results show that compared with the SE and CBAM attention mechanisms, the mAP50 is increased by 0.8% and 1.3% respectively, and the parameter quantity is between the two. The attention to critical regions is much higher than that of other algorithms. Finally, in comparison with other object detection methods, the mAP50 is increased by 1.1% and the mAP50-95 index is increased by 0.8% relative to the benchmark model YOLOv8n, and the number of parameters is only increased by 0.1 M. The research results provide a certain reference for improving the accuracy of object detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Pothole detection in adverse weather: leveraging synthetic images and attention-based object detection methods.

Author: Jakubec, Maros, Lieskovska, Eva, Bucko, Boris, and Zabovska, Katarina
Subjects: GENERATIVE adversarial networks, TRANSFORMER models, WEATHER, DATA augmentation, RAINFALL
Abstract: Potholes are a pervasive road hazard with the potential to cause accidents and vehicle damage. Detecting potholes accurately is essential for timely repairs and ensuring road safety. However, existing detection methods often struggle to perform in adverse weather conditions, including rain, snow, and low visibility. This work aims to improve pothole detection across diverse weather and lighting scenarios, employing a two-phase strategy that integrates data augmentation with images generated by Generative Adversarial Networks (GANs) and the deployment of visual attention techniques. For this purpose, advanced models such as YOLOv8, RT-DETR, and our modified version of YOLOv8 were employed. In the first phase, multiple image-to-image translation models were trained and applied to a real-world dataset to generate synthetic images of potholes under different weather conditions, including rain, fog, overcast, dawn, and night. The detection accuracy results show improvements in all monitored metrics across most tested conditions following the incorporation of augmentation. The most significant improvement resulting from augmentation was observed in low-visibility conditions, captured during evening and night, with an increase of up to 11% and 19% in mean Average Precision (mAP@.5) across all models. The second phase employed different modifications of YOLOv8 with modules such as Attention-Based Dense Atrous Spatial Pyramid Pooling, Vision Transformer and Global Attention Mechanism to enhance the detection of potholes in challenging visual conditions. The compensation for increased model complexity, such as the utilization of depthwise convolutions, was also employed. To evaluate the effectiveness of this approach, a publicly available pothole dataset with images captured in diverse weather conditions is used. The results indicate that the proposed method achieved an 8.4% improvement pre-augmentation and a 5.3% improvement post-augmentation compared to the original YOLOv8, surpassing existing approaches in terms of accuracy and enhancing pothole detection in adverse weather conditions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. Ship detection using ensemble deep learning techniques from synthetic aperture radar imagery.

Author: Gupta, Himanshu, Verma, Om Prakash, Sharma, Tarun Kumar, Varshney, Hirdesh, Agarwal, Saurabh, and Pak, Wooguil
Abstract: Synthetic Aperture Radar (SAR) integrated with deep learning has been widely used in several military and civilian applications, such as border patrolling, to monitor and regulate the movement of people and goods across land, air, and maritime borders. Amongst these, maritime borders confront different threats and challenges. Therefore, SAR-based ship detection becomes essential for naval surveillance in marine traffic management, oil spill detection, illegal fishing, and maritime piracy. However, the model becomes insensitive to small ships due to the wide-scale variance and uneven distribution of ship sizes in SAR images. This increases the difficulties associated with ship recognition, which triggers several false alarms. To effectively address these difficulties, the present work proposes an ensemble model (eYOLO) based on YOLOv4 and YOLOv5. The model utilizes a weighted box fusion technique to fuse the outputs of YOLOv4 and YOLOv5. Also, a generalized intersection over union loss has been adopted in eYOLO which ensures the increased generalization capability of the model with reduced scale sensitivity. The model has been developed end-to-end, and its performance has been validated against other reported results using an open-source SAR-ship dataset. The obtained results confirm the effectiveness of eYOLO in multi-scale ship detection with an F1 score and mAP of 91.49% and 92.00%, respectively. This highlights the efficacy of eYOLO in multi-scale ship detection using SAR imagery. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
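
The abstract above mentions fusing the outputs of two detectors with weighted box fusion. The sketch below shows the core idea in simplified form, under stated assumptions: detections from both models are pooled into one list, a single class is assumed, and the fused score is a plain average (the rescaling by the number of contributing models used in fuller formulations is omitted). The function names and the `iou_thr` default are hypothetical, not taken from the paper.

```python
def iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    union = (a[2] - a[0]) * (a[3] - a[1]) + (b[2] - b[0]) * (b[3] - b[1]) - inter
    return inter / union if union > 0 else 0.0


def fuse_cluster(members):
    """Confidence-weighted average of coordinates; mean of the scores."""
    total = sum(m[4] for m in members)
    coords = tuple(sum(m[k] * m[4] for m in members) / total for k in range(4))
    return coords + (total / len(members),)


def weighted_box_fusion(detections, iou_thr=0.55):
    """Fuse detections given as (x1, y1, x2, y2, score) tuples.

    Boxes are visited in descending score order. Each box joins the cluster
    whose current fused box overlaps it most (above iou_thr); otherwise it
    starts a new cluster.
    """
    clusters, fused = [], []
    for det in sorted(detections, key=lambda d: d[4], reverse=True):
        best, best_iou = None, iou_thr
        for i, f in enumerate(fused):
            overlap = iou(det[:4], f[:4])
            if overlap > best_iou:
                best, best_iou = i, overlap
        if best is None:
            clusters.append([det])
            fused.append(tuple(det))
        else:
            clusters[best].append(det)
            fused[best] = fuse_cluster(clusters[best])
    return fused
```

Unlike non-maximum suppression, which discards all but the highest-scoring box in a group, this approach keeps information from every overlapping prediction by averaging them.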

30. Efficient deep learning based rail fastener screw detection method for fastener screw maintenance robot under complex lighting conditions.

Author: Cai, Yijie, He, Ming, and Chen, Bin
Subjects: *DATA mining, *FEATURE extraction, *DEEP learning, *STREET railroads, *SCREWS
Abstract: For the rail fastener replacement operation at night in the wilderness, the lighting conditions on the rail fastener screws are complex, due to the multiple illuminants like headlamps and flashlights at the site, making some parts of the objects appear dark or low light status in the camera. These complex lighting conditions (CLCs) interfere with the fastener recognition ability of the fastener screw detection algorithm since it can hardly maintain fixed and optimized lighting conditions of the fastener screw. We propose the LFGB-YOLO, a novel YOLO-based model that contains two principal parts: the Light-Fast part and the GB-Neck part. The Light-Fast part can reduce the network Params, FLOPs, and memory access frequency in feature extraction while keeping a high precision. The GB-Neck part can lighten the feature fusion network while maintaining the ability of accurate feature information extraction operation. Experimental results demonstrate that the LFGB-YOLO performs excellently in metrics like Recall, mAP@0.5, F1 score, and FPS, better than the performance of competitive models like YOLOv5n, YOLOv7-tiny, and YOLOv8. The performance metrics of the proposed model, Recall, mAP@0.5, F1-score, and FPS are increased by 8.9%, 4%, 4.8%, and 8.1% compared with the baseline model. It shows that our work not only performs satisfactorily in detecting fastener screws under CLCs but also inspires new studies that focus on the fastener screw detection affected by environmental factors. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Toward Versatile Small Object Detection with Temporal-YOLOv8.

Author: van Leeuwen, Martin C., Fokkinga, Ella P., Huizinga, Wyke, Baan, Jan, and Heslinga, Friso G.
Subjects: *OBJECT recognition (Computer vision), *DATA augmentation, *SIGNAL-to-noise ratio, *DETECTORS, *DEEP learning, *MAPS
Abstract: Deep learning has become the preferred method for automated object detection, but the accurate detection of small objects remains a challenge due to the lack of distinctive appearance features. Most deep learning-based detectors do not exploit the temporal information that is available in video, even though this context is often essential when the signal-to-noise ratio is low. In addition, model development choices, such as the loss function, are typically designed around medium-sized objects. Moreover, most datasets that are acquired for the development of small object detectors are task-specific and lack diversity, and the smallest objects are often not well annotated. In this study, we address the aforementioned challenges and create a deep learning-based pipeline for versatile small object detection. With an in-house dataset consisting of civilian and military objects, we achieve a substantial improvement in YOLOv8 (baseline mAP = 0.465) by leveraging the temporal context in video and data augmentations specifically tailored to small objects (mAP = 0.839). We also show the benefit of having a carefully curated dataset in comparison with public datasets and find that a model trained on a diverse dataset outperforms environment-specific models. Our findings indicate that small objects can be detected accurately in a wide range of environments while leveraging the speed of the YOLO architecture. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Damage Detection and Segmentation in Disaster Environments Using Combined YOLO and Deeplab.

Author: Jo, So-Hyeon, Woo, Joo, Kang, Chang Ho, and Kim, Sun Young
Subjects: *CRACKING of concrete, *IMAGE analysis, *STANDARD deviations, *PRODUCT returns, *NOISE
Abstract: Building damage due to various causes occurs frequently and has risk factors that can cause additional collapses. However, it is difficult to accurately identify objects in complex structural sites because of inaccessible situations and image noise. In conventional approaches, close-up images have been used to detect and segment damage images such as cracks. In this study, the method of using a deep learning model is proposed for the rapid determination and analysis of multiple damage types, such as cracks and concrete rubble, in disaster sites. Through the proposed method, it is possible to perform analysis by receiving image information from a robot explorer instead of a human, and it is possible to detect and segment damage information even when the damaged point is photographed at a distance. To accomplish this goal, damage information is detected and segmented using YOLOv7 and Deeplabv2. Damage information is quickly detected through YOLOv7, and semantic segmentation is performed using Deeplabv2 based on the bounding box information obtained through YOLOv7. By using images with various resolutions and senses of distance for training, damage information can be effectively detected not only at short distances but also at long distances. When comparing the results, depending on how YOLOv7 and Deeplabv2 were used, they returned better scores than the comparison model, with a Recall of 0.731, Precision of 0.843, F1 of 0.770, and mIoU of 0.638, and had the lowest standard deviation. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. Method to Estimate Dislocation Densities from Images of α‐Ga2O3‐Based Corundum Oxides Using the Computer Vision YOLO Algorithm.

Author: Dang, Giang T., Kawaharamura, Toshiyuki, and Allen, Martin W.
Subjects: *SEMICONDUCTOR thin films, *DISLOCATION density, *CONVOLUTIONAL neural networks, *COMPUTER vision, *THIN films
Abstract: This work applies the computer vision “You only look once” (YOLO) algorithm to extract bounding boxes around dislocations in weak‐beam dark‐field transmission electron microscopy (WBDF TEM) images of semiconductor thin films. A formula is derived to relate the sum of the relative heights of the bounding boxes to the dislocation densities in the films. WBDF TEM images reported in the literature and taken from our α‐Ga2O3 samples are divided into train, evaluation, and test datasets. Different models are trained using the train dataset and evaluated using the evaluation dataset to find the best confidence values, which are used to select the best model based on the performance against the test data set. For α‐Ga2O3 thin films, dislocation density output by this model is on average ≈58% of those estimated by the traditional Ham method. A factor of 4/π may contribute to the systematic underestimation of the model versus the Ham method. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. Improved hybrid feature extractor in lightweight convolutional neural network for postharvesting technology: automated oil palm fruit grading.

Author: Junos, Mohamad Haniff, Mohd Khairuddin, Anis Salwa, Abu Talip, Mohamad Sofian, Kairi, Muhammad Izhar, and Siran, Yosri Mohd
Subjects: *CONVOLUTIONAL neural networks, *DEEP learning, *PRECISION farming, *OPERATING costs, *LIGHT intensity, *OIL palm
Abstract: Grading of oil palm fresh fruit bunches (FFB) plays a vital role in the postharvest operation as it directly influences the extraction rate of oil palm, thereby ensuring quality control in the estate and mill. Currently, manual grading based on visual assessment is employed, but it has limitations mainly due to subjective judgment-influenced factors such as visual resemblance, light intensities, and differences in colors across ripeness categories. Hence, an automated oil palm fruit grading system in postharvest technology is proposed to enhance the grading process and improve productivity while maintaining operational cost efficiency. This work involves developing an improved object detection model based on the You Only Look Once model to accurately identify four grades of oil palm FFB, namely, ripe, unripe, underripe, and overripe. The proposed model incorporated several improvements, including a hybrid feature extractor comprising mobile inverted bottleneck module and densely connected neural network. Additionally, it employs a spatial pyramid pooling structure to expand the receptive field and utilizes the complete intersection over union function for bounding box regression. The results indicate that the proposed model obtains a remarkable mAP of 94.37% and an F1-score of 0.89. Besides, the model performs real-time detection at a faster rate of 4.8 FPS on a limited-capacity embedded device, NVIDIA Jetson Nano. The comprehensive experimental results confirm the superiority of the proposed model over various detection models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. Deep learning based identification and tracking of railway bogie parts.

Author: Shaikh, Muhammad Zakir, Ahmed, Zeeshan, Baro, Enrique Nava, Hussain, Samreen, and Milanova, Mariofanna
Subjects: OBJECT recognition (Computer vision), COMPUTER vision, RAILROAD safety measures, DEEP learning, ROLLING stock, BOGIES (Vehicles)
Abstract: The Train Rolling-Stock Examination (TRSE) is a safety examination process that physically examines the bogie parts of a moving train, typically at speeds over 30 km/h. Currently, this inspection process is done manually by railway personnel in many countries to ensure safety and prevent interruptions to rail services. Although many earlier attempts have been made to semi-automate this process through computer-vision models, these models are iterative and still require manual intervention. Consequently, these attempts were unsuitable for real-time implementations. In this work, we propose a detection model by utilizing a deep-learning based classifier that can precisely identify bogie parts in real-time without manual intervention, allowing an increase in the deployability of these inspection systems. We implemented the Anchor-Free Yolov8 (AFYv8) model, which has a decoupled-head module for recognizing bogie parts. Additionally, we incorporated bogie parts tracking with the AFYv8 model to gather information about any missing parts. To test the effectiveness of the AFYv8-model, the bogie videos were captured at three different timestamps and the result shows the increase in the recognition accuracy of TRSE by 10 % compared to the previously developed classifiers. This research has the potential to enhance railway safety and minimize operational interruptions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. Automatic System for Acquisition and Analysis of Microscopic Digital Images Containing Activated Sludge.

Author: Staniszewski, Michał, Dziadosz, Marcin, Zaburko, Jacek, Babko, Roman, and Łagód, Grzegorz
Subjects: ACTIVATED sludge process, WATER treatment plants, MICROSCOPY, AUTOMATIC control systems, IMAGE analysis, DEEP learning
Abstract: The article contains the procedure of image acquisition, including sampling of analyzed material as well as technical solutions of hardware and preprocessing used in research. A dataset of digital images containing identified objects was obtained with the help of an automated mechanical system for controlling the microscope table and used to train the YOLO models. The performance of YOLOv4 as well as YOLOv8 deep learning networks was compared on the basis of automatic image analysis. YOLO constitutes a one-stage object detection model, aiming to examine the analyzed image only once. By utilizing a single neural network, the image is divided into a grid of cells, and predictions are made for bounding boxes, as well as object class probabilities for each box. This approach allows real-time detection with minimal accuracy loss. The study involved ciliated protozoa Vorticella as a test object. These organisms are found both in natural water bodies and in treatment plants that employ the activated sludge method. Owing to their distinct appearance, high abundance and sedentary lifestyle, Vorticella are good subjects for detection tasks. To ensure that the training dataset is accurate, the images were manually labeled. The performance of the models was evaluated using such metrics as accuracy, precision, and recall. The final results show the differences in metrics characterizing the obtained outputs and progress in the software over subsequent versions of the YOLO algorithm. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
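
The abstract above notes that YOLO divides the image into a grid of cells and makes predictions per cell. As a small illustration of that idea in the original YOLO formulation, the sketch below maps an object's center to the grid cell responsible for predicting it and to the center's offset within that cell. The function name and parameters are hypothetical; later versions such as YOLOv4 and YOLOv8 predict on several feature-map scales, which this single-grid sketch does not model.

```python
def responsible_cell(cx, cy, img_w, img_h, grid=7):
    """Return (row, col, x_offset, y_offset) for an object center in pixels.

    The image is split into grid x grid cells. The cell containing the
    center is responsible for the object, and the offsets give the center's
    position inside that cell as fractions in [0, 1).
    """
    gx = cx / img_w * grid  # center in grid units along x
    gy = cy / img_h * grid  # center in grid units along y
    col = min(int(gx), grid - 1)  # clamp centers lying on the right edge
    row = min(int(gy), grid - 1)  # clamp centers lying on the bottom edge
    return row, col, gx - col, gy - row


# Example: a 640x640 image with an 8x8 grid and an object centered at (320, 100).
row, col, x_off, y_off = responsible_cell(320, 100, 640, 640, grid=8)
```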

37. Autonomous Multitask Driving Systems Using Improved You Only Look Once Based on Panoptic Driving Perception.

Author: Chun-Jung Lin, Cheng-Jian Lin, and Yi-Chen Yang
Subjects: TAGUCHI methods, DATABASES, AUTONOMOUS vehicles
Abstract: With the continuous development of science and technology, automatic assisted driving is becoming a trend that cannot be ignored. The You Only Look Once (YOLO) model is usually used to detect roads and drivable areas. Since YOLO is often used for a single task and its parameter combination is difficult to obtain, we propose a Taguchi-based YOLO for panoptic driving perception (T-YOLOP) model to improve the accuracy and computing speed of the model in detecting drivable areas and lanes, making it a more practical panoptic driving perception system. In the T-YOLOP model, the Taguchi method is used to determine the appropriate parameter combination. Our experiments use the BDD100K database to verify the performance of the proposed T-YOLOP model. Experimental results show that the accuracies of the proposed T-YOLOP model in detecting drivable areas and lanes are 97.9% and 73.9%, respectively, and these results are better than those of the traditional YOLOP model. Therefore, the proposed T-YOLOP model successfully provides a more reliable solution for the application of panoramic driving perception systems. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. LCGSC-YOLO: a lightweight apple leaf diseases detection method based on LCNet and GSConv module under YOLO framework.

Author: Wang, Jianlong, Qin, Congcong, Hou, Beibei, Yuan, Yuan, Zhang, Yake, and Feng, Wenfeng
Subjects: CONVOLUTIONAL neural networks, DEEP learning, PLANT diseases, NECK, ALGORITHMS
Abstract: Introduction: In response to the current mainstream deep learning detection methods with a large number of learned parameters and the complexity of apple leaf disease scenarios, the paper proposes a lightweight method and names it LCGSC-YOLO. This method is based on the LCNet (A Lightweight CPU Convolutional Neural Network) and GSConv (Group Shuffle Convolution) module modified YOLO (You Only Look Once) framework. Methods: Firstly, the lightweight LCNet is utilized to reconstruct the backbone network, with the purpose of reducing the number of parameters and computations of the model. Secondly, the GSConv module and the VOVGSCSP (Slim-neck by GSConv) module are introduced in the neck network, which makes it possible to minimize the number of model parameters and computations while guaranteeing the fusion capability among the different feature layers. Finally, coordinate attention is embedded in the tail of the backbone and after each VOVGSCSP module to mitigate the detection accuracy degradation caused by model lightweighting. Results: The experimental results show the LCGSC-YOLO can achieve an excellent detection performance with mean average precision of 95.5% and detection speed of 53 frames per second (FPS) on the mixed datasets of Plant Pathology 2021 (FGVC8) and AppleLeaf9. Discussion: The number of parameters and Floating Point Operations (FLOPs) of the LCGSC-YOLO are much less than other related comparative experimental algorithms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. Estimation of sorghum seedling number from drone image based on support vector machine and YOLO algorithms.

Author: Chen, Hongxing, Chen, Hui, Huang, Xiaoyun, Zhang, Song, Chen, Shengxi, Cen, Fulang, He, Tengbing, Zhao, Quanzhi, and Gao, Zhenran
Subjects: MACHINE learning, SUPPORT vector machines, DRONE aircraft, DEEP learning, CROP growth, SORGHUM
Abstract: Accurately counting the number of sorghum seedlings from images captured by unmanned aerial vehicles (UAV) is useful for identifying sorghum varieties with high seedling emergence rates in breeding programs. The traditional method is manual counting, which is time-consuming and laborious. Recently, UAV have been widely used for crop growth monitoring because of their low cost, and their ability to collect high-resolution images and other data non-destructively. However, estimating the number of sorghum seedlings is challenging because of the complexity of field environments. The aim of this study was to test three models for counting sorghum seedlings rapidly and automatically from red-green-blue (RGB) images captured at different flight altitudes by a UAV. The three models were a machine learning approach (Support Vector Machines, SVM) and two deep learning approaches (YOLOv5 and YOLOv8). The robustness of the models was verified using RGB images collected at different heights. The R2 values of the model outputs for images captured at heights of 15 m, 30 m, and 45 m were, respectively, (SVM: 0.67, 0.57, 0.51), (YOLOv5: 0.76, 0.57, 0.56), and (YOLOv8: 0.93, 0.90, 0.71). Therefore, the YOLOv8 model was most accurate in estimating the number of sorghum seedlings. The results indicate that UAV images combined with an appropriate model can be effective for large-scale counting of sorghum seedlings. This method will be a useful tool for sorghum phenotyping. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
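
The abstract above compares models by their R2 values for predicted versus manually counted seedlings. For readers unfamiliar with the metric, here is a minimal sketch of the coefficient of determination computed as 1 - SS_res / SS_tot. Whether the study used this form or the squared correlation of a fitted regression line is not stated in the abstract, so that choice is an assumption, and the counts in the example are made up for illustration only.

```python
def r_squared(observed, predicted):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_obs = sum(observed) / len(observed)
    ss_res = sum((o - p) ** 2 for o, p in zip(observed, predicted))
    ss_tot = sum((o - mean_obs) ** 2 for o in observed)
    return 1 - ss_res / ss_tot


# Hypothetical plot-level counts (not data from the study).
manual_counts = [10, 20, 30, 40]
model_counts = [12, 18, 33, 39]
score = r_squared(manual_counts, model_counts)
```

A value close to 1 means the model's counts track the manual counts closely, and values fall as the residuals grow relative to the spread of the manual counts.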

40. WCAY object detection of fractures for X-ray images of multiple sites.

Author: Chen, Peng, Liu, Songyan, Lu, Wenbin, Lu, Fangpeng, and Ding, Boyang
Subjects: *OBJECT recognition (Computer vision), *X-ray imaging, *DEEP learning, *X-ray detection, *MULTIPLE comparisons (Statistics)
Abstract: The WCAY (weighted channel attention YOLO) model, which is meticulously crafted to identify fracture features across diverse X-ray image sites, is presented herein. This model integrates novel core operators and an innovative attention mechanism to enhance its efficacy. Initially, leveraging the benefits of dynamic snake convolution (DSConv), which is adept at capturing elongated tubular structural features, we introduce the DSC-C2f module to augment the model's fracture detection performance by replacing a portion of C2f. Subsequently, we integrate the newly proposed weighted channel attention (WCA) mechanism into the architecture to bolster feature fusion and improve fracture detection across various sites. Comparative experiments were conducted, to evaluate the performances of several attention mechanisms. These enhancement strategies were validated through experimentation on public X-ray image datasets (FracAtlas and GRAZPEDWRI-DX). Multiple experimental comparisons substantiated the model's efficacy, demonstrating its superior accuracy and real-time detection capabilities. According to the experimental findings, on the FracAtlas dataset, our WCAY model exhibits a notable 8.8% improvement in mean average precision (mAP) over the original model. On the GRAZPEDWRI-DX dataset, the mAP reaches 64.4%, with a detection accuracy of 93.9% for the "fracture" category alone. The proposed model represents a substantial improvement over the original algorithm compared to other state-of-the-art object detection models. The code is publicly available at https://github.com/cccp421/Fracture-Detection-WCAY. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. Tailhook Recognition for Carrier-Based Aircraft Based on YOLO with Bi-Level Routing Attention.

Author: Lu, Aiguo, Liu, Pandi, Yang, Jie, Li, Zhe, and Wang, Ke
Subjects: *OBJECT recognition (Computer vision), *LEARNING ability, *MODEL airplanes, *ROUTING algorithms
Abstract: To address the problems of missed and false detections caused by target occlusion and lighting variations, this paper proposes a recognition model based on YOLOv5 with bi-level routing attention to achieve precise real-time small object recognition, using the problem of tailhook recognition for carrier-based aircraft as a representative application. Firstly, a module called D_C3, which combines deformable convolution, was integrated into the backbone network to enhance the model's learning ability and adaptability in specific scenes. Secondly, a bi-level routing attention mechanism was employed to dynamically focus on the regions of the feature map that are more likely to contain the target, leading to more accurate target localization and classification. Additionally, the loss function was optimized to accelerate the bounding box regression process. The experimental results on the self-constructed CATHR-DET and the public VOC dataset demonstrate that the proposed method outperforms the baselines in overall performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. Development of a Drone-Based Phenotyping System for European Pear Rust (Gymnosporangium sabinae) in Orchards.

Author: Maß, Virginia, Seidl-Schulz, Johannes, Leipnitz, Matthias, Fritzsche, Eric, Geyer, Martin, Pflanz, Michael, and Reim, Stefanie
Subjects: *GEOGRAPHIC information system software, *OBJECT recognition (Computer vision), *COMMON pear, *PEARS, *ORCHARDS
Abstract: Computer vision techniques offer promising tools for disease detection in orchards and can enable effective phenotyping for the selection of resistant cultivars in breeding programmes and research. In this study, a digital phenotyping system for disease detection and monitoring was developed using drones, object detection and photogrammetry, focusing on European pear rust (Gymnosporangium sabinae) as a model pathogen. High-resolution RGB images from ten low-altitude drone flights were collected in 2021, 2022 and 2023. A total of 16,251 annotations of leaves with pear rust symptoms were created on 584 images using the Computer Vision Annotation Tool (CVAT). The YOLO algorithm was used for the automatic detection of symptoms. A novel photogrammetric approach using Agisoft's Metashape Professional software ensured the accurate localisation of symptoms. The geographic information system software QGIS calculated the infestation intensity per tree based on the canopy areas. This drone-based phenotyping system shows promising results and could considerably simplify the tasks involved in fruit breeding research. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

43. Estimation of Strawberry Canopy Volume in Unmanned Aerial Vehicle RGB Imagery Using an Object Detection-Based Convolutional Neural Network.

Author: Gang, Min-Seok, Sutthanonkul, Thanyachanok, Lee, Won Suk, Liu, Shiyu, and Kim, Hak-Jin
Subjects: *CONVOLUTIONAL neural networks, *STANDARD deviations, *DRONE aircraft, *PLANT canopies, *DIGITAL cameras
Abstract: Estimating canopy volumes of strawberry plants can be useful for predicting yields and establishing advanced management plans. Therefore, this study evaluated the spatial variability of strawberry canopy volumes using a ResNet50V2-based convolutional neural network (CNN) model trained with RGB images acquired through manual unmanned aerial vehicle (UAV) flights equipped with a digital color camera. A preprocessing method based on the You Only Look Once v8 Nano (YOLOv8n) object detection model was applied to correct image distortions influenced by fluctuating flight altitude under a manual maneuver. The CNN model was trained using actual canopy volumes measured using a cylindrical case and small expanded polystyrene (EPS) balls to account for internal plant spaces. Estimated canopy volumes using the CNN with flight altitude compensation closely matched the canopy volumes measured with EPS balls (nearly 1:1 relationship). The model achieved a slope, coefficient of determination (R2), and root mean squared error (RMSE) of 0.98, 0.98, and 74.3 cm3, respectively, corresponding to an 84% improvement over the conventional paraboloid shape approximation. In the application tests, the canopy volume map of the entire strawberry field was generated, highlighting the spatial variability of the plant's canopy volumes, which is crucial for implementing site-specific management of strawberry crops. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

44. Enhancing Grapevine Node Detection to Support Pruning Automation: Leveraging State-of-the-Art YOLO Detection Models for 2D Image Analysis.

Author: Oliveira, Francisco, da Silva, Daniel Queirós, Filipe, Vítor, Pinho, Tatiana Martins, Cunha, Mário, Cunha, José Boaventura, and dos Santos, Filipe Neves
Subjects: *AUTONOMOUS robots, *AGRICULTURAL robots, *GRAPES, *DEEP learning, *PRECISION farming, *PRUNING
Abstract: Automating pruning tasks entails overcoming several challenges, encompassing not only robotic manipulation but also environment perception and detection. To achieve efficient pruning, robotic systems must accurately identify the correct cutting points. A possible method to define these points is to choose the cutting location based on the number of nodes present on the targeted cane. For this purpose, in grapevine pruning, it is required to correctly identify the nodes present on the primary canes of the grapevines. In this paper, a novel method of node detection in grapevines is proposed with four distinct state-of-the-art versions of the YOLO detection model: YOLOv7, YOLOv8, YOLOv9 and YOLOv10. These models were trained on a public dataset with images containing artificial backgrounds and afterwards validated on different cultivars of grapevines from two distinct Portuguese viticulture regions with cluttered backgrounds. This allowed us to evaluate the robustness of the algorithms on the detection of nodes in diverse environments, compare the performance of the YOLO models used, as well as create a publicly available dataset of grapevines obtained in Portuguese vineyards for node detection. Overall, all models used were capable of achieving correct node detection in images of grapevines from the three distinct datasets. Considering the trade-off between accuracy and inference speed, the YOLOv7 model proved to be the most robust in detecting nodes in 2D images of grapevines, achieving F1-Score values between 70% and 86.5% with inference times of around 89 ms for an input size of 1280 × 1280 px. Considering these results, this work contributes an efficient approach for real-time node detection for further implementation on an autonomous robotic pruning system. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. FireNet: A Lightweight and Efficient Multi-Scenario Fire Object Detector.

Author: He, Yonghuan, Sahma, Age, He, Xu, Wu, Rong, and Zhang, Rui
Subjects: *TRANSFORMER models, *FIRE detectors, *SMART cities, *FALSE alarms, *GLOBAL method of teaching, *SMOKE
Abstract: Fire and smoke detection technologies face challenges in complex and dynamic environments. Traditional detectors are vulnerable to background noise, lighting changes, and similar objects (e.g., clouds, steam, dust), leading to high false alarm rates. Additionally, they struggle with detecting small objects, limiting their effectiveness in early fire warnings and rapid responses. As real-time monitoring demands grow, traditional methods often fall short in smart city and drone applications. To address these issues, we propose FireNet, integrating a simplified Vision Transformer (RepViT) to enhance global feature learning while reducing computational overhead. Dynamic snake convolution (DSConv) captures fine boundary details of flames and smoke, especially in complex curved edges. A lightweight decoupled detection head optimizes classification and localization, ideal for high inter-class similarity and small targets. FireNet outperforms YOLOv8 on the Fire Scene dataset (FSD) with a mAP@0.5 of 80.2%, recall of 78.4%, and precision of 82.6%, with an inference time of 26.7 ms. It also excels on the FSD dataset, addressing current fire detection challenges. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

46. YOLO-based microglia activation state detection.

Author: Liu, Jichi, Li, Wei, Lyu, Houkun, and Qi, Feng
Subjects: *OBJECT recognition (Computer vision), *CONVOLUTIONAL neural networks, *DEEP learning, *FEATURE extraction, *NETWORK performance
Abstract: Recognition of microglia activation state is required in the research of problems such as brain neurological diseases. In this paper, a novel recognition network based on YOLOv5 is proposed for microglia activation state recognition. Firstly, the decoupled head is integrated into the head network, and secondly, novel feature extraction modules containing DenseNet are introduced: the DenseNet-C2f module and the DenseNet-SimCSPSPPF module. Subsequently, Wise-IoU is employed as the loss function, and the parameters therein are discussed. The network performance was evaluated using the microglia dataset. The experimental results show that the average precision of the enhanced network increases from 59.6 to 65.6%. In addition, the recall was improved from 56.3 to 71.5%. These improvements resulted in more efficient detection performance, which better meets the requirements of the medical field for identifying microglia activation states. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

47. Smartphone-based pH titration for liquid food applications.

Author: Xiao, Yuhui, Huang, Yaqiu, Qiu, Junhong, Cai, Honghao, and Ni, Hui
Abstract: pH detection helps control food quality, prevent spoilage, determine storage methods, and monitor additive levels. In previous studies, colorimetric pH detection involved manual capture of target regions and classification of acid–base categories, leading to time-consuming processes. Additionally, some researchers relied solely on R*G*B* or H*S*V* to build regression models, potentially limiting their generalizability and robustness. To address these limitations, this study proposed a colorimetric method that combines pH paper, smartphone, computer vision, and machine learning for fast and precise pH detection. Advantages of the computer vision model YOLOv5 include its ability to quickly capture the target region of the pH paper and automatically categorize it as either acidic or basic. Subsequently, recursive feature elimination was applied to filter out irrelevant features from the R*G*B*, H*S*V*, L*a*b*, Gray, XR, XG, and XB. Finally, support vector regression was used to develop the regression model for pH value prediction. YOLOv5 demonstrated exceptional performance with mean average precision of 0.995, classification accuracy of 100%, and detection time of 4.9 ms. The pH prediction model achieved a mean absolute error (MAE) of 0.023 for acidity and 0.061 for alkalinity, signifying a notable advancement compared to the MAE range of 0.03–0.46 observed in previous studies. The proposed approach shows potential in improving the dependability and effectiveness of pH detection, specifically in resource-constrained scenarios. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. A Systematic Review and Comparative Analysis Approach to Boom Gate Access Using Plate Number Recognition.

Author: Bukola, Asaju Christine, Owolawi, Pius Adewale, Du, Chuling, and Van Wyk, Etienne
Subjects: OBJECT recognition (Computer vision), MACHINE learning, COMPUTER vision, ACCESS control, AUTOMOBILE license plates
Abstract: Security has been paramount to many organizations for many years, with access control being one of the critical measures to ensure security. Among various approaches to access control, vehicle plate number recognition has received wide attention. However, its application to boom gate access has not been adequately explored. This study proposes a method to access the boom gate by optimizing vehicle plate number recognition. Given the speed and accuracy of the YOLO (You Only Look Once) object detection algorithm, this study proposes using the YOLO deep learning algorithm for plate number detection to access a boom gate. To identify the gap and the most suitable YOLO variant, the study systematically surveyed the publication database to identify peer-reviewed articles published between 2020 and 2024 on plate number recognition using different YOLO versions. In addition, experiments are performed on four YOLO versions: YOLOv5, YOLOv7, YOLOv8, and YOLOv9, focusing on vehicle plate number recognition. The experiments, using an open-source dataset with 699 samples in total, reported accuracies of 81%, 82%, 83%, and 73% for YOLO V5, V7, V8, and V9, respectively. This comparative analysis aims to determine the most appropriate YOLO version for the task, optimizing both security and efficiency in boom gate access control systems. By optimizing the capabilities of advanced YOLO algorithms, the proposed method seeks to improve the reliability and effectiveness of access control through precise and rapid plate number recognition. The result of the analysis reveals that each YOLO version has distinct advantages depending on the application's specific requirements. In complex detection conditions with changing lighting and shadows, it was revealed that YOLOv8 performed better in terms of reduced loss rates and increased precision and recall metrics. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. YOLO-VSI: An Improved YOLOv8 Model for Detecting Railway Turnouts Defects in Complex Environments.

Author: Yu, Chenghai and Lu, Zhilong
Subjects: FEATURE extraction, PYRAMIDS, SPINE, GENERALIZATION, RAILROADS
Abstract: Railway turnouts often develop defects such as chipping, cracks, and wear during use. If not detected and addressed promptly, these defects can pose significant risks to train operation safety and passenger security. Despite advances in defect detection technologies, research specifically targeting railway turnout defects remains limited. To address this gap, we collected images from railway inspectors and constructed a dataset of railway turnout defects in complex environments. To enhance detection accuracy, we propose an improved YOLOv8 model named YOLO-VSS-SOUP-Inner-CIoU (YOLO-VSI). The model employs a state-space model (SSM) to enhance the C2f module in the YOLOv8 backbone, proposing the C2f-VSS module to better capture long-range dependencies and contextual features, thus improving feature extraction in complex environments. In the network's neck layer, we integrate SPDConv and Omni-Kernel Network (OKM) modules to improve the original PAFPN (Path Aggregation Feature Pyramid Network) structure, and propose the Small Object Upgrade Pyramid (SOUP) structure to enhance small object detection capabilities. Additionally, the Inner-CIoU loss function with a scale factor is applied to further enhance the model's detection capabilities. Compared to the baseline model, YOLO-VSI demonstrates a 3.5% improvement in average precision on our railway turnout dataset, showcasing increased accuracy and robustness. Experiments on the public NEU-DET dataset reveal a 2.3% increase in average precision over the baseline, indicating that YOLO-VSI has good generalization capabilities. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. MCBAN: A Small Object Detection Multi-Convolutional Block Attention Network.

Author: Bhanbhro, Hina, Hooi, Yew Kwang, Zakaria, Mohammad Nordin Bin, Kusakunniran, Worapan, and Amur, Zaira Hassan
Subjects: TECHNICAL institutes, NOISE
Abstract: Object detection has made a significant leap forward in recent years. However, the detection of small objects continues to be a great difficulty for various reasons, such as their very small size and their susceptibility to missed detection due to background noise. Additionally, small object information is affected by the downsampling operations. Deep learning-based detection methods have been utilized to address the challenge posed by small objects. In this work, we propose a novel method, the Multi-Convolutional Block Attention Network (MCBAN), to increase the detection accuracy of minute objects, aiming to overcome the challenge of information loss during the downsampling process. The multi-convolutional attention block (MCAB) and the channel attention and spatial attention module (SAM) that make up MCAB have been crafted to accomplish small object detection with higher precision. We have carried out the experiments on the Karlsruhe Institute of Technology and Toyota Technological Institute (KITTI) and Pattern Analysis, Statistical Modelling and Computational Learning (PASCAL) Visual Object Classes (VOC) datasets and have followed a step-wise process to analyze the results. The experimental results demonstrate that significant gains in performance are achieved, such as 97.75% for KITTI and 88.97% for PASCAL VOC. The findings of this study assert quite unequivocally that MCBAN is much more efficient in the small object detection domain than other existing approaches. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

2,564 results on '"YOLO"'