744 results for '"object segmentation"'
Search Results
2. An end-to-end approach to detect railway track defects based on supervised and self-supervised learning
- Author
-
Haroon, Muhammad, Khan, Muhammad Jawad, Cheema, Hammad M, Nasir, Muhammad Tauseef, Safdar, Muhammad, and Butt, Shahid Ikram Ullah
- Published
- 2024
- Full Text
- View/download PDF
3. Study of the patterns of variations in ice lakes and the factors influencing these changes on the southeastern Tibetan plateau
- Author
-
Mingwei, Y.U., Feng, L.I., Yonggang, G.U.O., Libin, S.U., and Deshun, Q.I.N.
- Published
- 2024
- Full Text
- View/download PDF
4. Accurate drone corner position estimation in complex backgrounds with boundary classification
- Author
-
Tsai, Yu-Shiuan, Lin, Cheng-Sheng, and Li, Guan-Yi
- Published
- 2024
- Full Text
- View/download PDF
5. Challenges of Automatic Optical Inspection of Used Turbine Blades with Convolutional Neural Networks
- Author
-
Lehr, J., Briese, C., Mönchinger, S., Kroeger, O., Krüger, J., Kohl, Holger, editor, Seliger, Günther, editor, Dietrich, Franz, editor, and Mur, Sebastián, editor
- Published
- 2025
- Full Text
- View/download PDF
6. Boosting Gaze Object Prediction via Pixel-Level Supervision from Vision Foundation Model
- Author
-
Jin, Yang, Zhang, Lei, Yan, Shi, Fan, Bin, Wang, Binglu, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
7. Prior Mask-Guided Highly Accurate Dichotomous Image Segmentation
- Author
-
Zhou, Shanfeng, Yuan, Bo, Fu, Keren, Zhang, Hailun, Zhao, Qijun, Hadfi, Rafik, editor, Anthony, Patricia, editor, Sharma, Alok, editor, Ito, Takayuki, editor, and Bai, Quan, editor
- Published
- 2025
- Full Text
- View/download PDF
8. Unsupervised Moving Object Segmentation with Atmospheric Turbulence
- Author
-
Qin, Dehao, Saha, Ripon Kumar, Chung, Woojeh, Jayasuriya, Suren, Ye, Jinwei, Li, Nianyi, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
9. Learning Camouflaged Object Detection from Noisy Pseudo Label
- Author
-
Zhang, Jin, Zhang, Ruiheng, Shi, Yanjiao, Cao, Zhe, Liu, Nian, Khan, Fahad Shahbaz, Leonardis, Aleš, editor, Ricci, Elisa, editor, Roth, Stefan, editor, Russakovsky, Olga, editor, Sattler, Torsten, editor, and Varol, Gül, editor
- Published
- 2025
- Full Text
- View/download PDF
10. Using deep learning model integration to build a smart railway traffic safety monitoring system.
- Author
-
Chang, Chin-Chieh, Huang, Kai-Hsiang, Lau, Tsz-Kin, Huang, Chung-Fah, and Wang, Chun-Hsiung
- Subjects
- RAILROAD safety measures, OBJECT recognition (Computer vision), ACCIDENT prevention, ARTIFICIAL intelligence, TRAFFIC safety, INTRUSION detection systems (Computer security)
- Abstract
Given the importance of railway safety, it is crucial to build a smart railway traffic safety system in Taiwan, where related accidents occur frequently. Therefore, this study aimed to build a smart railway traffic safety system by integrating object detection, segmentation, machine learning, and a notification system. First, the Mask R-CNN model was applied to automatically build the digital boundaries of the railway, achieving an average Intersection over Union (IoU) of over 0.9. Then, the YOLO v3 model was applied to detect intrusions onto the railway, especially by humans. This object detection model achieved an overall accuracy (OA) of over 90% across classes, and an OA of 95.68% for human detection. The YOLO v3 model was also able to detect intrusions in different scenarios, such as nighttime, rainy daytime, and rainy nighttime. Moreover, the XGBoost model was applied to predict the sizes of intruding objects, with a low MAE of 0.54 cm and an R2 score of 0.997. Finally, a LINE bot was applied to notify the relevant operators with the above information, including the time of intrusion, location, class of intruding object, size, and an image of the intrusion. This implementation can be helpful for railway traffic safety monitoring and may help prevent related accidents. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
11. Cross-Modal Cognitive Consensus Guided Audio–Visual Segmentation.
- Author
-
Shi, Zhaofeng, Wu, Qingbo, Meng, Fanman, Xu, Linfeng, and Li, Hongliang
- Published
- 2025
- Full Text
- View/download PDF
12. A dynamic dropout self-distillation method for object segmentation.
- Author
-
Chen, Lei, Cao, Tieyong, Zheng, Yunfei, Wang, Yang, Zhang, Bo, and Yang, Jibin
- Abstract
In knowledge distillation, a stronger teacher does not always produce a better student, owing to the capacity mismatch between them. In pixel-level object segmentation especially, some challenging pixels are simply difficult for the student model to learn: even if the student learns from the teacher at every pixel, its performance still struggles to improve significantly. Mimicking the human learning process from easy to difficult, a dynamic dropout self-distillation method for object segmentation is proposed, which addresses this problem by discarding the knowledge that the student struggles to learn. Firstly, pixels with a significant difference between the teacher's and student's predicted probabilities are identified and defined as difficult-to-learn pixels for the student model. Secondly, a dynamic dropout strategy is proposed to match the changing capability of the student model; it is used to discard the pixels whose knowledge is too hard for the student. Finally, to validate the effectiveness of the proposed method, a simple student model for object segmentation and a virtual teacher model with perfect segmentation accuracy are constructed. Experimental results on four public datasets demonstrate that, when there is a large performance gap between the teacher and student models, the proposed self-distillation method improves the student model more effectively than other methods. [ABSTRACT FROM AUTHOR]
- Published
- 2025
- Full Text
- View/download PDF
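The dropout idea in entry 12 above, discarding pixels whose teacher-student probability gap is too wide for the student, can be sketched in a few lines. Every name, the threshold schedule, and the binary cross-entropy form here are illustrative assumptions, not the authors' code:

```python
import numpy as np

def dropout_distillation_loss(p_student, p_teacher, epoch, max_epoch):
    """Binary cross-entropy distillation loss that drops 'difficult' pixels.

    Pixels where the teacher-student probability gap exceeds a threshold
    are treated as too hard for the student and excluded from the loss.
    The threshold loosens as training progresses, so fewer pixels are
    dropped over time (a guessed schedule, not the paper's exact one).
    """
    eps = 1e-7
    gap = np.abs(p_teacher - p_student)
    threshold = 0.5 + 0.5 * (epoch / max_epoch)  # keep more pixels later on
    keep = gap <= threshold
    ce = -(p_teacher * np.log(p_student + eps)
           + (1.0 - p_teacher) * np.log(1.0 - p_student + eps))
    return float(ce[keep].mean()) if keep.any() else 0.0
```

Early in training the hard pixel is excluded; late in training both pixels contribute, so the loss rises even though the predictions are unchanged.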
13. Image processing framework for in-process shaft diameter measurement on legacy manual machines.
- Author
-
Choudhari, Sahil J., Singh, Swarit Anand, Kumar, Aitha Sudheer, and Desai, Kaushal A.
- Subjects
- OBJECT recognition (Computer vision), COMPUTER vision, IMAGE processing, LATHES, MACHINING, DEEP learning
- Abstract
In-process dimension measurement is critical to achieving higher productivity and realizing smart manufacturing goals during machining operations. Vision-based systems have significant potential for in-process dimension measurement, reducing human intervention and achieving manufacturing-inspection integration. This paper presents early research on developing a vision-based system for in-process dimension measurement of machined cylindrical components using image-processing techniques. The challenges of in-process dimension measurement are addressed by combining a deep learning-based object detection model, You Only Look Once version 2 (YOLOv2), with image processing algorithms for object localization, segmentation, and spatial pixel estimation. An automated image pixel calibration approach is incorporated to improve algorithm robustness. The image acquisition hardware and the real-time image processing framework are integrated to demonstrate the working of the proposed system through a case study of in-process stepped-shaft diameter measurement. Implemented on a manual lathe, the system demonstrated robust utility: it eliminated the need for intermittent manual measurements, digitized in-process component dimensions, and improved machining productivity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
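The pixel calibration step described in entry 13 reduces, at its core, to a scale factor derived from a reference feature of known physical size; a minimal sketch under that assumption (both function names are invented for illustration):

```python
def mm_per_pixel(ref_width_px, ref_width_mm):
    """Scale factor (mm per pixel) from a reference object of known size."""
    return ref_width_mm / ref_width_px

def shaft_diameter_mm(edge_top_px, edge_bottom_px, scale):
    """Diameter from the two detected shaft edges (pixel rows) and the scale."""
    return abs(edge_bottom_px - edge_top_px) * scale
```

For example, a 50 mm reference gauge spanning 200 pixels gives 0.25 mm/pixel, so shaft edges detected 160 pixels apart correspond to a 40 mm diameter.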
14. Nested object detection using mask R-CNN: application to bee and varroa detection.
- Author
-
Kriouile, Yassine, Ancourt, Corinne, Wegrzyn-Wolska, Katarzyna, and Bougueroua, Lamine
- Subjects
- OBJECT recognition (Computer vision), VARROA, IMAGE processing, DEEP learning, MITES, BEEKEEPING
- Abstract
In this paper, we address an essential problem in object detection and image processing: detecting objects potentially nested inside other objects. This problem arises particularly in the beekeeping sector: detecting varroa parasites on bees. Indeed, beekeepers must monitor the level of infestation of their apiaries by the varroa parasite, which settles on the backs of bees. As far as we know, there is not yet a published approach that handles nested object detection using a single neural network trained on two different datasets. We propose an approach that fills this gap, improving the accuracy and efficiency of the bee and varroa detection task. Our work is based on deep learning, more precisely the Mask R-CNN neural network. Instead of segmenting the detected objects (bees), we segment the internal objects (varroas) by adding a branch to Faster R-CNN. We extract relevant features for internal object segmentation and suggest an efficient method for training the neural network on two different datasets. Our experiments are based on a set of images of bee frames containing annotated bees and varroa mites; due to differences in occurrence rates, two different sets were created. After carrying out the experiments, we ended up with a single neural network capable of detecting two nested objects without decreasing accuracy compared to two separate neural networks. Compared to traditional separate networks, our approach improves varroa detection accuracy by 1.9%, reduces infestation-level prediction error by 0.22%, and reduces execution time by 28% and model memory by 23%. In our approach, we extract Res4 features (a layer of the ResNet neural network) for varroa segmentation, which improves detection accuracy by 11% compared to standard FPN extraction. Thus, we suggest a new approach that detects nested objects more accurately than two separate-network approaches. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
15. Automated Stock Volume Estimation Using UAV-RGB Imagery.
- Author
-
Goswami, Anurupa, Khati, Unmesh, Goyal, Ishan, Sabir, Anam, and Jain, Sakshi
- Subjects
- CROWNS (Botany), HERBICIDE application, BIOINDICATORS, TEMPERATURE control, TREE growth, CARBON cycle, BIOMASS estimation
- Abstract
Forests play a critical role in the global carbon cycle, with carbon storage forming an important pool in the terrestrial ecosystem; tree crown size serves as a versatile ecological indicator, influencing factors such as tree growth, wind resistance, shading, and carbon sequestration. Crowns also contribute to habitat function, herbicide application, temperature regulation, and more. Understanding the relationship between tree crown area and stock volume is crucial, as it provides a key metric for assessing the impact of land-use changes on ecological processes. Traditional ground-based stock volume estimation using DBH (Diameter at Breast Height) is labor-intensive and often impractical. High-resolution UAV (unmanned aerial vehicle) imagery, however, has revolutionized remote sensing and computer-based tree analysis, making forest studies more efficient and interpretable. Previous studies have established correlations between DBH, stock volume, and above-ground biomass, as well as between tree crown area and DBH. This research aims to explore the correlation between tree crown area and stock volume and to automate stock volume and above-ground biomass estimation by developing an empirical model using UAV-RGB data, making forest assessments more convenient and time-efficient. The study site included a significant number of training and testing sites to ensure the performance of the developed model. The findings underscore a significant association, demonstrating the potential of integrating drone technology with traditional forestry techniques for efficient stock volume estimation. The results highlight a strong exponential correlation between crown area and stem stock volume, with a coefficient of determination of 0.67 and a mean squared error (MSE) of 0.0015. The developed model, when applied to estimate cumulative stock volume from drone imagery, demonstrated a strong correlation with an R2 of 0.75. These results emphasize the effectiveness of combining drone technology with traditional forestry methods to achieve more precise and efficient stock volume estimation and, hence, automate the process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
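Entry 15's empirical model, an exponential relation between crown area and stock volume, is conventionally fit by least squares on the log of the response; a generic sketch of that fitting step (the paper's actual coefficients and exact model form are not reproduced here):

```python
import numpy as np

def fit_exponential(crown_area, volume):
    """Fit volume = a * exp(b * area) by least squares on log(volume)."""
    b, log_a = np.polyfit(np.asarray(crown_area, float),
                          np.log(np.asarray(volume, float)), 1)
    return float(np.exp(log_a)), float(b)

def predict_volume(a, b, crown_area):
    """Apply the fitted model to new crown areas."""
    return a * np.exp(b * np.asarray(crown_area, float))
```

On exactly log-linear data the fit recovers the generating coefficients, which is a quick sanity check before applying it to noisy field measurements.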
16. Hardness-aware loss for object segmentation.
- Author
-
Chen, Lei, Cao, Tieyong, Zheng, Yunfei, Wang, Yang, Zhang, Bo, and Yang, Jibin
- Subjects
- CONVOLUTIONAL neural networks, HARDNESS, PIXELS, ENTROPY, PROBABILITY theory
- Abstract
In object segmentation, hard-to-classify pixels limit segmentation performance. Focusing on these hard pixels by assigning different weights to the pixel loss can guide the learning of the segmentation model effectively. Existing loss weight assignment methods perceive pixel hardness from the current predicted information and pay little attention to past predicted information, even though recent studies show that samples whose predicted probability improves little over the past are difficult to learn. To define hard pixels more accurately, a hardness-aware loss for object segmentation is proposed. Firstly, a metric of pixel hardness degree is defined, and a mapping function is proposed to quantitatively evaluate the hardness degree, which is defined on the difference between current and past predicted probabilities. Then a new compound metric, the hardness value, is defined based on the hardness degree and the uncertainty. Based on this compound metric, a new loss function is proposed. Experimental results on four datasets, using a convolutional neural network and a Transformer as the backbone models, demonstrate that the proposed method effectively improves the accuracy of object segmentation. In particular, for the segmentation model based on ResNet-50, the proposed method improves mean Intersection over Union (mIoU) by almost 4.3% compared to cross entropy on the DUT-O dataset. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
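Entry 16's compound metric, a hardness degree from the current-versus-past probability difference combined with an uncertainty term, might look roughly like this; the specific mapping and combination below are guesses for illustration, not the paper's definitions:

```python
import numpy as np

def hardness_weight(p_now, p_past):
    """Per-pixel loss weight: hardness degree times uncertainty.

    Hardness degree grows when the predicted probability has improved
    little since the past epoch; uncertainty is the binary entropy,
    maximal at p = 0.5. Both the mapping and their combination are
    illustrative, not the paper's exact formulas.
    """
    eps = 1e-7
    degree = np.clip(1.0 - (p_now - p_past), 0.0, 2.0)  # less progress => harder
    entropy = -(p_now * np.log2(p_now + eps)
                + (1.0 - p_now) * np.log2(1.0 - p_now + eps))
    return degree * entropy
```

A pixel stuck at 0.5 with no progress receives the maximal weight, while a pixel that moved from 0.4 to a confident 0.9 is down-weighted, which is the behavior the abstract motivates.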
17. Research and Application of YOLOv11-Based Object Segmentation in Intelligent Recognition at Construction Sites.
- Author
-
He, Luhao, Zhou, Yongzhang, Liu, Lei, and Ma, Jianhua
- Subjects
- BUILDING sites, DIGITAL transformation, CONSTRUCTION management, EDGE computing, BULLDOZERS
- Abstract
With the increasing complexity of construction site environments, robust object detection and segmentation technologies are essential for enhancing intelligent monitoring and ensuring safety. This study investigates the application of YOLOv11-Seg, an advanced target segmentation technology, for intelligent recognition on construction sites. The research focuses on improving the detection and segmentation of 13 object categories, including excavators, bulldozers, cranes, workers, and other equipment. The methodology involves preparing a high-quality dataset through cleaning, annotation, and augmentation, followed by training the YOLOv11-Seg model over 351 epochs. The loss function analysis indicates stable convergence, demonstrating the model's effective learning capabilities. The evaluation results show an mAP@0.5 average of 0.808, F1 Score(B) of 0.8212, and F1 Score(M) of 0.8382, with 81.56% of test samples achieving confidence scores above 90%. The model performs effectively in static scenarios, such as equipment detection in Xiong'an New District, and dynamic scenarios, including real-time monitoring of workers and vehicles, maintaining stable performance even at 1080P resolution. Furthermore, it demonstrates robustness under challenging conditions, including nighttime, non-construction scenes, and incomplete images. The study concludes that YOLOv11-Seg exhibits strong generalization capability and practical utility, providing a reliable foundation for enhancing safety and intelligent monitoring at construction sites. Future work may integrate edge computing and UAV technologies to support the digital transformation of construction management. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
18. Exploring Image Decolorization: Methods, Implementations, and Performance Assessment.
- Author
-
Žeger, Ivana, Šetka, Ivan, Marić, Domagoj, and Grgic, Sonja
- Subjects
- COLOR space, CUMULATIVE distribution function, ROOT-mean-squares, GRAYSCALE model, IMAGE processing, DEEP learning
- Abstract
Decolorization is an image processing technique that converts a color input image into a grayscale image. This paper discusses the decolorization process and provides an overview of the methods based on the different principles used: basic conversion from RGB to YUV format using ITU Recommendations 601, 709, and 2020; basic conversion from RGB to LAB color space; the method using cumulative distribution function of color channels; one global decolorization method; and one based on deep learning. The grayscale images produced by these methods were evaluated using four objective metrics, allowing for a thorough analysis and comparison of the decolorization results. Additionally, the execution speed of the algorithms was assessed, providing insight into their performance efficiency. The results demonstrate that different metrics evaluate the decolorization methods differently, highlighting the importance of selecting an appropriate metric that aligns with the subsequent image processing tasks following decolorization. Furthermore, it was shown that the decolorization methods depend on the content of the images, performing better on natural images than on artificially generated ones. The decolorization methods were also examined in the context of object segmentation and edge detection. The results from segmentation and edge detection were aligned with the decolorization results, revealing that certain objective metrics for evaluating decolorization more effectively assessed the properties of the decolorized images, which are crucial for successful object segmentation and edge detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
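The ITU-based conversions entry 18 compares are plain weighted sums of the RGB channels. The luma coefficients below are the published Rec. 601/709/2020 values; the function itself is just an illustration:

```python
# Luma coefficients from ITU-R Recommendations BT.601, BT.709 and BT.2020.
LUMA = {
    "601":  (0.299, 0.587, 0.114),
    "709":  (0.2126, 0.7152, 0.0722),
    "2020": (0.2627, 0.6780, 0.0593),
}

def to_gray(r, g, b, rec="709"):
    """Weighted RGB-to-grayscale conversion for a single pixel."""
    wr, wg, wb = LUMA[rec]
    return wr * r + wg * g + wb * b
```

Since each coefficient triple sums to 1, white maps to white; the recommendations differ in how much a pure channel contributes, e.g. pure red is noticeably darker under Rec. 709 than Rec. 601.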
19. A foreign object detection method for aero-engine intake ducts based on curvature mutation in 3D point clouds
- Author
-
武星, 李兴达, 汤凯, 李杨志, 张航瑛, and 陈中文
- Abstract
Copyright of Computer Measurement & Control is the property of Magazine Agency of Computer Measurement & Control and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
20. MCC: Multi-Cluster Contrastive Semi-Supervised Segmentation Framework for Echocardiogram Videos
- Author
-
Yu-Jen Chen, Shr-Shiun Lin, Yiyu Shi, Tsung-Yi Ho, and Xiaowei Xu
- Subjects
- Semisupervised learning, object segmentation, echocardiography, self-supervised learning, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
Variability in sonographer expertise often leads to low-quality ultrasound imaging, presenting significant challenges for accurate echocardiogram video segmentation. Current methods require extensive annotations, which are impractical given the large number of frames and artifacts in videos. To address this, we propose a Multi-Cluster Contrastive (MCC) learning framework, a semi-supervised approach that minimizes annotation requirements while maintaining high segmentation performance. Leveraging contrastive loss to enhance foreground feature extraction, our method incorporates multi-cluster contrastive loss to utilize multiple annotated ground-truths per batch and an anchor frame selection algorithm to improve segmentation performance. Experimental results on two public echocardiography datasets (MCE and EchoNet-Dynamic) demonstrate the effectiveness of our method, achieving state-of-the-art performance. The MCC framework enhances segmentation practicality by reducing annotation requirements, particularly for developing new datasets, and facilitates efficient segmentation of low-quality echocardiogram videos. Our implementation is available at https://github.com/windstormer/MCC.
- Published
- 2025
- Full Text
- View/download PDF
21. Automated Zebrafish Spine Scoring System Based on Instance Segmentation
- Author
-
Wen-Hsin Chen, Tien-Ying Kuo, Yu-Jen Wei, Cheng-Jung Ho, Ming-der Lin, Huan Chen, and Wen-Ying Lin
- Subjects
- Deep learning, machine learning, object segmentation, image analysis, Electrical engineering. Electronics. Nuclear engineering, TK1-9971
- Abstract
In studying new medicines for osteoporosis, researchers use zebrafish as animal subjects to test drugs, observing the growth of the vertebrae in the spine to confirm the efficacy of new medicines. However, the current method for evaluating efficacy is time-consuming and labor-intensive, requiring manual observation. Taking advantage of advancements in deep learning technology, we propose an automatic method for detecting and recognizing zebrafish vertebrae in images captured from image sensors. Our method was designed using Mask R-CNN as the instance segmentation backbone, enhanced with a mask enhancement module and a small-object preprocessing approach to strengthen its detection abilities. Compared to the original Mask R-CNN architecture, our method improved the mean average precision (mAP) score for vertebra bounding-box and mask detection by 7.1% to 97.7% and by 1.2% to 96.6%, respectively. Additionally, we developed a system using these detection algorithms to automatically calculate spinal vertebra growth scores, providing a valuable tool for researchers to assess drug efficacy.
- Published
- 2025
- Full Text
- View/download PDF
22. A dynamic dropout self-distillation method for object segmentation
- Author
-
Lei Chen, Tieyong Cao, Yunfei Zheng, Yang Wang, Bo Zhang, and Jibin Yang
- Subjects
- Self-distillation, Object segmentation, Dynamic dropout, Capacity mismatch, Electronic computers. Computer science, QA75.5-76.95, Information technology, T58.5-58.64
- Abstract
In knowledge distillation, a stronger teacher does not always produce a better student, owing to the capacity mismatch between them. In pixel-level object segmentation especially, some challenging pixels are simply difficult for the student model to learn: even if the student learns from the teacher at every pixel, its performance still struggles to improve significantly. Mimicking the human learning process from easy to difficult, a dynamic dropout self-distillation method for object segmentation is proposed, which addresses this problem by discarding the knowledge that the student struggles to learn. Firstly, pixels with a significant difference between the teacher's and student's predicted probabilities are identified and defined as difficult-to-learn pixels for the student model. Secondly, a dynamic dropout strategy is proposed to match the changing capability of the student model; it is used to discard the pixels whose knowledge is too hard for the student. Finally, to validate the effectiveness of the proposed method, a simple student model for object segmentation and a virtual teacher model with perfect segmentation accuracy are constructed. Experimental results on four public datasets demonstrate that, when there is a large performance gap between the teacher and student models, the proposed self-distillation method improves the student model more effectively than other methods.
- Published
- 2024
- Full Text
- View/download PDF
23. Hardness-aware loss for object segmentation
- Author
-
Lei Chen, Tieyong Cao, Yunfei Zheng, Yang Wang, Bo Zhang, and Jibin Yang
- Subjects
- Loss function, Object segmentation, Hardness value, Uncertainty, Epoch influence, Engineering (General). Civil engineering (General), TA1-2040
- Abstract
In object segmentation, hard-to-classify pixels limit segmentation performance. Focusing on these hard pixels by assigning different weights to the pixel loss can guide the learning of the segmentation model effectively. Existing loss weight assignment methods perceive pixel hardness from the current predicted information and pay little attention to past predicted information, even though recent studies show that samples whose predicted probability improves little over the past are difficult to learn. To define hard pixels more accurately, a hardness-aware loss for object segmentation is proposed. Firstly, a metric of pixel hardness degree is defined, and a mapping function is proposed to quantitatively evaluate the hardness degree, which is defined on the difference between current and past predicted probabilities. Then a new compound metric, the hardness value, is defined based on the hardness degree and the uncertainty. Based on this compound metric, a new loss function is proposed. Experimental results on four datasets, using a convolutional neural network and a Transformer as the backbone models, demonstrate that the proposed method effectively improves the accuracy of object segmentation. In particular, for the segmentation model based on ResNet-50, the proposed method improves mean Intersection over Union (mIoU) by almost 4.3% compared to cross entropy on the DUT-O dataset.
- Published
- 2024
- Full Text
- View/download PDF
24. Enhancing oil palm segmentation model with GAN-based augmentation
- Author
-
Qi Bin Kwong, Yee Thung Kon, Wan Rusydiah W. Rusik, Mohd Nor Azizi Shabudin, Shahirah Shazana A. Rahman, Harikrishna Kulaveerasingam, and David Ross Appleton
- Subjects
- Oil palm segmentation, GAN, Object detection, Object segmentation, Data augmentation, Vision transformer, Computer engineering. Computer hardware, TK7885-7895, Information technology, T58.5-58.64, Electronic computers. Computer science, QA75.5-76.95
- Abstract
In digital agriculture, accurate crop detection is fundamental to developing automated systems for efficient plantation management. For oil palm, the main challenge lies in developing robust models that perform well in different environmental conditions. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models. For this purpose, drone images of young palms (< 5 year-old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were then inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles, subsequently utilized to augment the authenticity of synthetic tiles. Both synthetic and real tiles were used to train the GAN-based detection model. The baseline model achieved precision and recall values of 95.8% and 97.2%. The GAN-based model achieved comparable results, with precision and recall values of 98.5% and 98.6%. On challenge dataset 1, consisting of older palms (> 5 year-old), both models also achieved similar accuracies, with the baseline model achieving precision and recall of 93.1% and 99.4%, and the GAN-based model achieving 95.7% and 99.4%. On challenge dataset 2, consisting of storm-affected palms, the baseline model achieved a precision of 100% but a recall of only 13%. The GAN-based model achieved a significantly better result, with precision and recall values of 98.7% and 95.3%. This result demonstrates that images generated by GANs have the potential to enhance the accuracies of palm detection models.
- Published
- 2024
- Full Text
- View/download PDF
25. Enhancing learning on uncertain pixels in self-distillation for object segmentation.
- Author
-
Chen, Lei, Cao, Tieyong, Zheng, Yunfei, Wang, Yang, Zhang, Bo, and Yang, Jibin
- Subjects
- CONVOLUTIONAL neural networks, LEARNING ability, TRANSFORMER models, KNOWLEDGE transfer, PIXELS
- Abstract
A self-distillation method guides model learning by transferring knowledge from the model itself, which has shown advantages in object segmentation. However, it has been shown that uncertain pixels, those with predicted probability close to 0.5, restrict model performance. Existing self-distillation methods cannot guide the model to enhance its learning of uncertain pixels, so the improvement is limited. To boost the student model's learning of uncertain pixels, a novel self-distillation method is proposed. Firstly, the predicted probability on the current training sample and the ground-truth label are fused to construct the teacher knowledge, as the current prediction expresses the performance of the student model and represents the uncertainty of pixels more accurately. Secondly, a quadratic mapping function between the predicted probabilities of the teacher and student models is proposed; theoretical analysis shows that the proposed method, using this mapping function, guides the model to enhance its learning of uncertain pixels. Finally, the essential difference of utilizing the predicted probability of the student model in self-distillation is discussed in detail. Extensive experiments were conducted on models with convolutional neural networks and Transformer architectures as the backbone networks. The results on four public datasets demonstrate that the proposed method effectively improves student model performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
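Entry 25's fusion of the student's prediction with the ground truth via a quadratic mapping can be illustrated with a simple rule; the exact formula below is an assumption chosen so the correction peaks at p = 0.5, not the paper's mapping:

```python
def teacher_probability(p, y):
    """Quadratic fusion of the student probability p with the label y (0 or 1).

    The target moves p toward the label by p * (1 - p), which is largest
    at p = 0.5, so uncertain pixels receive the strongest guidance. This
    is an illustrative quadratic mapping, not the paper's exact formula.
    """
    return 1.0 - (1.0 - p) ** 2 if y == 1 else p ** 2
```

A maximally uncertain pixel (p = 0.5) is pulled a full 0.25 toward its label, while an already confident pixel (p = 0.9, y = 1) is nudged by less than 0.1.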
26. Comparative Analysis of Nucleus Segmentation Techniques for Enhanced DNA Quantification in Propidium Iodide-Stained Samples.
- Author
-
Jónás, Viktor Zoltán, Paulik, Róbert, Molnár, Béla, and Kozlovszky, Miklós
- Subjects
- FLOW cytometry, FLUORIMETRY, IMAGE analysis, DATA mining, IMAGE processing
- Abstract
Digitization in pathology and cytology labs is now widespread, a significant shift from a decade ago, when few doctors used image processing tools. Although scanning times remain unchanged due to the excitation required in fluorescent imaging, advancements in computing power and software have enabled more complex algorithms, yielding better-quality results. This study evaluates three nucleus segmentation algorithms for ploidy analysis using propidium iodide-stained digital WSI slides. Our goal was to improve segmentation accuracy to more closely match DNA histograms obtained via flow cytometry, with the ultimate aim of enhancing the calibration method we proposed in a previous study, which seeks to align image cytometry results with those from flow cytometry. We assessed these algorithms on raw segmentation performance and DNA histogram similarity, using confusion-matrix-based metrics. The results indicate that modern algorithms perform better, with F1 scores exceeding 0.845 compared to our earlier solution's 0.807, and produce DNA histograms that more closely resemble those from the reference FCM method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
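The confusion-matrix-based F1 comparison cited in the record above (F1 scores exceeding 0.845 versus the earlier solution's 0.807) can be sketched as follows; the pixel counts in the example are hypothetical, not taken from the study:

```python
def f1_from_confusion(tp: int, fp: int, fn: int) -> float:
    """F1 score (harmonic mean of precision and recall) from
    confusion-matrix counts: true positives, false positives, false negatives."""
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)

# Hypothetical nucleus-segmentation counts, for illustration only:
score = f1_from_confusion(845, 100, 95)
print(round(score, 3))  # → 0.897
```

Note that F1 simplifies to 2·TP / (2·TP + FP + FN), which is why it is insensitive to the true-negative count — convenient for segmentation, where background pixels dominate.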
27. Enhancing oil palm segmentation model with GAN-based augmentation.
- Author
-
Kwong, Qi Bin, Kon, Yee Thung, Rusik, Wan Rusydiah W., Shabudin, Mohd Nor Azizi, Rahman, Shahirah Shazana A., Kulaveerasingam, Harikrishna, and Appleton, David Ross
- Subjects
TRANSFORMER models ,DATA augmentation ,OIL palm ,GENERATIVE adversarial networks ,TILES ,PALMS - Abstract
In digital agriculture, accurate crop detection is fundamental to developing automated systems for efficient plantation management. For oil palm, the main challenge lies in developing robust models that perform well under different environmental conditions. This study addresses the feasibility of using GAN augmentation methods to improve palm detection models. For this purpose, drone images of young palms (< 5 years old) from eight different estates were collected, annotated, and used to build a baseline detection model based on DETR. StyleGAN2 was trained on the extracted palms and then used to generate a series of synthetic palms, which were inserted into tiles representing different environments. CycleGAN networks were trained for bidirectional translation between synthetic and real tiles, and subsequently utilized to augment the authenticity of the synthetic tiles. Both synthetic and real tiles were used to train the GAN-based detection model. The baseline model achieved precision and recall values of 95.8% and 97.2%. The GAN-based model achieved comparable results, with precision and recall values of 98.5% and 98.6%. On challenge dataset 1, consisting of older palms (> 5 years old), both models also achieved similar accuracies, with the baseline model achieving precision and recall of 93.1% and 99.4%, and the GAN-based model achieving 95.7% and 99.4%. On challenge dataset 2, consisting of storm-affected palms, the baseline model achieved a precision of 100% but a recall of only 13%. The GAN-based model achieved a significantly better result, with precision and recall values of 98.7% and 95.3%. This result demonstrates that images generated by GANs have the potential to enhance the accuracy of palm detection models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. AIDCON: An Aerial Image Dataset and Benchmark for Construction Machinery.
- Author
-
Ersoz, Ahmet Bahaddin, Pekcan, Onur, and Akbas, Emre
- Subjects
- *
MACHINE learning , *BUILDING sites , *OBJECT recognition (Computer vision) , *CONSTRUCTION equipment , *CONSTRUCTION projects , *DEEP learning , *PIXELS - Abstract
Applying deep learning algorithms in the construction industry holds tremendous potential for enhancing site management, safety, and efficiency. The development of such algorithms necessitates a comprehensive and diverse image dataset. This study introduces the Aerial Image Dataset for Construction (AIDCON), a novel aerial image collection containing 9563 construction machines across nine categories annotated at the pixel level, offering critical value for researchers and professionals seeking to develop and refine object detection and segmentation algorithms across various construction projects. The study highlights the benefits of utilizing UAV-captured images by evaluating the performance of five cutting-edge deep learning algorithms—Mask R-CNN, Cascade Mask R-CNN, Mask Scoring R-CNN, Hybrid Task Cascade, and PointRend—on the AIDCON dataset. It underscores the significance of clustering strategies for generating reliable and robust outcomes. The AIDCON dataset's unique aerial perspective aids in reducing occlusions and provides comprehensive site overviews, facilitating better object positioning and segmentation. The findings presented in this paper have far-reaching implications for the construction industry, as they enhance construction site efficiency while setting the stage for future advancements in construction site monitoring and management utilizing remote sensing technologies. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Enhancing learning on uncertain pixels in self-distillation for object segmentation
- Author
-
Lei Chen, Tieyong Cao, Yunfei Zheng, Yang Wang, Bo Zhang, and Jibin Yang
- Subjects
Self-distillation ,Object segmentation ,Uncertain pixel ,Current prediction ,Electronic computers. Computer science ,QA75.5-76.95 ,Information technology ,T58.5-58.64 - Abstract
Abstract The self-distillation method guides model learning by transferring knowledge from the model itself, and has shown advantages in object segmentation. However, it has been shown that uncertain pixels, whose predicted probability is close to 0.5, restrict model performance. Existing self-distillation methods cannot guide the model to enhance its learning ability for uncertain pixels, so their improvement is limited. To boost the student model's learning ability for uncertain pixels, a novel self-distillation method is proposed. First, the predicted probability for the current training sample and the ground-truth label are fused to construct the teacher knowledge, as the current prediction expresses the performance of the student model and represents pixel uncertainty more accurately. Second, a quadratic mapping function between the predicted probabilities of the teacher and student models is proposed. Theoretical analysis shows that the proposed method, using this mapping function, can guide the model to enhance its learning ability for uncertain pixels. Finally, the essential difference of utilizing the predicted probability of the student model in self-distillation is discussed in detail. Extensive experiments were conducted on models with convolutional neural network and Transformer architectures as the backbone networks. The results on four public datasets demonstrate that the proposed method can effectively improve student model performance.
- Published
- 2024
- Full Text
- View/download PDF
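The abstract above describes fusing the student's current prediction with the ground-truth label to form the teacher, plus a quadratic mapping that emphasizes uncertain pixels (p ≈ 0.5). The exact formulas are not given in the abstract; the sketch below is a toy illustration under assumed forms (a linear fusion weight `alpha` and a quadratic uncertainty weight), not the paper's method:

```python
def teacher_probability(p_student: float, label: float, alpha: float = 0.5) -> float:
    """Fuse the student's current prediction with the ground-truth label.
    The linear fusion form and alpha = 0.5 are illustrative assumptions."""
    return alpha * p_student + (1.0 - alpha) * label

def uncertainty_weighted_target(p_student: float, label: float) -> float:
    """Toy quadratic weighting: pixels with p near 0.5 (uncertain) are pulled
    toward the label more strongly than confident pixels."""
    t = teacher_probability(p_student, label)
    w = 4.0 * p_student * (1.0 - p_student)  # quadratic in p, peaks at p = 0.5
    return t + 0.5 * w * (label - t)

# The correction applied to an uncertain pixel exceeds that for a confident one:
shift_uncertain = uncertainty_weighted_target(0.5, 1.0) - 0.5   # 0.375
shift_confident = uncertainty_weighted_target(0.9, 1.0) - 0.9   # 0.059
```

The point of the sketch is only the qualitative behavior the abstract claims: the distillation target moves uncertain pixels further than confident ones.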
30. Video object tracking algorithm based on dual-branch online optimization and feature fusion.
- Author
-
李新鹏, 王 鹏, 李晓艳, 孙梦宇, 陈遵田, and 郜 辉
- Subjects
TRACKING algorithms ,ELECTRONIC equipment ,TEST reliability ,RELIABILITY in engineering ,RESEARCH institutes - Abstract
Copyright of Chinese Journal of Liquid Crystal & Displays is the property of Chinese Journal of Liquid Crystal & Displays and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
31. Improved organs at risk segmentation based on modified U‐Net with self‐attention and consistency regularisation.
- Author
-
Manko, Maksym, Popov, Anton, Gorriz, Juan Manuel, and Ramirez, Javier
- Subjects
CHEST (Anatomy) ,ARTIFICIAL neural networks ,COMPUTED tomography ,RETINAL blood vessels ,IMAGE segmentation ,HEART ,ESOPHAGUS - Abstract
Cancer is one of the leading causes of death in the world, with radiotherapy as one of the treatment options. Radiotherapy planning starts with delineating the affected area from healthy organs, called organs at risk (OAR). A new approach to automatic OAR segmentation in the chest cavity in Computed Tomography (CT) images is presented. The proposed approach is based on the modified U‐Net architecture with the ResNet‐34 encoder, which is the baseline adopted in this work. The new two‐branch CS‐SA U‐Net architecture is proposed, which consists of two parallel U‐Net models in which self‐attention blocks with cosine similarity as query‐key similarity function (CS‐SA) blocks are inserted between the encoder and decoder, which enabled the use of consistency regularisation. The proposed solution demonstrates state‐of‐the‐art performance for the problem of OAR segmentation in CT images on the publicly available SegTHOR benchmark dataset in terms of a Dice coefficient (oesophagus—0.8714, heart—0.9516, trachea—0.9286, aorta—0.9510) and Hausdorff distance (oesophagus—0.2541, heart—0.1514, trachea—0.1722, aorta—0.1114) and significantly outperforms the baseline. The current approach is demonstrated to be viable for improving the quality of OAR segmentation for radiotherapy planning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Indoor point cloud object segmentation based on direction encoding and dilated sampling.
- Author
-
李彭, 陈西江, 赵不钒, 宣伟, and 邓辉
- Abstract
Copyright of Journal of Computer-Aided Design & Computer Graphics / Jisuanji Fuzhu Sheji Yu Tuxingxue Xuebao is the property of Gai Kan Bian Wei Hui and its content may not be copied or emailed to multiple sites or posted to a listserv without the copyright holder's express written permission. However, users may print, download, or email articles for individual use. This abstract may be abridged. No warranty is given about the accuracy of the copy. Users should refer to the original published version of the material for the full abstract. (Copyright applies to all Abstracts.)
- Published
- 2024
- Full Text
- View/download PDF
33. Detection of the farmland plow areas using RGB-D images with an improved YOLOv5 model.
- Author
-
Jiangtao Ji, Zhihao Han, Kaixuan Zhao, Qianwen Li, and Shucan Du
- Subjects
- *
AGRICULTURAL equipment , *CONTOURS (Cartography) , *VISUAL fields , *COMPUTATIONAL complexity , *FARM tractors , *CAMERAS - Abstract
Recognition of the boundaries of farmland plow areas has an important guiding role in the operation of intelligent agricultural equipment. To precisely recognize these boundaries, a detection method for unmanned tractor plow areas based on RGB-Depth (RGB-D) cameras was proposed, and the feasibility of the detection method was analyzed. This method applies advanced computer vision technology to the field of agricultural automation, adopting and improving the YOLOv5-seg object segmentation algorithm. First, the Convolutional Block Attention Module (CBAM) was integrated into the Concentrated-Comprehensive Convolution Block (C3) to form C3CBAM, thereby enhancing the ability of the network to extract features from plow areas. The GhostConv module was also utilized to reduce parameter count and computational complexity. Second, using the depth information provided by the RGB-D camera combined with the results recognized by the YOLOv5-seg model, the mask image was processed to extract contour boundaries, align the contours with the depth map, and obtain the boundary distance information of the plowed area. Finally, based on farmland information, the calculated average boundary distance was corrected, further improving the accuracy of the distance measurements. The experimental results showed that the YOLOv5-seg object segmentation algorithm achieved a recognition accuracy of 99% for plowed areas and that the ranging accuracy improved with decreasing detection distance. The ranging error at 5.5 m was approximately 0.056 m, and the average detection time per frame was 29 ms, which meets real-time operational requirements. The results of this study can provide precise guarantees for the autonomous operation of unmanned plowing units. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
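The contour-to-depth alignment step described above (extract the mask boundary, read off the aligned depth values, average them) can be sketched as follows. Function and variable names are illustrative, and the paper's farmland-based correction of the average distance is omitted:

```python
def boundary_mean_depth(mask, depth):
    """Average depth over the boundary pixels of a binary mask.

    mask, depth: 2-D lists of equal shape; mask entries are 0/1.
    A pixel is a boundary pixel if it is inside the mask and has a
    4-neighbour that is outside the mask (or outside the image)."""
    h, w = len(mask), len(mask[0])

    def is_boundary(r, c):
        if not mask[r][c]:
            return False
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            rr, cc = r + dr, c + dc
            if rr < 0 or rr >= h or cc < 0 or cc >= w or not mask[rr][cc]:
                return True
        return False

    depths = [depth[r][c]
              for r in range(h) for c in range(w) if is_boundary(r, c)]
    return sum(depths) / len(depths) if depths else float("nan")
```

In practice the mask would come from the segmentation model and the depth map from the RGB-D camera, with both already registered to the same pixel grid.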
34. FIRE DETECTION USING SURVEILLANCE SYSTEMS.
- Author
-
Mahmoud, Hanan Samir
- Subjects
TELEVISION in security systems ,IMAGE segmentation ,FIRE prevention ,IMAGE processing ,VIDEO surveillance - Abstract
This research presents a video-based system to detect fire in real time, taking advantage of existing surveillance systems for fire detection either inside or outside buildings, under different illumination and in short- or long-distance surveillance scenes. Fire detection with surveillance cameras is characterized by early detection and rapid performance, and information about the progress of a fire can be obtained through live video; a vision-based system is also capable of providing forensic evidence. The basic idea of the research is video-based fire detection in which Fourier descriptors are used to describe reddish moving objects: the proposed system detects reddish moving bodies in every frame and correlates the detections with the same reddish bodies over time. Multi-threshold segmentation, one of the most common ways to divide an image, is used for segmentation and can be integrated with pre-processing and post-processing. After segmentation, the features of each reddish body are obtained by extracting its contour and estimating the contour's normalized Fourier descriptors. If the reddish body contour's Fourier descriptors vary from frame to frame, fire can be predicted. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
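The normalized Fourier descriptors used above to characterize a contour can be sketched as follows: treat the contour points as complex numbers, take the discrete Fourier transform, skip the DC term (translation invariance), and divide by the magnitude of the first coefficient (scale invariance). The normalization details are an assumption; the abstract does not specify them:

```python
import cmath

def fourier_descriptors(contour, n_desc=8):
    """Normalized Fourier descriptor magnitudes of a closed contour.

    contour: list of (x, y) points sampled along the boundary.
    Returns |c_1|..|c_n_desc| divided by |c_1|: skipping the DC term c_0
    gives translation invariance, dividing by |c_1| gives scale invariance,
    and using magnitudes gives rotation invariance."""
    z = [complex(x, y) for x, y in contour]
    n = len(z)
    mags = []
    for k in range(1, n_desc + 1):
        c = sum(z[m] * cmath.exp(-2j * cmath.pi * k * m / n)
                for m in range(n)) / n
        mags.append(abs(c))
    scale = mags[0] if mags[0] else 1.0
    return [m / scale for m in mags]
```

For a perfect circle only the first descriptor is non-zero, so frame-to-frame variation in the higher descriptors captures the flickering, irregular contour typical of flames.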
35. Improved Object-Based Style Transfer with Single Deep Network
- Author
-
Kulkarni, Harshmohan, Khare, Om, Barve, Ninad, Mane, Sunil, Kacprzyk, Janusz, Series Editor, Gomide, Fernando, Advisory Editor, Kaynak, Okyay, Advisory Editor, Liu, Derong, Advisory Editor, Pedrycz, Witold, Advisory Editor, Polycarpou, Marios M., Advisory Editor, Rudas, Imre J., Advisory Editor, Wang, Jun, Advisory Editor, Roy, Satyabrata, editor, Sinwar, Deepak, editor, Dey, Nilanjan, editor, Perumal, Thinagaran, editor, and R. S. Tavares, João Manuel, editor
- Published
- 2024
- Full Text
- View/download PDF
36. Convex Segments for Convex Objects Using DNN Boundary Tracing and Graduated Optimization
- Author
-
Pal, Jimut B., Awate, Suyash P., Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Linguraru, Marius George, editor, Dou, Qi, editor, Feragen, Aasa, editor, Giannarou, Stamatia, editor, Glocker, Ben, editor, Lekadir, Karim, editor, and Schnabel, Julia A., editor
- Published
- 2024
- Full Text
- View/download PDF
37. Loci-Segmented: Improving Scene Segmentation Learning
- Author
-
Traub, Manuel, Becker, Frederic, Sauter, Adrian, Otte, Sebastian, Butz, Martin V., Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Wand, Michael, editor, Malinovská, Kristína, editor, Schmidhuber, Jürgen, editor, and Tetko, Igor V., editor
- Published
- 2024
- Full Text
- View/download PDF
38. Investigating Neural Networks and Transformer Models for Enhanced Comic Decoding
- Author
-
Kouletou, Eleanna, Papavassiliou, Vassilis, Katsouros, Vassilis, Goos, Gerhard, Series Editor, Hartmanis, Juris, Founding Editor, Bertino, Elisa, Editorial Board Member, Gao, Wen, Editorial Board Member, Steffen, Bernhard, Editorial Board Member, Yung, Moti, Editorial Board Member, Mouchère, Harold, editor, and Zhu, Anna, editor
- Published
- 2024
- Full Text
- View/download PDF
39. Evaluation of Deep Learning Models for Polymetallic Nodule Detection and Segmentation in Seafloor Imagery
- Author
-
Gabriel Loureiro, André Dias, José Almeida, Alfredo Martins, and Eduardo Silva
- Subjects
deep sea ,polymetallic nodules ,deep learning ,object detection ,object segmentation ,Naval architecture. Shipbuilding. Marine engineering ,VM1-989 ,Oceanography ,GC1-1581 - Abstract
Climate change has led to the need to transition to clean technologies, which depend on a number of critical metals. These metals, such as nickel, lithium, and manganese, are essential for developing batteries. However, the scarcity of these elements and the risks of disruptions to their supply chain have increased interest in exploiting resources on the deep seabed, particularly polymetallic nodules. As the identification of these nodules must be efficient to minimize disturbance to the marine ecosystem, deep learning techniques have emerged as a potential solution. Traditional deep learning methods are based on the use of convolutional layers to extract features, while recent architectures, such as transformer-based architectures, use self-attention mechanisms to obtain global context. This paper evaluates the performance of representative models from both categories across three tasks: detection, object segmentation, and semantic segmentation. The initial results suggest that transformer-based methods perform better on most evaluation metrics, but at the cost of higher computational resources. Furthermore, recent versions of You Only Look Once (YOLO) have obtained competitive results in terms of mean average precision.
- Published
- 2025
- Full Text
- View/download PDF
40. Synchronizing Object Detection: Applications, Advancements and Existing Challenges
- Author
-
Md. Tanzib Hosain, Asif Zaman, Mushfiqur Rahman Abir, Shanjida Akter, Sawon Mursalin, and Shadman Sakeeb Khan
- Subjects
Object detection ,image recognition ,object segmentation ,semantic detection ,image classification ,object tracking ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
From pivotal roles in autonomous vehicles, healthcare diagnostics, and surveillance systems to seamlessly integrating with augmented reality, object detection algorithms stand as the cornerstone in unraveling the complexities of the visual world. Tracing the trajectory from conventional region-based methods to the latest neural network architectures reveals a technological renaissance where algorithms metamorphose into digital artisans. However, this journey is not without hurdles, prompting researchers to grapple with real-time detection, robustness in varied environments, and interpretability amidst the intricacies of deep learning. The allure of addressing issues such as occlusions, scale variations, and fine-grained categorization propels exploration into uncharted territories, beckoning the scholarly community to contribute to an ongoing saga of innovation and discovery. This research offers a comprehensive panorama, encapsulating the applications reshaping our digital reality, the advancements pushing the boundaries of perception, and the open issues extending an invitation to the next generation of visionaries to explore uncharted frontiers within object detection.
- Published
- 2024
- Full Text
- View/download PDF
41. YOLO-Based Tree Trunk Types Multispectral Perception: A Two-Genus Study at Stand-Level for Forestry Inventory Management Purposes
- Author
-
Daniel Queiros da Silva, Filipe Neves Dos Santos, Vitor Filipe, Armando Jorge Sousa, and E. J. Solteiro Pires
- Subjects
Deep learning ,forest inventory ,multispectral imaging ,object detection ,object segmentation ,tree trunk types ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Stand-level forest tree species perception and identification are needed for monitoring-related operations, being crucial for better biodiversity and inventory management in forested areas. This paper contributes to this knowledge domain by researching tree trunk types multispectral perception at stand-level. YOLOv5 and YOLOv8 - Convolutional Neural Networks specialized at object detection and segmentation - were trained to detect and segment two tree trunk genus (pine and eucalyptus) using datasets collected in a forest region in Portugal. The dataset comprises only two categories, which correspond to the two tree genus. The datasets were manually annotated for object detection and segmentation with RGB and RGB-NIR images, and are publicly available. The “Small” variant of YOLOv8 was the best model at detection and segmentation tasks, achieving an F1 measure above 87% and 62%, respectively. The findings of this study suggest that the use of extended spectra, including Visible and Near Infrared, produces superior results. The trained models can be integrated into forest tractors and robots to monitor forest genus across different spectra. This can assist forest managers in controlling their forest stands.
- Published
- 2024
- Full Text
- View/download PDF
42. Infrared Ship Segmentation Based on Weakly-Supervised and Semi-Supervised Learning
- Author
-
Isa Ali Ibrahim, Abdallah Namoun, Sami Ullah, Hisham Alasmary, Muhammad Waqas, and Iftekhar Ahmad
- Subjects
Infrared ship images ,object segmentation ,weakly-supervised learning ,semi-supervised learning ,pixel-level pseudo-labels ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Existing fully-supervised semantic segmentation methods have achieved good performance. However, they all rely on high-quality pixel-level labels. To minimize the annotation costs, weakly-supervised methods or semi-supervised methods are proposed. When such methods are applied to the infrared ship image segmentation, inaccurate object localization occurs, leading to poor segmentation results. In this paper, we propose an infrared ship segmentation (ISS) method based on weakly-supervised and semi-supervised learning, aiming to improve the performance of ISS by combining the advantages of two learning methods. It uses only image-level labels and a minimal number of pixel-level labels to segment different classes of infrared ships. Our proposed method includes three steps. First, we designed a dual-branch localization network based on ResNet50 to generate ship localization maps. Second, we trained a saliency network with minimal pixel-level labels and many localization maps to obtain ship saliency maps. Then, we optimized the saliency maps with conditional random fields and combined them with image-level labels to generate pixel-level pseudo-labels. Finally, we trained the segmentation network with these pixel-level pseudo-labels to obtain the final segmentation results. Experimental results on the infrared ship dataset collected on real sites indicate that the proposed method achieves 71.18% mean intersection over union, which is at most 56.72% and 8.75% higher than the state-of-the-art weakly-supervised and semi-supervised methods, respectively.
- Published
- 2024
- Full Text
- View/download PDF
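The mean intersection over union (mIoU) reported above (71.18%) is the standard segmentation metric: per-class IoU averaged over classes. A minimal sketch from a pixel-level confusion matrix, with a toy two-class example:

```python
def mean_iou(confusion):
    """Mean intersection over union from a pixel confusion matrix,
    where confusion[i][j] counts pixels of true class i predicted as class j.
    Classes absent from both prediction and ground truth are skipped."""
    n = len(confusion)
    ious = []
    for c in range(n):
        tp = confusion[c][c]
        fp = sum(confusion[r][c] for r in range(n)) - tp  # column minus diagonal
        fn = sum(confusion[c]) - tp                       # row minus diagonal
        denom = tp + fp + fn
        if denom:
            ious.append(tp / denom)
    return sum(ious) / len(ious) if ious else 0.0

# Toy 2-class example: each class has 3 correct pixels, 1 confused each way.
print(mean_iou([[3, 1], [1, 3]]))  # → 0.6
```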
43. The Segmentation Tracker With Mask-Guided Background Suppression Strategy
- Author
-
Erlin Tian, Yunpeng Lei, Junfeng Sun, Keyan Zhou, Bin Zhou, and Hanfei Li
- Subjects
Object tracking ,Siamese network ,object segmentation ,background interference ,Electrical engineering. Electronics. Nuclear engineering ,TK1-9971 - Abstract
Segmentation-based tracking is currently a promising tracking paradigm with pixel-wise information. However, the lack of structural constraints makes it difficult to maintain excellent performance in the presence of background interference. Therefore, we propose a segmentation tracker with a mask-guided background suppression strategy. Firstly, a mask-aware module is designed to generate more accurate target masks. Guided by the regression loss, features that are sensitive only to the target region are selected from among the shallow features, which contain more spatial information. Structural information is thus introduced and background clutter in the backbone features is suppressed, which enhances the reliability of the target segmentation. Secondly, a mask-guided template suppression module is constructed to improve the feature representation. The generated mask, with clear target contours, can be used to filter out background noise, which increases the distinction between the foreground and background of the template. Therefore, the module highlights the target area and improves the interference resistance of the template. Finally, an adaptive spatiotemporal context constraint strategy is proposed to aid target localization. The strategy learns a region probability matrix from the object mask of the previous frame, which is used to constrain the contextual information in the search region of the current frame. Benefiting from this strategy, our method effectively suppresses similar distractors in the search region and achieves robust tracking. Broad experiments on five challenging benchmarks, including VOT2016, VOT2018, VOT2019, OTB100, and TC128, indicate that the proposed tracker performs stably under complex tracking backgrounds.
- Published
- 2024
- Full Text
- View/download PDF
44. Research and Application of YOLOv11-Based Object Segmentation in Intelligent Recognition at Construction Sites
- Author
-
Luhao He, Yongzhang Zhou, Lei Liu, and Jianhua Ma
- Subjects
YOLOv11-Seg ,object segmentation ,multi-object detection ,intelligent construction site ,Building construction ,TH1-9745 - Abstract
With the increasing complexity of construction site environments, robust object detection and segmentation technologies are essential for enhancing intelligent monitoring and ensuring safety. This study investigates the application of YOLOv11-Seg, an advanced target segmentation technology, for intelligent recognition on construction sites. The research focuses on improving the detection and segmentation of 13 object categories, including excavators, bulldozers, cranes, workers, and other equipment. The methodology involves preparing a high-quality dataset through cleaning, annotation, and augmentation, followed by training the YOLOv11-Seg model over 351 epochs. The loss function analysis indicates stable convergence, demonstrating the model’s effective learning capabilities. The evaluation results show an mAP@0.5 average of 0.808, F1 Score(B) of 0.8212, and F1 Score(M) of 0.8382, with 81.56% of test samples achieving confidence scores above 90%. The model performs effectively in static scenarios, such as equipment detection in Xiong’an New District, and dynamic scenarios, including real-time monitoring of workers and vehicles, maintaining stable performance even at 1080P resolution. Furthermore, it demonstrates robustness under challenging conditions, including nighttime, non-construction scenes, and incomplete images. The study concludes that YOLOv11-Seg exhibits strong generalization capability and practical utility, providing a reliable foundation for enhancing safety and intelligent monitoring at construction sites. Future work may integrate edge computing and UAV technologies to support the digital transformation of construction management.
- Published
- 2024
- Full Text
- View/download PDF
45. Comparative Analysis of Nucleus Segmentation Techniques for Enhanced DNA Quantification in Propidium Iodide-Stained Samples
- Author
-
Viktor Zoltán Jónás, Róbert Paulik, Béla Molnár, and Miklós Kozlovszky
- Subjects
digital pathology ,cytometry ,image analysis ,object segmentation ,fluorescence ,ploidy ,Technology ,Engineering (General). Civil engineering (General) ,TA1-2040 ,Biology (General) ,QH301-705.5 ,Physics ,QC1-999 ,Chemistry ,QD1-999 - Abstract
Digitization in pathology and cytology labs is now widespread, a significant shift from a decade ago when few doctors used image processing tools. Despite unchanged scanning times due to excitation in fluorescent imaging, advancements in computing power and software have enabled more complex algorithms, yielding better-quality results. This study evaluates three nucleus segmentation algorithms for ploidy analysis using propidium iodide-stained digital WSI slides. Our goal was to improve segmentation accuracy to more closely match DNA histograms obtained via flow cytometry, with the ultimate aim of enhancing the calibration method we proposed in a previous study, which seeks to align image cytometry results with those from flow cytometry. We assessed these algorithms based on raw segmentation performance and DNA histogram similarity, using confusion-matrix-based metrics. Results indicate that modern algorithms perform better, with F1 scores exceeding 0.845, compared to our earlier solution’s 0.807, and produce DNA histograms that more closely resemble those from the reference FCM method.
- Published
- 2024
- Full Text
- View/download PDF
46. A Novel Multi-Data-Augmentation and Multi-Deep-Learning Framework for Counting Small Vehicles and Crowds.
- Author
-
Tsai, Chun-Ming and Shih, Frank Y.
- Subjects
- *
TRAFFIC monitoring , *DRONE aircraft , *DATA augmentation , *CROWDS , *COUNTING , *TELECOMMUNICATION systems - Abstract
Counting small pixel-sized vehicles and crowds in unmanned aerial vehicle (UAV) images is crucial across diverse fields, including geographic information collection, traffic monitoring, item delivery, communication network relay stations, as well as target segmentation, detection, and tracking. This task poses significant challenges due to factors such as varying view angles, non-fixed drone cameras, small object sizes, changing illumination, object occlusion, and image jitter. In this paper, we introduce a novel multi-data-augmentation and multi-deep-learning framework designed for counting small vehicles and crowds in UAV images. The framework harnesses the strengths of specific deep-learning detection models, coupled with the convolutional block attention module and data augmentation techniques. Additionally, we present a new method for detecting cars, motorcycles, and persons with small pixel sizes. Our proposed method undergoes evaluation on the test dataset v2 of the 2022 AI Cup competition, where we secured first place on the private leaderboard by achieving the highest harmonic mean. Subsequent experimental results demonstrate that our framework outperforms the existing YOLOv7-E6E model. We also conducted comparative experiments using the publicly available VisDrone datasets, and the results show that our model outperforms the other models with the highest AP50 score of 52%. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
47. Siamese refine polar mask prediction network for visual tracking.
- Author
-
Pu, Bin, Xiang, Ke, Liu, Ze'an, and Wang, Xuanyin
- Abstract
Visual tracking is a classical research problem, and tracking with mask prediction has recently become a popular task in tracking research. Many trackers add a pixel-wise segmentation subnetwork behind the original bounding-box tracker to obtain the target's mask. These two-stage methods need to crop the target region after finding its location and redundantly extract deep features for segmentation. This paper proposes an anchor-free Siamese Refine Polar Mask (SiamRPM) prediction network for visual tracking, which can obtain the target's mask directly. Similar to bounding-box regression, we use polar mask regression to obtain the target's convex-hull mask. To further adjust the contour points, we propose to employ a cascaded refinement module: the mask contours are iteratively shifted using the offset outputs of the refinement module. Comprehensive experiments on visual tracking benchmark datasets illustrate that our SiamRPM can achieve competitive results at a real-time running speed. Our method provides an effective contour-based pipeline for the tracking and segmentation task. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
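Polar mask regression, as described in the abstract above, predicts a target center plus radial distances at evenly spaced angles; the contour is recovered by converting each (angle, radius) pair back to Cartesian coordinates. A minimal sketch of that conversion (the cascaded refinement offsets are omitted, and the function name is illustrative):

```python
import math

def polar_to_contour(center, radii):
    """Convert a polar mask (center plus radii at evenly spaced angles)
    into a list of Cartesian contour points."""
    cx, cy = center
    n = len(radii)
    points = []
    for i, r in enumerate(radii):
        theta = 2.0 * math.pi * i / n  # angles sampled uniformly over [0, 2*pi)
        points.append((cx + r * math.cos(theta), cy + r * math.sin(theta)))
    return points

# Four equal rays from the origin trace the unit "diamond" of sample points:
print(polar_to_contour((0.0, 0.0), [1.0, 1.0, 1.0, 1.0]))
```

A refinement stage such as the paper's cascaded module would then add per-point offsets to these coordinates to tighten the contour around the target.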
48. A Review of Machine Learning and Deep Learning for Object Detection, Semantic Segmentation, and Human Action Recognition in Machine and Robotic Vision.
- Author
-
Manakitsa, Nikoleta, Maraslidis, George S., Moysis, Lazaros, and Fragulis, George F.
- Subjects
COMPUTER vision ,OBJECT recognition (Computer vision) ,DEEP learning ,HUMAN activity recognition ,MACHINE learning ,IMAGE recognition (Computer vision) - Abstract
Machine vision, an interdisciplinary field that aims to replicate human visual perception in computers, has experienced rapid progress and significant contributions. This paper traces the origins of machine vision, from early image processing algorithms to its convergence with computer science, mathematics, and robotics, resulting in a distinct branch of artificial intelligence. The integration of machine learning techniques, particularly deep learning, has driven its growth and adoption in everyday devices. This study focuses on the objectives of computer vision systems: replicating human visual capabilities including recognition, comprehension, and interpretation. Notably, image classification, object detection, and image segmentation are crucial tasks requiring robust mathematical foundations. Despite the advancements, challenges persist, such as clarifying terminology related to artificial intelligence, machine learning, and deep learning. Precise definitions and interpretations are vital for establishing a solid research foundation. The evolution of machine vision reflects an ambitious journey to emulate human visual perception. Interdisciplinary collaboration and the integration of deep learning techniques have propelled remarkable advancements in emulating human behavior and perception. Through this research, the field of machine vision continues to shape the future of computer systems and artificial intelligence applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Technical note: ShinyAnimalCV: open-source cloud-based web application for object detection, segmentation, and three-dimensional visualization of animals using computer vision.
- Author
-
Wang, Jin, Hu, Yu, Xiang, Lirong, Morota, Gota, Brooks, Samantha A, Wickens, Carissa L, Miller-Cushon, Emily K, and Yu, Haipeng
- Subjects
SCIENCE education, OBJECT recognition (Computer vision), THREE-dimensional imaging, MACHINE learning, ANIMAL communities, DEEP learning, COMPUTER vision - Abstract
Computer vision (CV), a non-intrusive and cost-effective technology, has furthered the development of precision livestock farming by enabling optimized decision-making through timely and individualized animal care. The availability of affordable two- and three-dimensional camera sensors, combined with various machine learning and deep learning algorithms, has provided a valuable opportunity to improve livestock production systems. However, despite the availability of various CV tools in the public domain, applying these tools to animal data can be challenging, often requiring users to have programming and data analysis skills, as well as access to computing resources. Moreover, the rapid expansion of precision livestock farming is creating a growing need to educate and train animal science students in CV. This presents educators with the challenge of efficiently demonstrating the complex algorithms involved in CV. Thus, the objective of this study was to develop ShinyAnimalCV, an open-source cloud-based web application designed to facilitate CV teaching in animal science. This application provides a user-friendly interface for performing CV tasks, including object segmentation, detection, three-dimensional surface visualization, and extraction of two- and three-dimensional morphological features. Nine pre-trained CV models using top-view animal data are included in the application. ShinyAnimalCV has been deployed online using cloud computing platforms. The source code of ShinyAnimalCV is available on GitHub, along with detailed documentation on training CV models using custom data and deploying ShinyAnimalCV locally to allow users to fully leverage the capabilities of the application. ShinyAnimalCV can help to support the teaching of CV, thereby laying the groundwork to promote the adoption of CV in the animal science community. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
50. Edge-assisted Object Segmentation Using Multimodal Feature Aggregation and Learning.
- Author
-
Li, Jianbo, Yuan, Genji, and Yang, Zheng
- Subjects
IMAGE fusion - Abstract
Object segmentation aims to precisely identify objects embedded in the surrounding environment and has a wide range of applications. Most previous object segmentation methods use only RGB images and ignore the geometric information in disparity images. Making full use of heterogeneous data from different devices has proved to be a very effective strategy for improving segmentation performance. The key challenge of multimodal fusion-based object segmentation lies in the learning, transformation, and fusion of multimodal information. In this article, we focus on the transformation of disparity images and the fusion of multimodal features. We develop a multimodal fusion object segmentation framework, termed the Hybrid Fusion Segmentation Network (HFSNet). Specifically, HFSNet contains three key components: disparity convolutional sparse coding (DCSC), asymmetric dense projection feature aggregation (ADPFA), and multimodal feature fusion (MFF). The DCSC module is designed based on convolutional sparse coding; it not only offers better interpretability but also preserves the key geometric information of the object. ADPFA enhances texture and geometric information to fully exploit nonadjacent features. MFF performs the multimodal feature fusion. Extensive experiments show that HFSNet outperforms existing state-of-the-art models on two challenging datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
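The abstract above fuses RGB and disparity feature maps. HFSNet's actual MFF component is a learned module; purely as a toy illustration of combining two heterogeneous feature maps (the function name `fuse_features` and the fixed weight `alpha` are invented for this sketch, not part of the paper):

```python
import numpy as np

def fuse_features(rgb_feat, disp_feat, alpha=0.6):
    """Toy fusion: normalize each modality's feature map, then take a
    weighted element-wise sum (a stand-in for a learned fusion module)."""
    def norm(f):
        return (f - f.mean()) / (f.std() + 1e-8)
    return alpha * norm(rgb_feat) + (1.0 - alpha) * norm(disp_feat)

rng = np.random.default_rng(0)
rgb = rng.standard_normal((8, 8, 64))    # RGB feature map (H, W, C)
disp = rng.standard_normal((8, 8, 64))   # disparity feature map
fused = fuse_features(rgb, disp)         # same (H, W, C) shape as inputs
```

A learned fusion module would replace the fixed `alpha` with weights trained end-to-end, but the shape contract (two aligned feature maps in, one fused map out) is the same.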