Search Results
1,481 results for "image classification"
2. YOLO‐RSFM: An efficient road small object detection method.
- Author
- Tang, Pei, Ding, Zhenyu, Lv, Mao, Jiang, Minnan, and Xu, Weikai
- Subjects
- IMAGE recognition (Computer vision), IMAGE registration, TRANSFORMER models, ALGORITHMS, SPINE
- Abstract
To tackle challenges in road multi‐object detection, such as object occlusion, small object detection, and multi‐scale object detection difficulties, a new YOLOv8n‐RSFM structure is proposed. The key improvement of this structure lies in the introduction of the transformer decoder head, which optimizes the matching between the ground truth and predicted boxes, thereby effectively addressing issues of object overlap and multi‐scale detection. Additionally, a small object detection layer is incorporated to retain crucial information beneficial for detecting small objects, significantly improving the detection accuracy for small targets. To enhance learning capacity and reduce redundant computations, the FasterNet backbone is employed to replace CSPDarknet53, thus accelerating the training process. Finally, the INNER‐MPDIoU loss function is introduced to replace the original algorithm's complete IoU to accelerate convergence and obtain more accurate regression results. A series of experiments were conducted on different datasets. The experimental results show that the proposed model YOLOv8n‐RSFM outperforms the original model YOLOv8n in small target detection. On the VisDrone, TinyPerson, and VSCrowd datasets, the mean accuracy percentage improved by 7.9%, 12.3%, and 4.5%, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
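The INNER‐MPDIoU loss named in this abstract builds on MPDIoU, which penalises IoU by normalised corner‐point distances. Below is a minimal PyTorch sketch of the plain MPDIoU term following its published formulation; the INNER part (auxiliary scaled boxes) and all integration details are omitted, so treat this as an illustration rather than the paper's implementation.

```python
import torch

def mpdiou(pred, target, img_w, img_h, eps=1e-7):
    """MPDIoU sketch: IoU minus normalised squared distances between the
    top-left and bottom-right corners of predicted and target boxes.
    Boxes are (x1, y1, x2, y2) tensors of shape (N, 4)."""
    x1 = torch.max(pred[:, 0], target[:, 0])
    y1 = torch.max(pred[:, 1], target[:, 1])
    x2 = torch.min(pred[:, 2], target[:, 2])
    y2 = torch.min(pred[:, 3], target[:, 3])
    inter = (x2 - x1).clamp(min=0) * (y2 - y1).clamp(min=0)
    area_p = (pred[:, 2] - pred[:, 0]) * (pred[:, 3] - pred[:, 1])
    area_t = (target[:, 2] - target[:, 0]) * (target[:, 3] - target[:, 1])
    iou = inter / (area_p + area_t - inter + eps)
    # Squared corner distances, normalised by the squared image diagonal
    d1 = (pred[:, 0] - target[:, 0]) ** 2 + (pred[:, 1] - target[:, 1]) ** 2
    d2 = (pred[:, 2] - target[:, 2]) ** 2 + (pred[:, 3] - target[:, 3]) ** 2
    norm = img_w ** 2 + img_h ** 2
    return iou - d1 / norm - d2 / norm   # train with loss = 1 - mpdiou
```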
3. MultiScale spectral–spatial convolutional transformer for hyperspectral image classification.
- Author
- Gong, Zhiqiang, Zhou, Xian, and Yao, Wen
- Subjects
- IMAGE recognition (Computer vision), TRANSFORMER models, FEATURE extraction, CONVOLUTIONAL neural networks, PIXELS, MULTISPECTRAL imaging, SPECTRAL imaging
- Abstract
Due to its powerful ability to capture global information, the transformer has become an alternative to CNNs for hyperspectral image classification. However, general transformers mainly consider global spectral information while ignoring the multiscale spatial information of the hyperspectral image. In this paper, we propose a multiscale spectral–spatial convolutional transformer (MultiFormer) for hyperspectral image classification. First, the developed method utilizes multiscale spatial patches as tokens to formulate the spatial transformer and generates a multiscale spatial representation of each band in each pixel. Second, the spatial representations of all the bands in a given pixel are utilized as tokens to formulate the spectral transformer and generate the multiscale spectral–spatial representation of each pixel. In addition, a modified spectral–spatial CAF module is constructed in the MultiFormer to fuse cross‐layer spectral and spatial information. Therefore, the proposed MultiFormer can capture multiscale spectral–spatial information and provide better performance than most other architectures for hyperspectral image classification. Experiments are conducted on commonly used real‐world datasets and the comparison results show the superiority of the proposed method. [ABSTRACT FROM AUTHOR]
- Published
- 2024
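To make the token construction concrete: the spatial transformer described above treats windows of several sizes around a pixel as separate tokens. A toy sketch of that step follows, with average pooling standing in for whatever embedding the paper actually learns; the scales, pooling choice, and names are illustrative assumptions.

```python
import torch

def multiscale_tokens(hsi, row, col, scales=(3, 5, 7)):
    """Build one token per spatial scale for a pixel of an HSI cube shaped
    (bands, H, W). Each token summarises a window of a different size."""
    tokens = []
    for s in scales:
        half = s // 2
        patch = hsi[:, max(row - half, 0):row + half + 1,
                       max(col - half, 0):col + half + 1]
        tokens.append(patch.mean(dim=(1, 2)))  # (bands,) summary per scale
    return torch.stack(tokens)                 # (num_scales, bands)
```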
4. Advances in medical image analysis: A comprehensive survey of lung infection detection.
- Author
- Kordnoori, Shirin, Sabeti, Maliheh, Mostafaei, Hamidreza, and Seyed Agha Banihashemi, Saeed
- Subjects
- IMAGE recognition (Computer vision), IMAGE analysis, LUNG infections, IMAGE processing, IMAGE segmentation
- Abstract
This research investigates advanced approaches in medical image analysis, specifically focusing on segmentation and classification techniques, as well as their integration into multi‐task architectures for lung infections. This research begins by explaining key architectural models used in segmentation and classification tasks. The study extends to the enhancement of these architectures through attention modules and conditional random fields. Relevant datasets and evaluation metrics are also reviewed, along with discussions of loss functions. This review encompasses recent advancements in single‐task and multi‐task models, highlighting innovations in semi‐supervised, self‐supervised, few‐shot, and zero‐shot learning techniques. Empirical analysis is conducted on both single‐task and multi‐task architectures, predominantly utilizing the U‐Net framework, and is applied across multiple datasets for segmentation and classification tasks. Results demonstrate the effectiveness of these models and provide insights into the strengths and limitations of different approaches. This research contributes to improved detection and diagnosis of lung infections by offering a comprehensive overview of current methodologies and their practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
5. Recognition of vehicle license plates in highway scenes with deep fusion network and connectionist temporal classification.
- Author
- Hua, Liru, Ma, Xinyi, Zhao, Chihang, Zhang, Bailing, Su, Zijun, and Wu, Yuhang
- Subjects
- IMAGE recognition (Computer vision), AUTOMOBILE license plates, PATTERN recognition systems, ARTIFICIAL neural networks, INTELLIGENT transportation systems, RECURRENT neural networks
- Abstract
License plate recognition is crucial in Intelligent Transportation Systems (ITS) for vehicle management, traffic monitoring, and security inspection. In highway scenarios, this task faces challenges such as diversity, blurriness, occlusion, and illumination variation of license plates. This article explores Recurrent Neural Networks based on Connectionist Temporal Classification (RNN‐CTC) for license plate recognition in challenging highway conditions. Four neural network models (ResNet50, ResNeXt, InceptionV3, and SENet), each combined with RNN‐CTC, are comparatively evaluated. Furthermore, a novel architecture named ResNet50 Deep Fusion Network using Connectionist Temporal Classification (ResNet50‐DFN‐CTC) is proposed. Comparative and ablation experiments are conducted using the Highway License Plate Dataset of Southeast University (HLPD‐SU). Results demonstrate the superior performance of ResNet50‐DFN‐CTC in challenging highway conditions, achieving 93.158% accuracy with a processing time of 7.91 ms, outperforming the other tested models. This research contributes to advancing license plate recognition technology for real‐world highway applications under adverse conditions. [ABSTRACT FROM AUTHOR]
- Published
- 2024
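For orientation, the CNN → RNN → CTC pattern this entry evaluates can be sketched in a few lines of PyTorch. The toy convolutional stem, sequence length, and 37-symbol alphabet below are placeholder assumptions, not the ResNet50‐DFN‐CTC design.

```python
import torch
import torch.nn as nn

class PlateRecogniser(nn.Module):
    """Toy CNN->RNN->CTC recogniser: a conv stem produces a feature sequence,
    a bidirectional GRU models it, and CTC aligns it to the plate string."""
    def __init__(self, n_classes=37):            # 26 letters + 10 digits + blank
        super().__init__()
        self.cnn = nn.Sequential(nn.Conv2d(3, 64, 3, padding=1), nn.ReLU(),
                                 nn.AdaptiveAvgPool2d((1, 32)))  # -> (B,64,1,32)
        self.rnn = nn.GRU(64, 128, bidirectional=True, batch_first=True)
        self.fc = nn.Linear(256, n_classes)

    def forward(self, x):
        f = self.cnn(x).squeeze(2).transpose(1, 2)   # (B, 32 steps, 64 feats)
        y, _ = self.rnn(f)
        return self.fc(y).log_softmax(-1)            # (B, T, n_classes)

model = PlateRecogniser()
logp = model(torch.randn(2, 3, 32, 128)).transpose(0, 1)  # CTC wants (T, B, C)
targets = torch.randint(1, 37, (2, 7))                    # two 7-char plates
loss = nn.CTCLoss(blank=0)(logp, targets,
                           input_lengths=torch.full((2,), 32),
                           target_lengths=torch.full((2,), 7))
```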
6. Skin cancer identification utilizing deep learning: A survey.
- Author
- Meedeniya, Dulani, De Silva, Senuri, Gamage, Lahiru, and Isuranga, Uditha
- Subjects
- IMAGE recognition (Computer vision), TRANSFORMER models, CONVOLUTIONAL neural networks, SKIN cancer, SKIN imaging
- Abstract
Melanoma, a highly prevalent and lethal form of skin cancer, has a significant impact globally. The chances of recovery for melanoma patients substantially improve with early detection. Currently, deep learning (DL) methods are gaining popularity in assisting with the early identification of melanoma. Despite their high performance, relying solely on an image classifier undermines the credibility of the application and makes it difficult to understand the rationale behind the model's predictions, highlighting the need for Explainable AI (XAI). This study provides a survey on skin cancer identification using DL techniques utilized in studies from 2017 to 2024. Compared to existing survey studies, the authors address the latest related studies covering several public skin cancer image datasets and focusing on segmentation, classification based on convolutional neural networks and vision transformers, and explainability. The analysis and comparisons of the existing studies will be beneficial for researchers and developers in this area in identifying suitable techniques for automated skin cancer image classification. Thereby, the survey findings can be used to implement support applications advancing the skin cancer diagnosis process. [ABSTRACT FROM AUTHOR]
- Published
- 2024
7. Glaucoma detection with explainable AI using convolutional neural networks based feature extraction and machine learning classifiers.
- Author
- Velpula, Vijaya Kumar, Sharma, Diksha, Sharma, Lakhan Dev, Roy, Amarjit, Bhuyan, Manas Kamal, Alfarhood, Sultan, and Safran, Mejdl
- Subjects
- IMAGE recognition (Computer vision), CONVOLUTIONAL neural networks, ARTIFICIAL intelligence, FEATURE extraction, VISION disorders
- Abstract
Glaucoma is an eye disease in which damage to the optic nerve results in vision loss; it is the leading cause of blindness worldwide. Due to the time‐consuming, inaccurate, and manual nature of traditional methods, automation in glaucoma detection is important. This paper proposes an explainable artificial intelligence (XAI) based model for automatic glaucoma detection using pre‐trained convolutional neural networks (PCNNs) and machine learning classifiers (MLCs). PCNNs are used as feature extractors to obtain deep features that can capture the important visual patterns and characteristics from fundus images. Using the extracted features, MLCs then classify glaucoma and healthy images. An empirical selection of the CNN and MLC parameters has been made in the performance evaluation. In this work, a total of 1,865 healthy and 1,590 glaucoma images from different fundus datasets were used. The results on the ACRIMA dataset show an accuracy, precision, and recall of 98.03%, 97.61%, and 99%, respectively. Explainable artificial intelligence aims to increase the user's trust in the model's decision‐making process in a transparent and interpretable manner. An assessment of image misclassification has been carried out to facilitate future investigations. [ABSTRACT FROM AUTHOR]
- Published
- 2024
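The pipeline in this entry, a frozen pre-trained CNN feeding a classical classifier, is a standard pattern. Here is a minimal sketch using ResNet-50 features and an SVM; the backbone/classifier pairing is an illustrative assumption, since the paper selects both empirically.

```python
import torch
import torchvision.models as models
from sklearn.svm import SVC

# Pre-trained CNN as a frozen deep-feature extractor.
backbone = models.resnet50(weights=models.ResNet50_Weights.DEFAULT)
backbone.fc = torch.nn.Identity()           # expose the 2048-d pooled features
backbone.eval()

@torch.no_grad()
def extract(images):                        # images: (N, 3, 224, 224) tensor
    return backbone(images).numpy()         # (N, 2048) deep features

# X_train/X_test: preprocessed fundus-image batches; y_train: 0=healthy, 1=glaucoma
# clf = SVC(kernel="rbf").fit(extract(X_train), y_train)
# preds = clf.predict(extract(X_test))
```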
8. Surface texture image classification of carbon/phenolic composites in extreme environments using deep learning.
- Author
- Shang, Tong, Yang, Jing, Ge, Jingran, Ji, Sudong, Li, Maoyuan, and Liang, Jun
- Subjects
- CONVOLUTIONAL neural networks, IMAGE recognition (Computer vision), SURFACE texture, ELECTRIC arc, FEATURE extraction, DEEP learning
- Abstract
The classification of ablation images holds significant practical value in thermal protection structures, as it enables the assessment of heat and corrosion resistance of composites. This paper proposes an image‐based deep learning framework to identify the surface texture in ablation images of carbon/phenolic composites. First, ablation experiments and collection of surface texture images of carbon/phenolic composites under different thermal environments were conducted in an electric arc wind tunnel. Then, a deep learning model based on a convolutional neural network (CNN) is developed for ablative image classification. The pre‐trained network is ultimately employed as the input for transfer learning. The network's feature extraction layer is trained using the ImageNet dataset, while global average pooling addresses the specific classification task. The test results demonstrate that the proposed method effectively classifies the relatively small surface texture dataset, enhances the classification performance of ablative surface texture with an accuracy of up to 97.6%, and exhibits robustness and generalization capabilities. Highlights: The paper proposes a new deep learning classification method for ablative images. A model highly sensitive to small and weak features is built. Transfer learning and data enhancement techniques are introduced into classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
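A minimal sketch of the transfer-learning recipe described above: freeze an ImageNet-pre-trained feature extractor and train only a classification head on top of global average pooling. The ResNet-18 backbone and four-class head are assumptions for illustration.

```python
import torch.nn as nn
import torchvision.models as models

# Frozen ImageNet features, trainable classification head.
backbone = models.resnet18(weights=models.ResNet18_Weights.DEFAULT)
for p in backbone.parameters():
    p.requires_grad = False                  # keep ImageNet features fixed
backbone.fc = nn.Linear(backbone.fc.in_features, 4)  # 4 texture classes (assumed)
# ResNet already applies global average pooling just before `fc`, matching the
# GAP-then-classify design; only `backbone.fc` receives gradient updates.
```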
9. Recent Progress in Deep Learning for Improving Coherent Anti‐Stokes Raman Scattering Microscopy.
- Author
- Yao, Bowen, Lin, Fangrui, Luo, Ziyi, Chen, Qinglin, Lin, Danying, Yang, Zhigang, Li, Jia, and Qu, Junle
- Subjects
- IMAGE recognition (Computer vision), IMAGE denoising, RAMAN microscopy, RAMAN scattering, DEEP learning, DATA analysis
- Abstract
Coherent anti‐Stokes Raman scattering (CARS) microscopy is a powerful label‐free imaging technique that leverages biomolecular vibrations and is widely used in different fields. However, its intrinsic non‐resonant background (NRB) can distort Raman signals and compromise spectral fidelity. Conventional data analysis methods for CARS encounter a bottleneck in achieving high accuracy. Furthermore, CARS requires balancing imaging speed against image quality. In recent years, endeavors in deep learning have effectively overcome these obstacles, advancing the development of CARS. This review highlights research that applies deep learning to mitigate NRB, classify CARS data for disease identification, and denoise images. Each approach is delineated in terms of network architecture, training data, and loss functions. Finally, the challenges in this field are discussed, and the latest deep learning advances are suggested as a means to enhance the reliability and efficiency of CARS microscopy. [ABSTRACT FROM AUTHOR]
- Published
- 2024
10. Seeing is believing: Forecasting crude oil price trend from the perspective of images.
- Author
- Ren, Xiaohang, Jiang, Wenting, Ji, Qiang, and Zhai, Pengxiang
- Subjects
- CONVOLUTIONAL neural networks, IMAGE recognition (Computer vision), PETROLEUM sales & prices, ENERGY futures, ENERGY industries
- Abstract
In this paper, we propose a novel imaging method to forecast the daily price data of West Texas Intermediate (WTI) crude oil futures. We use convolutional neural networks (CNNs) for future price trend prediction and obtain higher prediction accuracy than other benchmark forecasting methods. The results show that images can contain more nonlinear information, which is beneficial for energy price forecasting. Nonlinear factors also have a strong influence during drastic fluctuations in crude oil prices. In the robustness tests, we find that the image‐based CNN is the most stable approach and can be applied in various futures forecasting scenarios. When low‐frequency models are used to predict high‐frequency data, the CNN method still retains considerable predictive power, indicating the possibility of transfer learning with our novel approach. By unleashing the power of the picture, we open up a whole new perspective for forecasting future energy trends. [ABSTRACT FROM AUTHOR]
- Published
- 2024
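The abstract does not spell out its imaging scheme, so purely as an illustration of the series-to-picture step, here is one simple way to rasterise a price window into an array a 2-D CNN can consume; every detail below is an assumption.

```python
import numpy as np

def series_to_image(prices, height=32):
    """Rasterise a price window into a (height, len(prices)) binary image
    so a CNN can 'see' the trend shape rather than raw numbers."""
    p = np.asarray(prices, dtype=float)
    scaled = (p - p.min()) / (p.ptp() + 1e-9)            # scale to 0..1
    rows = ((height - 1) * (1 - scaled)).astype(int)     # top row = max price
    img = np.zeros((height, len(p)), dtype=np.float32)
    img[rows, np.arange(len(p))] = 1.0                   # one pixel per day
    return img  # feed windows of these into a 2-D CNN with an up/down label
```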
11. A novel density‐based representation for point cloud and its ability to facilitate classification.
- Author
- Xie, Xianlin and Tang, Xue‐song
- Subjects
- IMAGE recognition (Computer vision), FEATURE extraction, IMAGE processing, POINT cloud, POINT processes
- Abstract
Currently, in the field of processing 3D point cloud data, two primary representation methods have emerged: point‐based methods and voxel‐based methods. However, the former suffer from significant computational costs and lack the ease of handling exhibited by voxel‐based methods. Conversely, the latter often encounter challenges related to information loss resulting from downsampling operations, thereby impeding subsequent tasks. To address these limitations, this article introduces a novel density‐based representation method for voxel partitioning. Additionally, a corresponding network structure is devised to extract features from this specific density representation, thereby facilitating the successful completion of classification tasks. Experiments on ModelNet40 and MNIST demonstrate that the proposed 3D convolution can achieve state‐of‐the‐art performance among voxel‐based methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
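As a rough illustration of what a density-based voxel representation can look like (the paper's exact construction is not given in the abstract, so this is an assumption-laden sketch): each cell stores the fraction of points it contains rather than a 0/1 occupancy flag.

```python
import numpy as np

def density_voxels(points, grid=32):
    """Density-based voxelisation sketch: each voxel stores the fraction of
    cloud points falling inside it, preserving more of the distribution
    than binary occupancy."""
    p = np.asarray(points)                               # (N, 3) xyz
    p = (p - p.min(0)) / (p.max(0) - p.min(0) + 1e-9)    # normalise to [0, 1]
    idx = np.minimum((p * grid).astype(int), grid - 1)   # voxel index per point
    vox = np.zeros((grid, grid, grid), dtype=np.float32)
    np.add.at(vox, (idx[:, 0], idx[:, 1], idx[:, 2]), 1.0)
    return vox / len(p)                                  # density in [0, 1]
```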
12. A study on a target detection model for autonomous driving tasks.
- Author
- Chen, Hao, Min, Byung‐Won, and Zhang, Haifei
- Subjects
- IMAGE recognition (Computer vision), ARTIFICIAL intelligence, IMAGE processing, AUTONOMOUS vehicles, SPEED
- Abstract
Target detection in autonomous driving tasks presents a complex and critical challenge due to the diversity of targets and the intricacy of the environment. To address this issue, this paper proposes an enhanced YOLOv8 model. Firstly, the original large target detection head is removed and replaced with a detection head tailored for small targets and high‐level semantic details. Secondly, an adaptive feature fusion method is proposed, where input feature maps are processed using dilated convolutions with different dilation rates, followed by adaptive feature fusion to generate adaptive weights. Finally, an improved attention mechanism is incorporated to enhance the model's focus on target regions. Additionally, the impact of Group Shuffle Convolution (GSConv) on the model's detection speed is investigated. Validated on two public datasets, the model achieves a mean Average Precision (mAP) of 53.7% and 53.5%. Although introducing GSConv results in a slight decrease in mAP, it significantly improves frames per second. These findings underscore the effectiveness of the proposed model in autonomous driving tasks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
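The adaptive feature fusion step described here (parallel dilated convolutions whose outputs are blended by adaptive weights) can be sketched as follows. Rates and channel counts are illustrative, and for brevity this version learns static fusion weights, whereas the paper generates input-dependent ones.

```python
import torch
import torch.nn as nn

class AdaptiveDilatedFusion(nn.Module):
    """Parallel 3x3 convolutions with different dilation rates, blended by
    learned softmax weights (a static simplification of adaptive fusion)."""
    def __init__(self, ch, rates=(1, 2, 4)):
        super().__init__()
        self.branches = nn.ModuleList(
            [nn.Conv2d(ch, ch, 3, padding=r, dilation=r) for r in rates])
        self.logits = nn.Parameter(torch.zeros(len(rates)))

    def forward(self, x):
        w = torch.softmax(self.logits, dim=0)    # fusion weights sum to 1
        return sum(wi * b(x) for wi, b in zip(w, self.branches))
```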
13. SRL‐ProtoNet: Self‐supervised representation learning for few‐shot remote sensing scene classification.
- Author
- Liu, Bing, Zhao, Hongwei, Li, Jiao, Gao, Yansheng, and Zhang, Jianrong
- Subjects
- IMAGE recognition (Computer vision), REMOTE sensing, DISTANCE education, CLASSIFICATION, PROTOTYPES, DEEP learning
- Abstract
Using a deep learning method to classify a large amount of labelled remote sensing scene data produces good performance. However, it is challenging for deep learning based methods to generalise to classification tasks with limited data. Few‐shot learning allows neural networks to classify unseen categories when confronted with a handful of labelled data. Currently, episodic tasks based on meta‐learning can effectively complete few‐shot classification, and training an encoder that can conduct representation learning has become an important component of few‐shot learning. An end‐to‐end few‐shot remote sensing scene classification model based on ProtoNet and self‐supervised learning is proposed. The authors design the Pre‐prototype for a more discrete feature space and better integration with self‐supervised learning, and also propose the ProtoMixer for higher quality prototypes with a global receptive field. The authors' method outperforms the existing state‐of‐the‐art self‐supervised based methods on three widely used benchmark datasets: UC‐Merced, NWPU‐RESISC45, and AID. Compared with the previous state‐of‐the‐art performance, for the one‐shot setting this method improves accuracy by 1.21%, 2.36%, and 0.84% on AID, UC‐Merced, and NWPU‐RESISC45, respectively; for the five‐shot setting, it surpasses the previous best by 0.85%, 2.79%, and 0.74% on AID, UC‐Merced, and NWPU‐RESISC45, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
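The prototype mechanics underneath SRL-ProtoNet are standard ProtoNet: class prototypes are mean support embeddings, and queries are scored by distance to each prototype. A minimal sketch follows; the self-supervised encoder, Pre-prototype, and ProtoMixer components of the paper are not shown.

```python
import torch

def protonet_episode(support, support_y, query, n_way):
    """One ProtoNet episode on pre-computed embeddings.
    support: (n_support, d), support_y: (n_support,), query: (n_query, d)."""
    protos = torch.stack([support[support_y == c].mean(0)
                          for c in range(n_way)])        # (n_way, d) prototypes
    dists = torch.cdist(query, protos)                   # (n_query, n_way)
    return (-dists).log_softmax(dim=1)                   # class log-probabilities
```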
14. Centre‐loss—A preferred class verification approach over sample‐to‐sample in self‐checkout products datasets.
- Author
- Ciapas, Bernardas and Treigys, Povilas
- Subjects
- IMAGE recognition (Computer vision), ARTIFICIAL neural networks, IMAGE registration, EUCLIDEAN distance, MULTIPLE comparisons (Statistics)
- Abstract
Siamese networks excel at comparing two images, serving as an effective class verification technique for a single‐per‐class reference image. However, when multiple reference images are present, Siamese verification necessitates multiple comparisons and aggregation, often impractical at inference. The Centre‐Loss approach, proposed in this research, solves a class verification task more efficiently than sample‐to‐sample approaches, using a single forward pass during inference. Optimising a Centre‐Loss function learns class centres and minimises intra‐class distances in latent space. The authors compared verification accuracy using Centre‐Loss against aggregated Siamese when other hyperparameters (such as neural network backbone and distance type) are the same. Experiments were performed to contrast the ubiquitous Euclidean distance against other distance types and to discover the optimum Centre‐Loss layer, its size, and the Centre‐Loss weight. In the optimal architecture, the Centre‐Loss layer is connected to the penultimate layer, calculates Euclidean distance, and its size depends on distance type. The Centre‐Loss method was validated on the Self‐Checkout products and Fruits 360 image datasets. Centre‐Loss's comparable accuracy and lower complexity make it a preferred approach over sample‐to‐sample for the class verification task when the number of reference images per class is high and inference speed is a factor, such as in self‐checkouts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
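A minimal sketch of the centre-loss idea this entry builds on: learn one centre per class, pull embeddings towards their class centre during training, then verify at inference with a single embedding-to-centre distance. Dimensions and the threshold rule are illustrative assumptions.

```python
import torch
import torch.nn as nn

class CentreLoss(nn.Module):
    """Learnable per-class centres; the loss is the mean squared Euclidean
    distance between each embedding and its class centre."""
    def __init__(self, n_classes, dim):
        super().__init__()
        self.centres = nn.Parameter(torch.randn(n_classes, dim))

    def forward(self, emb, labels):
        return ((emb - self.centres[labels]) ** 2).sum(1).mean()

# Verification: accept if ||embedding - claimed class centre|| < threshold,
# one forward pass instead of Siamese comparisons against every reference.
```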
15. CHiMP: deep‐learning tools trained on protein crystallization micrographs to enable automation of experiments.
- Author
- King, Oliver N. F., Levik, Karl E., Sandy, James, and Basham, Mark
- Subjects
- OBJECT recognition (Computer vision), IMAGE recognition (Computer vision), LIGHT sources, DEEP learning, PROTEIN analysis
- Abstract
A group of three deep‐learning tools, referred to collectively as CHiMP (Crystal Hits in My Plate), were created for analysis of micrographs of protein crystallization experiments at the Diamond Light Source (DLS) synchrotron, UK. The first tool, a classification network, assigns images into categories relating to experimental outcomes. The other two tools are networks that perform both object detection and instance segmentation, resulting in masks of individual crystals in the first case and masks of crystallization droplets in addition to crystals in the second case, allowing the positions and sizes of these entities to be recorded. The creation of these tools used transfer learning, where weights from a pre‐trained deep‐learning network were used as a starting point and repurposed by further training on a relatively small set of data. Two of the tools are now integrated at the VMXi macromolecular crystallography beamline at DLS, where they have the potential to remove the need for any user input, both for monitoring crystallization experiments and for triggering in situ data collections. The third is being integrated into the XChem fragment‐based drug‐discovery screening platform, also at DLS, to allow the automatic targeting of acoustic compound dispensing into crystallization droplets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
16. Deep Learning‐Based Image Classification and Segmentation on Digital Histopathology for Oral Squamous Cell Carcinoma: A Systematic Review and Meta‐Analysis.
- Author
- Pirayesh, Zeynab, Mohammad‐Rahimi, Hossein, Ghasemi, Nikoo, Motamedian, Saeed‐Reza, Sadeghi, Terme Sarrafan, Koohi, Hediye, Rokhshad, Rata, Lotfi, Shima Moradian, Najafi, Anahita, Alajaji, Shahd A., Khoury, Zaid H., Jessri, Maryam, and Sultan, Ahmed S.
- Subjects
- IMAGE recognition (Computer vision), IMAGE segmentation, ARTIFICIAL intelligence, IMAGE analysis, SQUAMOUS cell carcinoma
- Abstract
Background: Artificial intelligence (AI)‐based tools have shown promise in histopathology image analysis in improving the accuracy of oral squamous cell carcinoma (OSCC) detection with intent to reduce human error. Objectives: This systematic review and meta‐analysis evaluated deep learning (DL) models for OSCC detection on histopathology images by assessing common diagnostic performance evaluation metrics for AI‐based medical image analysis studies. Methods: Diagnostic accuracy studies that used DL models for the analysis of histopathological images of OSCC compared to the reference standard were analyzed. Six databases (PubMed, Google Scholar, Scopus, Embase, ArXiv, and IEEE) were screened for publications without any time limitation. The QUADAS‐2 tool was utilized to assess quality. The meta‐analyses included only studies that reported true positives (TP), true negatives (TN), false positives (FP), and false negatives (FN) in their test sets. Results: Of 1267 screened studies, 17 studies met the final inclusion criteria. DL methods such as image classification (n = 11) and segmentation (n = 3) were used, and some studies used combined methods (n = 3). On QUADAS‐2 assessment, only three studies had a low risk of bias across all applicability domains. For segmentation studies, 0.97 was reported for accuracy, 0.97 for sensitivity, 0.98 for specificity, and 0.92 for Dice. For classification studies, accuracy was reported as 0.99, sensitivity 0.99, specificity 1.0, Dice 0.95, F1 score 0.98, and AUC 0.99. Meta‐analysis showed pooled estimates of 0.98 sensitivity and 0.93 specificity. Conclusion: Application of AI‐based classification and segmentation methods to image analysis represents a fundamental shift in digital pathology. DL approaches demonstrated significantly high accuracy for OSCC detection on histopathology, comparable to that of human experts in some studies. Although AI‐based models cannot replace a well‐trained pathologist, they can assist by improving the objectivity and repeatability of the diagnosis while reducing the variability and human error that result from pathologist burnout. [ABSTRACT FROM AUTHOR]
- Published
- 2024
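For reference, every metric reported in this entry derives directly from the confusion-matrix counts (TP, TN, FP, FN) that the meta-analysis required each study to report; a small helper makes the definitions explicit.

```python
def diagnostic_metrics(tp, fp, tn, fn):
    """Standard diagnostic metrics from per-study confusion-matrix counts."""
    sensitivity = tp / (tp + fn)            # recall on diseased cases
    specificity = tn / (tn + fp)            # recall on healthy cases
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    f1 = 2 * tp / (2 * tp + fp + fn)
    return sensitivity, specificity, accuracy, f1
```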
17. Investigation of the prediction of wildlife animals and its deployment using the robot.
- Author
- Kaur, Parminder, Kansal, Sachin, and Singh, Varinder P.
- Abstract
Monitoring wildlife in their natural habitat requires direct human intervention. Some animals are scared of humans. In such situations, camera‐equipped devices are implemented to gain a clear picture of wildlife. Objective: Current wildlife detection models detect and classify the animal from camera‐captured images, limiting the action taken to rescue or save them from mishaps. Also, the camera‐equipped devices are fixed at particular locations. Therefore, an efficient detection model capable of protecting the animal has the potential to play an important role. Method: To this end, we present Pred‐WAR, a Convolution Neural Network (CNN)‐based image classification approach to detect wildlife and raise rescue alerts in real time. In our approach, we have proposed a Mask Region‐based CNN (Mask RCNN or MRCNN) with an Automatic Mixed Precision model, implemented on a Robot Operating System‐based mobile robot with a Raspberry Pi 4, that detects animals and raises a lion call or alarm to alert or rescue them in real time. Results: Pred‐WAR obtained a mean Average Precision value of 85.47% and an F1 score of 87.73%, with precision values ranging between 92% and 99%, outperforming the current MRCNN model. Significance: This approach has fast computation speed and maintains accuracy, so it can be efficiently implemented in real‐time scenarios. [ABSTRACT FROM AUTHOR]
- Published
- 2024
18. Causal inference for out‐of‐distribution recognition via sample balancing.
- Author
- Wang, Yuqing, Li, Xiangxian, Liu, Yannan, Cao, Xiao, Meng, Xiangxu, and Meng, Lei
- Subjects
- CAUSAL artificial intelligence, IMAGE recognition (Computer vision), CAUSAL inference, COMPUTER vision, CLASSIFICATION algorithms
- Abstract
Image classification algorithms are commonly based on the Independent and Identically Distributed (i.i.d.) assumption, but in practice the Out‐Of‐Distribution (OOD) problem widely exists; that is, the contexts of images seen at prediction time are usually unseen during training. In this case, existing models trained under the i.i.d. assumption show limited generalisation. Causal inference is an important method to learn the causal associations that are invariant across different environments, thus improving the generalisation ability of the model. However, existing methods usually require partitioning of the environment to learn invariant features, and these partitions mostly suffer from imbalance problems due to the lack of constraints. In this paper, we propose a balanced causal learning framework (BCL) that addresses both how to divide the dataset in a balanced way and how to balance training after the division: it automatically generates fine‐grained balanced data partitions in an unsupervised manner and balances the training difficulty of different classes, thereby enhancing the generalisation ability of models in different environments. Experiments on the OOD datasets NICO and NICO++ demonstrate that BCL achieves stable predictions on OOD data, and we also find that models using BCL focus more accurately on the foreground of images compared with the existing causal inference method, which effectively improves the generalisation ability. [ABSTRACT FROM AUTHOR]
- Published
- 2024
19. Maritime vessel classification based on a dual network combining EfficientNet with a hybrid network MPANet.
- Author
- Liu, Wenhui, Qiao, Yulong, Zhao, Yue, Xing, Zhengyi, and He, Hengxiang
- Subjects
- CONVOLUTIONAL neural networks, IMAGE recognition (Computer vision), INFRARED imaging, COMPUTER vision, MARITIME management
- Abstract
Ship classification is an important technique for enhancing maritime management and security. Visible and infrared sensors are generally employed to deal with the challenging problem and improve classification performance. Herein, a two‐branch feature fusion neural network structure is proposed to classify the visible and infrared maritime vessel images simultaneously. Specifically, in this two‐branch neural network, one branch is based on a deep convolutional neural network that is used to extract the visible image features, while the other is a hybrid network structure that is a multi‐scale patch embedding network called MPANet. The sub‐network MPANet can extract fine‐ and coarse‐grained features, in which the pooling operation instead of the multi‐head attention mechanism is utilized to reduce memory consumption. When there are infrared images, it is used to extract the infrared image features, otherwise, this branch is also utilized to extract visible image features. Therefore, this dual network is suitable with or without infrared images. The experimental results on the visible and infrared spectrums (VAIS) dataset demonstrate that the introduced network achieves state‐of‐the‐art ship classification performance on visible images and paired visible and infrared ship images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
20. Eigenspectrum regularisation reverse neighbourhood discriminative learning.
- Author
- Xie, Ming, Tan, Hengliang, Du, Jiao, Yang, Shuo, Yan, Guofeng, Li, Wangwang, and Feng, Jianwei
- Abstract
Linear discriminant analysis is a classical method for solving problems of dimensionality reduction and pattern classification. Although it has been extensively developed, it still suffers from various common problems, such as the Small Sample Size (SSS) and multimodal problems. Neighbourhood linear discriminant analysis (nLDA) was recently proposed to solve the multimodal-class problem caused by the violation of the independently-and-identically-distributed assumption on samples. However, due to the existence of many small‐scale practical applications, nLDA still has to face the SSS problem, which leads to instability and poor generalisation caused by the singularity of the within‐neighbourhood scatter matrix. The authors exploit eigenspectrum regularisation techniques to circumvent the singularity of the within‐neighbourhood scatter matrix of nLDA, in a method called Eigenspectrum Regularisation Reverse Neighbourhood Discriminative Learning (ERRNDL). The algorithm of nLDA is reformulated as a framework by searching two projection matrices. Three eigenspectrum regularisation models are introduced to the framework to evaluate the performance. Experiments are conducted on the University of California, Irvine machine learning repository and six image classification datasets. The proposed ERRNDL‐based methods achieve considerable performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
21. Image Analysis Method of Substation Equipment Status Based on Cross‐Modal Learning.
- Author
- Li, Zhuyun, Yoshie, Osamu, Wu, Hao, Mai, Xiaoming, Yang, Yingyi, and Qu, Xian
- Subjects
- IMAGE recognition (Computer vision), VIDEO surveillance, IMAGE analysis, CLASSIFICATION algorithms, POWER resources
- Abstract
In response to increasing power supply needs, maintaining stable substations is vital for reliable electricity. Traditional manual equipment inspections in these substations are inefficient and risky, often leading to hazards and delayed detection of faults. Therefore, there's a growing shift towards using intelligent image recognition technology in video surveillance systems for safer and more efficient inspections. This paper focuses on enhancing the level of intelligent inspection in substations using artificial intelligence‐based visual recognition technology. It introduces a novel small‐sample classification algorithm based on the CLIP architecture. This method uses cross‐modal equipment status information as additional training samples, optimizing the loss function together with image samples, and devises hand‐crafted strategies for text sample inputs to distinguish between equipment and states. The experimental results show that with only 16 training samples per category for 21 types of electrical equipment states, our method achieved a maximum accuracy of 93.38%. This represents a 2.98% higher accuracy than the PPLCNet trained on the full dataset and an 8.63% higher accuracy than the PPLCNet trained with an equal number of samples, with significantly reduced training time. © 2024 Institute of Electrical Engineers of Japan and Wiley Periodicals LLC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
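For context on the cross-modal mechanism: CLIP-style models score an image against natural-language state descriptions, which is what lets text act as extra training signal here. A zero-shot-style sketch with Hugging Face Transformers follows; the checkpoint, prompts, and file name are illustrative, not the paper's.

```python
from transformers import CLIPModel, CLIPProcessor
from PIL import Image

# Score one equipment image against hand-crafted state descriptions.
model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")
prompts = ["a photo of a closed disconnector switch",   # assumed state texts
           "a photo of an open disconnector switch"]
inputs = proc(text=prompts, images=Image.open("device.jpg"),  # hypothetical file
              return_tensors="pt", padding=True)
probs = model(**inputs).logits_per_image.softmax(dim=1)  # per-state probabilities
```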
22. PAW: Prediction of wildlife animals using a robot under adverse weather conditions.
- Author
- Kaur, Parminder, Kansal, Sachin, and Singh, V. P.
- Subjects
- OBJECT recognition (Computer vision), IMAGE recognition (Computer vision), AGGREGATION (Robotics), ARTIFICIAL intelligence, MOBILE robots
- Abstract
Image dehazing and object detection are two different research areas that play a vital role in machine learning. Merged together and implemented in real time, they are a boon in the field of artificial intelligence, specifically robotics. Object detection and tracking are two of the major implementations in almost the entire robot's training and learning. The learning of the robot depends on images; these images can be camera‐captured images or a pretrained dataset. Real‐time outdoor images captured in bad weather conditions, such as mist, haze, smog, and fog, often suffer from poor visibility, and the consequences are incorrect results and hence unexpected robot behavior. To overcome these consequences, we have presented a novel approach to object detection and identification during adverse weather conditions. This method is proposed to be implemented in a real‐time environment to monitor animal behavior near railway tracks during fog, haze, and smog. It is not limited to specific application areas but can be used to identify endangered species and take active steps to save them from mishap. The deployment is done in a real‐time indoor environment using the Tortoisebot mobile robot with a Robot Operating System framework. [ABSTRACT FROM AUTHOR]
- Published
- 2024
23. Comparative analysis of traditional machine learning and automated machine learning: advancing inverted papilloma versus associated squamous cell carcinoma diagnosis.
- Author
- Hosseinzadeh, Farideh, Mohammadi, S. Saeed, Palmer, James N., Kohanski, Michael A., Adappa, Nithin D., Chang, Michael T., Hwang, Peter H., Nayak, Jayakar V., and Patel, Zara M.
- Subjects
- MACHINE learning, IMAGE recognition (Computer vision), ARTIFICIAL intelligence, DEEP learning, SQUAMOUS cell carcinoma
- Abstract
Key Points: Inverted papilloma conversion to squamous cell carcinoma is not always easy to predict. AutoML requires much less technical knowledge and skill to use than traditional ML. AutoML surpassed the traditional ML algorithm in differentiating IP from IP‐SCC. [ABSTRACT FROM AUTHOR]
- Published
- 2024
24. Factor annealing decoupling compositional training method for imbalanced hyperspectral image classification.
- Author
- Li, Xiaojun, Su, Yi, Yao, Junping, Guo, Yi, and Fan, Shuai
- Subjects
- IMAGE recognition (Computer vision), ARTIFICIAL intelligence, IMAGE processing, REMOTE sensing, IMAGE representation, DEEP learning
- Abstract
Due to differences in the quantity and size of observed targets, hyperspectral images are characterized by class imbalance. The standard deep learning classification model training scheme optimizes the overall classification error, which may lead to performance imbalance between classes in hyperspectral image classification frameworks. Therefore, a novel factor annealing decoupling compositional training method is proposed in this paper. Without requiring resampling or reweighting, it implicitly modulates the training process, so standard models can sufficiently learn the representation of the minority classes and further be trained as robust classifiers. Specifically, the label‐distribution‐aware margin loss is combined with the error‐rate‐based cross‐entropy loss via combination factor, which considers both imbalanced data representation learning and classifier overall performance. Then, a factor annealing optimization training scheme is designed to adjust the combination factor, which solves the stage division problem of two‐stage decoupling learning. Experimental results on two hyperspectral image datasets demonstrate that, as compared with other competing approaches, the proposed method can continuously and stably optimize the model parameters, achieving improvements in class average metrics and difficult classes without affecting overall classification performance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
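To make the loss combination concrete, here is a hedged sketch of a factor-combined objective in the spirit described: a label-distribution-aware margin (LDAM) term for minority-class representation plus plain cross-entropy for overall performance, mixed by a factor that an outer loop anneals across epochs. The function name, the annealing schedule, and how `ldam_logits` are produced are all assumptions, not the paper's definitions.

```python
import torch.nn.functional as F

def combined_loss(logits, ldam_logits, target, alpha):
    """Factor-combined objective sketch. `ldam_logits` are assumed to be
    logits with per-class margins already subtracted (LDAM style); `alpha`
    is the combination factor, annealed across training (e.g. 1 -> 0)."""
    return alpha * F.cross_entropy(ldam_logits, target) \
         + (1 - alpha) * F.cross_entropy(logits, target)
```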
25. Nighttime wildlife object detection based on YOLOv8‐night.
- Author
- Wang, Tianyu, Ren, Siyu, and Zhang, Haiyan
- Subjects
- OBJECT recognition (Computer vision), IMAGE recognition (Computer vision), IMAGE processing, WILDLIFE conservation, NOCTURNAL animals
- Abstract
Monitoring nocturnal animals in the field is an important task in ecological research and wildlife conservation, but the complexity of nocturnal images and low light conditions make it difficult to cope with traditional image processing methods. To address this problem, researchers have introduced infrared cameras to improve the accuracy of nocturnal animal behaviour observations. Object detection in nighttime images captured by infrared cameras faces several challenges, including low image quality, animal scale variations, occlusion, and pose changes. This study proposes the YOLOv8‐night model, which effectively overcomes these challenges by introducing a channel attention mechanism in YOLOv8. The model is more focused on capturing animal‐related features by dynamically adjusting the channel weights, which improves the saliency of key features and increases the accuracy rate in complex backgrounds. The main contribution of this study is the introduction of the channel attention mechanism into the YOLOv8 framework to create a YOLOv8‐night model suitable for object detection in nighttime images. When tested on nighttime images, the model performs well with a significantly higher mAP (0.854) than YOLOv8 (0.831), and YOLOv8‐night scores 0.856 on mAP_l, which is obviously better than YOLOv8 (0.833) in terms of processing large objects. The study provides a reliable technical tool for ecological research, wildlife conservation and environmental monitoring, and offers new methods and insights for the study of nocturnal animal behaviour. [ABSTRACT FROM AUTHOR]
- Published
- 2024
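Channel attention of the kind added to YOLOv8 here usually follows the squeeze-and-excitation pattern: pool each channel to a scalar, pass it through a small MLP, and re-weight the channels. A generic sketch follows; the block's exact placement and reduction ratio in YOLOv8-night are not given in the abstract, so these are assumptions.

```python
import torch
import torch.nn as nn

class ChannelAttention(nn.Module):
    """Squeeze-and-excitation-style channel attention: global average pool,
    bottleneck MLP, sigmoid gate, channel-wise re-weighting."""
    def __init__(self, ch, r=16):
        super().__init__()
        self.fc = nn.Sequential(nn.Linear(ch, ch // r), nn.ReLU(),
                                nn.Linear(ch // r, ch), nn.Sigmoid())

    def forward(self, x):                        # x: (B, C, H, W)
        w = self.fc(x.mean(dim=(2, 3)))          # squeeze -> per-channel weight
        return x * w[:, :, None, None]           # re-weight feature channels
```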
26. Clean, performance‐robust, and performance‐sensitive historical information based adversarial self‐distillation.
- Author
- Li, Shuyi, Hu, Hongchao, Huo, Shumin, and Liang, Hao
- Subjects
- IMAGE recognition (Computer vision), COMPUTER architecture, COMPUTER vision, DISTILLATION, ALGORITHMS
- Abstract
Adversarial training suffers from poor effectiveness due to the challenging optimisation of loss with hard labels. To address this issue, adversarial distillation has emerged as a potential solution, encouraging target models to mimic the output of the teachers. However, reliance on pre‐training teachers leads to additional training costs and raises concerns about the reliability of their knowledge. Furthermore, existing methods fail to consider the significant differences in unconfident samples between early and late stages, potentially resulting in robust overfitting. An adversarial defence method named Clean, Performance‐robust, and Performance‐sensitive Historical Information based Adversarial Self‐Distillation (CPr & PsHI‐ASD) is presented. Firstly, an adversarial self‐distillation replacement method based on clean, performance‐robust, and performance‐sensitive historical information is developed to eliminate pre‐training costs and enhance guidance reliability for the target model. Secondly, adversarial self‐distillation algorithms that leverage knowledge distilled from the previous iteration are introduced to facilitate the self‐distillation of adversarial knowledge and mitigate the problem of robust overfitting. Experiments are conducted to evaluate the performance of the proposed method on CIFAR‐10, CIFAR‐100, and Tiny‐ImageNet datasets. The results demonstrate that the CPr&PsHI‐ASD method is more effective than existing adversarial distillation methods in enhancing adversarial robustness and mitigating robust overfitting issues against various adversarial attacks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
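The self-distillation step described above replaces an external teacher with the model's own historical outputs. A minimal sketch of such a loss follows, with the temperature, weighting, and the choice of "previous iteration" logits all illustrative assumptions rather than the CPr&PsHI-ASD formulation.

```python
import torch.nn.functional as F

def self_distill_loss(student_logits, prev_logits, target, T=2.0, lam=0.5):
    """Self-distillation sketch: hard-label cross-entropy plus KL towards
    soft targets from the model's own earlier iteration (detached)."""
    hard = F.cross_entropy(student_logits, target)
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(prev_logits.detach() / T, dim=1),
                    reduction="batchmean") * T * T
    return lam * hard + (1 - lam) * soft
```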
27. Noise‐tolerant matched filter scheme supplemented with neural dynamics algorithm for sea island extraction.
- Author
- Chen, Yiyu, Fu, Dongyang, Wang, Difeng, Huang, Haoen, Si, Yang, and Du, Shangfeng
- Subjects
- MATCHED filters, ISLANDS, REMOTE sensing, IMAGE recognition (Computer vision), ALGORITHMS
- Abstract
Achieving high‐precision extraction of sea islands from high‐resolution satellite remote sensing images is crucial for effective resource development and sustainable management. Unfortunately, achieving such accuracy for sea island extraction presents significant challenges due to the presence of extensive background interference. A more widely applicable noise‐tolerant matched filter (NTMF) scheme is proposed for sea island extraction based on the MF scheme. The NTMF scheme effectively suppresses the background interference, leading to more accurate and robust sea island extraction. To further enhance the accuracy and robustness of the NTMF scheme, a neural dynamics algorithm is supplemented that adds an error integration feedback term to counter noise interference during internal computer operations in practical applications. Several comparative experiments were conducted on various remote sensing images of sea islands under different noisy working conditions to demonstrate the superiority of the proposed neural dynamics algorithm‐assisted NTMF scheme. These experiments confirm the advantages of using the NTMF scheme for sea island extraction with the assistance of neural dynamics algorithm. [ABSTRACT FROM AUTHOR]
- Published
- 2024
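For background, the classical matched filter (MF) that the NTMF scheme extends scores each pixel spectrum against a target signature relative to background statistics. In standard remote-sensing notation, with x a pixel spectrum, s the target (island) signature, and μ and Σ the background mean and covariance:

```latex
\[
  y(\mathbf{x}) =
  \frac{(\mathbf{s}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{x}-\boldsymbol{\mu})}
       {(\mathbf{s}-\boldsymbol{\mu})^{\top}\boldsymbol{\Sigma}^{-1}(\mathbf{s}-\boldsymbol{\mu})}
\]
```

Higher y(x) indicates a closer match to the target signature; per the abstract, the NTMF adds noise tolerance on top of this, and the neural dynamics algorithm contributes an error-integration feedback term during computation.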
28. A hierarchical, multi‐sensor framework for peatland sub‐class and vegetation mapping throughout the Canadian boreal forest.
- Author
- Pontone, Nicholas, Millard, Koreen, Thompson, Dan K., Guindon, Luc, and Beaudoin, André
- Subjects
- TAIGAS, BOGS, VEGETATION mapping, PERMAFROST ecosystems, HABITATS, EFFECT of human beings on climate change, ANTHROPOGENIC effects on nature, CARBON cycle
- Abstract
Peatlands in the Canadian boreal forest are being negatively impacted by anthropogenic climate change, the effects of which are expected to worsen. Peatland types and sub‐classes vary in their ecohydrological characteristics and are expected to have different responses to climate change. Large‐scale modelling frameworks such as the Canadian Model for Peatlands, the Canadian Fire Behaviour Prediction System and the Canadian Land Data Assimilation System require peatland maps including information on sub‐types and vegetation as critical inputs. Additionally, peatland class and vegetation height are critical variables for wildlife habitat management and are related to the carbon cycle and wildfire fuel loading. This research aimed to create a map of peatland sub‐classes (bog, poor fen, rich fen, permafrost peat complex) for the Canadian boreal forest and create an inventory of peatland vegetation height characteristics using ICESat‐2. A three‐stage hierarchical classification framework was developed to map peatland sub‐classes within the Canadian boreal forest circa 2020. Training and validation data consisted of peatland locations derived from various sources (field data, aerial photo interpretation, measurements documented in literature). A combination of multispectral data, L‐band SAR backscatter and C‐band interferometric SAR coherence, forest structure and ancillary variables was used as model predictors. Ancillary data were used to mask agricultural areas and urban regions and to account for regions that may exhibit permafrost. In the first stage of the classification, wetlands, uplands and water were classified with 86.5% accuracy. In the second stage, within the wetland areas only, peatland and mineral wetlands were differentiated with 93.3% accuracy. In the third stage, constrained to only the peatland areas, bogs, rich fens, poor fens and permafrost peat complexes were classified with 71.5% accuracy. Then, ICESat‐2 ATL08 spaceborne lidar data were used to describe regional and class‐wise variations in peatland vegetation height characteristics, based on a boreal‐forest‐wide sample. This research introduced a comprehensive large‐scale peatland sub‐class mapping framework for the Canadian boreal forest, presenting the first moderate‐resolution map of its kind. [ABSTRACT FROM AUTHOR]
- Published
- 2024
29. Superpixel‐guided locality preserving projection and spatial–spectral classification for hyperspectral image.
- Author
- Song, Hailong and Zhang, Shuzhen
- Subjects
- IMAGE recognition (Computer vision), SPECTRAL imaging, MULTISPECTRAL imaging, FEATURE extraction, HYPERGRAPHS
- Abstract
Locality preserving projection (LPP) is a typical feature extraction method based on spectral information for hyperspectral image (HSI) classification. Recently, to improve the classification performance, the spatial information of HSI has been applied in the LPP method. However, most spatial–spectral‐based LPP methods explore the spatial–spectral information within a fixed local window, which is not appropriate for the irregularly shaped ground objects in HSI. To overcome this issue, an effective superpixel‐guided LPP and spatial–spectral classification method is proposed, in which the spatially adaptive structure information is fully excavated for HSI classification. Specifically, superpixel segmentation is first conducted on the HSI to generate shape‐adaptive homogeneous subregions. Then, to learn a more discriminative projection, the neighbourhood graph for LPP is constructed based on spatial–spectral similarity, in which pixels within the same superpixel are connected. Finally, the obtained projection feature is input to a classifier to yield the initial classification result, and the edge information of ground objects captured by superpixels is utilized to optimize the initial classification result. Experiments on two real hyperspectral datasets demonstrate that the proposed superpixel‐guided spatial–spectral classification method significantly outperforms other well‐known techniques for HSI classification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
30. Applying novel self‐supervised learning for early detection of retinopathy of prematurity.
- Author
- Wang, Dongmei, Qiao, Wanli, Guo, Wei, and Cai, Yuansong
- Subjects
- LOW birth weight, RETROLENTAL fibroplasia, SUPERVISED learning, PREMATURE infants
- Abstract
Retinopathy of prematurity (ROP) mainly occurs in premature infants with low birth weight, and it is the leading cause of childhood blindness. Early and accurate ROP diagnosis is imperative for appropriate treatment. However, little research concentrates on early‐stage ROP diagnosis based on limited labelled images in an imbalanced dataset. To address this dilemma, this study proposed a novel self‐supervised network, MOCO‐MIM, for early ROP grading. The proposed classification network was evaluated on a total of 553 labelled fundus images from 89 preterm infants. The trained network achieved a test accuracy of 98.29% and an AUC score of 97.6% for three stages of grading. It is verified that the proposed method can detect early stages of ROP more efficiently and grade severity more accurately from limited labelled fundus images, outperforming existing state‐of‐the‐art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
31. Modelling appearance variations in expressive and neutral face image for automatic facial expression recognition.
- Author
- Kumar H N, Naveen, M S, Guru Prasad, Asif Shah, Mohd, Mahadevaswamy, B, Jagadeesh, and K, Sudheesh
- Subjects
- FACIAL expression, SUPPORT vector machines, FACE, IMAGE recognition (Computer vision), EMOTION recognition
- Abstract
In automatic facial expression recognition (AFER) systems, modelling spatio‐temporal feature information in a specific manner, coalescing it, and utilizing it effectively are challenging. State‐of‐the‐art studies have examined integrating multiple features to enhance the recognition rate of AFER systems. However, the feature variations between expressive and neutral face images are not fully explored to identify the expression class. The proposed research presents an innovative approach to AFER by modelling appearance variations in both expressive and neutral face images. The prominent contributions of the work are developing a novel and hybrid feature space by integrating the discriminative feature distribution derived from expressive and neutral face images, and preserving the highly discriminative latent feature distribution using autoencoders. Local binary pattern (LBP) and histogram of oriented gradients (HOG) are the feature descriptors employed to derive the discriminative texture and shape information, respectively. A component‐based approach is employed, wherein the features are derived from salient facial regions instead of the whole face. The three‐stage stacked deep convolutional autoencoder (SDCA) and multi‐class support vector machine (MSVM) are employed to address dimensionality reduction and classification, respectively. The efficacy of the proposed model is substantiated by empirical findings, which establish its superiority in terms of accuracy in AFER tasks on widely recognized benchmark datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
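The LBP-plus-HOG feature construction described above can be sketched with scikit-image. Descriptor parameters below are illustrative, and the paper's autoencoder stage and salient-region cropping are omitted; treat this as a schematic of the hybrid expressive/neutral feature space, not the published pipeline.

```python
import numpy as np
from skimage.feature import hog, local_binary_pattern
from sklearn.svm import SVC

def appearance_features(face, neutral):
    """Concatenate HOG (shape) and LBP-histogram (texture) descriptors from
    an expressive face and its neutral counterpart, so a classifier sees
    the appearance *variation* between the two. Inputs: 2-D grayscale arrays."""
    feats = []
    for img in (face, neutral):
        h = hog(img, orientations=8, pixels_per_cell=(16, 16))
        lbp = local_binary_pattern(img, P=8, R=1, method="uniform")
        lbp_hist, _ = np.histogram(lbp, bins=10, range=(0, 10), density=True)
        feats.append(np.concatenate([h, lbp_hist]))
    return np.concatenate(feats)

# clf = SVC().fit(np.stack([appearance_features(f, n) for f, n in pairs]), y)
```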
32. LPNet: A remote sensing scene classification method based on large kernel convolution and parameter fusion.
- Author
- Wang, Guowei, Shi, Furong, Wang, Xinyu, Xu, Haixia, Yuan, Liming, and Wen, Xianbin
- Subjects
- IMAGE recognition (Computer vision), FEATURE extraction, CONVOLUTIONAL neural networks, IMAGE representation, COMPUTER vision
- Abstract
Remote sensing scene images contain numerous feature targets with unrelated semantic information, so extracting the local key information and semantic features of the image becomes the key to achieving accurate classification. Existing Convolutional Neural Networks (CNNs) mostly concentrate on the global representation of an image and lose the shallow features. To overcome these issues, this paper proposes LPNet for remote sensing scene image classification. First, LPNet employs LKConv to extract the semantic features in the image, while using standard convolution to extract local key information. Additionally, LPNet applies a shortcut residual concatenation branch to reuse features. Then, parameter fusion combines parameters from previous branches, improving the capacity of the model to obtain a more comprehensive and rich feature representation of the image. Finally, considering the relationship between the classification ability of the model and the depth of feature extraction, the Feature Mixture (FM) Block is used to deepen the model for feature extraction. Comparative experiments on four publicly available datasets show that LPNet provides comparable results to other state‐of‐the‐art methods. The effectiveness of LPNet is further demonstrated by visualizing the effective receptive fields (ERFs). [ABSTRACT FROM AUTHOR]
- Published
- 2024
33. Unsupervised hyperspectral images classification using hypergraph convolutional extreme learning machines.
- Author
- Zhang, Hongrui, Lv, Hongfei, Wang, Mengke, Wang, Luyao, Xu, Jinhuan, Wang, Fenggui, and Li, Xiangdong
- Subjects
- IMAGE recognition (Computer vision), MACHINE learning, FEATURE selection
- Abstract
Traditional methods struggle to fully utilize the rich spectral information in hyperspectral images (HSIs) and fail to capture the complex higher‐order relations in hyperspectral data, which limits the classification performance of the extreme learning machine (ELM) and prevents further improvements in HSI classification accuracy. To address this, the authors propose the hypergraph convolutional extreme learning machine (HGCELM) method. The method not only inherits all the advantages of ELM but also embeds hypergraph convolution for feature selection, which is capable of handling higher‐order relations. This enables HGCELM to capture more complex relationships between nodes and provide richer representation capabilities. At the same time, the training speed advantage of ELM is retained, thus accelerating model training. Experimental results show that the proposed algorithm achieves better accuracy than other clustering algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
34. Histological tissue classification with a novel statistical filter‐based convolutional neural network.
- Author
- Ünlükal, Nejat, Ülker, Erkan, Solmaz, Merve, Uyar, Kübra, and Taşdemir, Şakir
- Subjects
- CONVOLUTIONAL neural networks, IMAGE recognition (Computer vision), DEEP learning, CENTRAL processing units, FEATURE extraction
- Abstract
Deep networks have been of considerable interest in the literature and have enabled the solution of recent real‐world applications. Due to filters that offer feature extraction, the Convolutional Neural Network (CNN) is recognized as an accurate, efficient and trustworthy deep learning technique for the solution of image‐based challenges. High‐performing CNNs are computationally demanding even if they produce good results in a variety of applications, because a large number of parameters limits their ability to be reused on central processing units with low performance. To address these limitations, we suggest a novel statistical filter‐based CNN (HistStatCNN) for image classification. The convolution kernels of the designed CNN model were initialized by continuous statistical methods. The performance of the proposed filter initialization approach was evaluated on a novel histological dataset and various histopathological benchmark datasets. To prove the efficiency of statistical filters, three unique parameter sets and a mixed parameter set of statistical filters were applied to the designed CNN model for the classification task. According to the results, the accuracy of the GoogleNet, ResNet18, ResNet50 and ResNet101 models was 85.56%, 85.24%, 83.59% and 83.79%, respectively. The accuracy was improved to 87.13% by HistStatCNN for the histological data classification task. Moreover, the performance of the proposed filter generation approach was proved by testing on various histopathological benchmark datasets, increasing average accuracy rates. Experimental results validate that the proposed statistical filters enhance the performance of the network with simpler CNN models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Machine learning‐assisted anomaly detection for power line components: A case study in Pakistan.
- Author
-
Basit, Abdul, Manzoor, Habib Ullah, Akram, Muhammad, Gelani, Hasan Erteza, and Hussain, Sajjad
- Subjects
MACHINE learning ,ANOMALY detection (Computer security) ,ELECTRIC lines ,ELECTRIC power distribution - Abstract
A continuous supply of electricity is necessary to maintain an acceptable standard of living, and the power distribution system's overhead line components play a crucial role in this matter. In Pakistan, identifying defective parts often requires human involvement. To automate this procedure, an unmanned aerial vehicle was used to gather a collection of 10,343 photos. A number of automated anomaly detection systems were created using supervised and unsupervised machine learning methods: support vector machine, random forest, VGG16, and ResNet50 were used as supervised models, and a convolutional auto‐encoder as the unsupervised model. VGG16 achieved the best accuracy of 99.00%, while random forest achieved the worst accuracy of 72.49%. The convolutional auto‐encoder successfully distinguished between normal and abnormal components. These machine learning models can be deployed on unmanned aerial vehicles to identify defective parts immediately. [ABSTRACT FROM AUTHOR]
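The unsupervised route can be illustrated with a minimal convolutional auto-encoder that flags a component as anomalous when its reconstruction error exceeds a threshold; the architecture and threshold below are assumptions for illustration only.

```python
import torch
import torch.nn as nn

class ConvAE(nn.Module):
    def __init__(self):
        super().__init__()
        self.enc = nn.Sequential(
            nn.Conv2d(3, 16, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, 3, stride=2, padding=1), nn.ReLU())
        self.dec = nn.Sequential(
            nn.ConvTranspose2d(32, 16, 4, stride=2, padding=1), nn.ReLU(),
            nn.ConvTranspose2d(16, 3, 4, stride=2, padding=1), nn.Sigmoid())

    def forward(self, x):
        return self.dec(self.enc(x))

model = ConvAE()                           # train on normal components only
x = torch.rand(1, 3, 128, 128)             # a component image
err = ((model(x) - x) ** 2).mean().item()  # reconstruction error
is_anomaly = err > 0.02                    # threshold tuned on validation data
```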
- Published
- 2024
- Full Text
- View/download PDF
36. Comprehensive data analysis of white blood cells with classification and segmentation by using deep learning approaches.
- Author
-
Özcan, Şeyma Nur, Uyar, Tansel, and Karayeğen, Gökay
- Abstract
Deep learning approaches have frequently been used in the classification and segmentation of human peripheral blood cells. Previous studies commonly used more than one dataset, but always separately; no study was found that combines more than two datasets for joint use. Here, in classification, five types of white blood cells were identified using a mixture of four different datasets. In segmentation, four types of white blood cells were determined, and three different neural networks, including a CNN (Convolutional Neural Network), UNet, and SegNet, were applied. The classification results of the presented study were compared with those of related studies. The balanced accuracy was 98.03%, and the test accuracy on the train‐independent dataset was 97.27%. For segmentation, the proposed CNN achieved accuracy rates of 98.9% on the train‐dependent dataset and 92.82% on the train‐independent dataset for both nucleus and cytoplasm detection. The proposed method showed that it could detect white blood cells from a train‐independent dataset with high accuracy, and its successful classification and segmentation results make it promising as a diagnostic tool for clinical use. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
37. VedgeSat: An automated, open‐source toolkit for coastal change monitoring using satellite‐derived vegetation edges.
- Author
-
Muir, Freya M. E., Hurst, Martin D., Richardson‐Foulger, Luke, Rennie, Alistair F., and Naylor, Larissa A.
- Subjects
VEGETATION boundaries ,REMOTE-sensing images ,STANDARD deviations ,COASTS ,IMAGE processing ,LANDSAT satellites ,SHORELINES - Abstract
Public satellite platforms offer regular observations for global coastal monitoring and climate change risk management strategies. Unfortunately, shoreline positions derived from satellite imagery, representing changes in intertidal topography, are noisy and subject to tidal bias that requires correction. The seaward‐most vegetation boundary is a change indicator which shifts on event‐to‐decadal timescales and informs coastal practitioners about storm damage, sediment availability, and coastal landform health. We present and validate VedgeSat, a new open‐source tool for identifying vegetation edges (VEs) from high (3 m) and moderate (10–30 m) resolution satellite imagery. The methodology is based on the CoastSat toolkit, with streamlined image processing using cloud‐based data management via Google Earth Engine. Images are classified using a newly trained vegetation‐specific neural network, and VEs are extracted at subpixel level using dynamic Weighted Peaks thresholding. We performed validation against ground surveys and manual digitisation of aerial imagery across eroding and accreting open coasts and estuarine environments at a site in Scotland. Smaller‐than‐pixel vegetation boundary detection was achieved across 83% of Sentinel‐2 imagery (Root Mean Square Error of 9.3 m). An overall RMSE of 19.0 m was achieved across Landsat 5 & 8, Sentinel‐2, and PlanetScope images. Performance varied by coastal geomorphology, with the highest accuracies across sandy open coasts owing to high spectral contrast and fewer false positives from intertidal vegetation. The VedgeSat tool can be readily applied in tandem with waterlines near‐globally, to support adaptation decisions with historic coastal trends across the whole shoreface, even in normally data‐scarce areas. [ABSTRACT FROM AUTHOR]
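The extraction step can be sketched as: classify vegetation with an index, pick a threshold, and contour the index image at that level to obtain a sub-pixel boundary. In the sketch below, Otsu thresholding stands in for VedgeSat's Weighted Peaks method, and the band arrays are synthetic.

```python
import numpy as np
from skimage.filters import threshold_otsu
from skimage.measure import find_contours

rng = np.random.default_rng(1)
nir = rng.random((64, 64))                 # near-infrared band (synthetic)
red = rng.random((64, 64))                 # red band (synthetic)
ndvi = (nir - red) / (nir + red + 1e-9)    # vegetation index in [-1, 1]
t = threshold_otsu(ndvi)                   # stand-in for Weighted Peaks
edges = find_contours(ndvi, t)             # sub-pixel vegetation edge contours
```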
- Published
- 2024
- Full Text
- View/download PDF
38. Explanation strategies in humans versus current explainable artificial intelligence: Insights from image classification.
- Author
-
Qi, Ruoxi, Zheng, Yueyuan, Yang, Yi, Cao, Caleb Chen, and Hsiao, Janet H.
- Abstract
Explainable AI (XAI) methods provide explanations of AI models, but our understanding of how they compare with human explanations remains limited. Here, we examined human participants' attention strategies when classifying images and when explaining how they classified the images, using eye‐tracking, and compared their attention strategies with saliency‐based explanations from current XAI methods. We found that humans adopted more explorative attention strategies for the explanation task than for the classification task itself. Two representative explanation strategies were identified through clustering: one involved focused visual scanning of foreground objects with more conceptual explanations, which contained more specific information for inferring class labels, whereas the other involved explorative scanning with more visual explanations, which were rated higher in effectiveness for early category learning. Interestingly, XAI saliency‐map explanations had the highest similarity to the explorative attention strategy in humans, and explanations highlighting discriminative features that invoke observable causality through perturbation had higher similarity to human strategies than those highlighting internal features associated with higher class scores. Thus, humans use both visual and conceptual information during explanation, each serving different purposes, and XAI methods that highlight features informing observable causality match human explanations better and are potentially more accessible to users. [ABSTRACT FROM AUTHOR]
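One simple way to quantify the attention-map similarity such a comparison relies on is a pixel-wise Pearson correlation between a human fixation-density map and an XAI saliency map, sketched below on synthetic arrays; the study's actual similarity analysis and clustering were more involved.

```python
import numpy as np

def map_similarity(a, b):
    # Pearson correlation between two attention maps, computed over pixels.
    a = (a - a.mean()) / (a.std() + 1e-9)
    b = (b - b.mean()) / (b.std() + 1e-9)
    return float((a * b).mean())

rng = np.random.default_rng(2)
human = rng.random((14, 14))      # pooled human fixation density (synthetic)
saliency = rng.random((14, 14))   # XAI saliency map, e.g. from Grad-CAM
print(map_similarity(human, saliency))
```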
- Published
- 2024
- Full Text
- View/download PDF
39. A two‐stage substation equipment classification method based on dual‐scale attention.
- Author
-
Yao, Yiyang, Wang, Xue, Zhou, Guoqing, and Wang, Qing
- Subjects
- *
FEATURE extraction , *IMAGE recognition (Computer vision) , *OBJECT recognition (Computer vision) , *CLASSIFICATION , *COMPUTER vision - Abstract
Accurate classification of substation equipment images remains challenging due to factors such as unexpected illumination, viewing angles, scale variations, shadows, surface contaminants, and different elements sharing similar appearances. This paper presents a novel two‐stage substation equipment classification method based on dual‐scale attention. Leveraging the region proposal technique from Faster R‐CNN (regions with CNN features), the input images are initially decomposed into multiple scales to capture latent features. A dual‐scale attention module is introduced to enhance the precision of feature extraction. Furthermore, a two‐stage network is proposed to address the challenge of classifying closely similar substation equipment: a multi‐layer perceptron performs a coarse classification into broad categories, and a lightweight classifier then performs fine‐grained subclassification, distinguishing equipment within the same broad category. To mitigate the issue of limited training data, a specialized dataset was collected and annotated for substation equipment classification. Experimental results demonstrate that the proposed method achieves accuracy, recall, and F1‐score all surpassing 0.91, outperforming mainstream approaches in recall and F1‐score. Ablation experiments further validate the significant contributions of both the dual‐scale attention and the two‐stage classification module to the overall performance of the classification network. [ABSTRACT FROM AUTHOR]
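A hedged PyTorch sketch of one plausible dual-scale attention module follows: channel attention computed from pooled descriptors at two spatial scales and fused into a single reweighting. Layer sizes and the fusion rule are assumptions, not the paper's design.

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class DualScaleAttention(nn.Module):
    def __init__(self, c):
        super().__init__()
        self.fc1 = nn.Linear(c, c)  # attention branch at the original scale
        self.fc2 = nn.Linear(c, c)  # attention branch at the coarse scale

    def forward(self, x):
        g1 = x.mean(dim=(2, 3))                   # global pool, full scale
        g2 = F.avg_pool2d(x, 2).mean(dim=(2, 3))  # pooled, coarse scale
        w = torch.sigmoid(self.fc1(g1) + self.fc2(g2))  # fused channel weights
        return x * w[:, :, None, None]            # reweight the feature map

y = DualScaleAttention(64)(torch.randn(2, 64, 32, 32))
```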
- Published
- 2024
- Full Text
- View/download PDF
40. An image‐based runway detection method for fixed‐wing aircraft based on deep neural network.
- Author
-
Chen, Mingqiang and Hu, Yuzhou
- Subjects
- *
ARTIFICIAL neural networks , *CONVOLUTIONAL neural networks , *LANDING (Aeronautics) , *FLIGHT simulators , *IMAGE recognition (Computer vision) - Abstract
Visual information is important during the final approach and landing phases of an approaching aircraft: it provides a supplementary source for the navigation system, offers backup guidance when radio navigation fails, and can even support a complete vision‐based landing. Relative position and attitude can be solved from the runway features in the image. Traditional runway detection methods have high latency and low accuracy, which cannot satisfy the requirements of a safe landing. This paper proposes a real‐time runway detection model, the efficient runway feature extractor (ERFE), based on a deep convolutional neural network, which generates semantic segmentation and feature‐line outputs. To evaluate the model's effectiveness, a benchmark is proposed to calculate the actual error between a predicted feature line and its ground truth. A novel runway dataset based on pictures from Microsoft Flight Simulator 2020 (FS2020) is also proposed to train and test the model. The dataset will be released at https://www.kaggle.com/datasets/relufrank/fs2020‐runway‐dataset. ERFE shows excellent performance on the FS2020 dataset and gives satisfactory results even for real runway images excluded from our dataset. [ABSTRACT FROM AUTHOR]
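The benchmark idea, measuring the actual error between a predicted feature line and its ground truth, can be sketched as an angular difference plus a mean endpoint-to-line offset; the paper's exact metric may differ.

```python
import numpy as np

def line_error(pred, gt):
    # Each line is given by two endpoints, shape (2, 2) as (x, y) pixels.
    d_pred, d_gt = pred[1] - pred[0], gt[1] - gt[0]
    cosang = abs(np.dot(d_pred, d_gt)) / (
        np.linalg.norm(d_pred) * np.linalg.norm(d_gt) + 1e-9)
    angle = np.degrees(np.arccos(np.clip(cosang, -1.0, 1.0)))  # angular error
    n = np.array([-d_gt[1], d_gt[0]]) / (np.linalg.norm(d_gt) + 1e-9)
    offset = np.mean([abs(np.dot(p - gt[0], n)) for p in pred])  # pixel offset
    return angle, offset

print(line_error(np.array([[0.0, 0.0], [100.0, 5.0]]),
                 np.array([[0.0, 1.0], [100.0, 1.0]])))
```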
- Published
- 2024
- Full Text
- View/download PDF
41. Learnable fusion mechanisms for multimodal object detection in autonomous vehicles.
- Author
-
Massoud, Yahya and Laganiere, Robert
- Subjects
- *
OBJECT recognition (Computer vision) , *COMPUTER vision , *CONVOLUTIONAL neural networks , *IMAGE recognition (Computer vision) , *DATA augmentation , *AUTONOMOUS vehicles , *BILINEAR forms - Abstract
Perception systems in autonomous vehicles need to accurately detect and classify objects within their surrounding environments. Numerous types of sensors are deployed on these vehicles, and the combination of such multimodal data streams can significantly boost performance. The authors introduce a novel sensor fusion framework using deep convolutional neural networks. The framework employs both camera and LiDAR sensors in a multimodal, multiview configuration. The authors leverage both data types by introducing two innovative fusion mechanisms: element‐wise multiplication and multimodal factorised bilinear pooling. The methods improve the bird's‐eye‐view moderate average precision score by +4.97% and +8.35% on the KITTI dataset when compared to traditional fusion operators such as element‐wise addition and feature‐map concatenation. An in‐depth analysis of key design choices impacting performance, such as data augmentation, multi‐task learning, and convolutional architecture design, is offered. The study aims to pave the way for the development of more robust multimodal machine vision systems. The authors conclude the paper with qualitative results, discussing both successful and problematic cases, along with potential ways to mitigate the latter. [ABSTRACT FROM AUTHOR]
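Of the two fusion operators, multimodal factorised bilinear (MFB) pooling is the less familiar; a minimal PyTorch sketch with illustrative dimensions and the usual power/L2 normalisation follows (element-wise multiplication is simply `a * b` on aligned features).

```python
import torch
import torch.nn as nn

class MFB(nn.Module):
    def __init__(self, dim_a, dim_b, out_dim, factor=4):
        super().__init__()
        self.proj_a = nn.Linear(dim_a, out_dim * factor)
        self.proj_b = nn.Linear(dim_b, out_dim * factor)
        self.factor = factor

    def forward(self, a, b):
        joint = self.proj_a(a) * self.proj_b(b)   # element-wise interaction
        # Sum-pool over the factor dimension (the low-rank factorisation).
        joint = joint.view(*joint.shape[:-1], -1, self.factor).sum(-1)
        # Power plus L2 normalisation, as is common for bilinear pooling.
        joint = torch.sign(joint) * torch.sqrt(joint.abs() + 1e-9)
        return nn.functional.normalize(joint, dim=-1)

# Fuse camera (256-d) and LiDAR (128-d) feature vectors into 64-d.
fused = MFB(256, 128, 64)(torch.randn(8, 256), torch.randn(8, 128))
```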
- Published
- 2024
- Full Text
- View/download PDF
42. 40‐1: Development of the Auto Monitoring Method of Laser Beam Shape and Size by Employing the AI and Computer Vision Algorithm.
- Author
-
Lim, Sang-Hoon, Oh, Youngjin, and Yoo, Kyung-Jin
- Subjects
IMAGE recognition (Computer vision) ,LASER beam cutting ,HIGH power lasers ,LASER beams ,IMAGE processing - Abstract
The shape of the laser beam used in the display‐shaping cut process is critical to laser cutting quality, and it was traditionally measured with a beam profiler. Because a beam profiler would be damaged by the high laser power used in mass production, beam shape was instead managed by firing a cross‐shot pattern onto the material and measuring the resulting sample manually under a microscope. This method suffered from a long management cycle (one measurement per month), machine downtime during measurement, measurement deviation between engineers, and an inability to keep a measurement history, so an AI‐based automatic shape‐management technology was needed. To solve this problem, an AI agent program that determines the OK/NG status of the laser beam, together with image‐processing‐based beam‐size measurement, was developed and applied to the production line. This program enables automatic management and history tracking of beam shape and size, which was not possible with the previous method. The AI and image‐processing technologies developed in this study are expected to have a ripple effect on the development of other technologies. [ABSTRACT FROM AUTHOR]
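Image-based beam-size measurement is commonly done with the second-moment (D4-sigma) width; the sketch below applies it to a synthetic Gaussian spot. This is a standard stand-in, not the unpublished production pipeline described above.

```python
import numpy as np

def d4sigma_width(img):
    # D4-sigma width: four times the intensity-weighted standard deviation.
    img = img.astype(float)
    img /= img.sum()
    y, x = np.indices(img.shape)
    cx, cy = (img * x).sum(), (img * y).sum()       # intensity centroid
    var_x = (img * (x - cx) ** 2).sum()
    var_y = (img * (y - cy) ** 2).sum()
    return 4 * np.sqrt(var_x), 4 * np.sqrt(var_y)   # widths in pixels

yy, xx = np.mgrid[0:64, 0:64]
beam = np.exp(-(((xx - 32) / 6) ** 2 + ((yy - 32) / 9) ** 2))  # synthetic spot
print(d4sigma_width(beam))   # roughly (17.0, 25.5) pixels
```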
- Published
- 2024
- Full Text
- View/download PDF
43. Convolutional neural network based on the fusion of image classification and segmentation module for weed detection in alfalfa.
- Author
-
Yang, Jie, Chen, Yong, and Yu, Jialin
- Subjects
CONVOLUTIONAL neural networks ,IMAGE recognition (Computer vision) ,IMAGE fusion ,IMAGE segmentation ,ALFALFA growing ,ALFALFA ,WEEDS - Abstract
BACKGROUND: Accurate and reliable weed detection in real time is essential for realizing autonomous precision herbicide application. The objective of this research was to propose a novel neural network architecture to improve the detection accuracy for broadleaf weeds growing in alfalfa. RESULTS: A novel neural network, ResNet‐101‐segmentation, was developed by fusing an image classification and segmentation module with a backbone selected from ResNet‐101. Compared with existing neural networks (AlexNet, GoogLeNet, VGG16, and ResNet‐101), ResNet‐101‐segmentation improved the detection of Carolina geranium, catchweed bedstraw, mugwort, and speedwell from 78.27% to 98.17%, from 79.49% to 98.28%, from 67.03% to 96.23%, and from 75.95% to 98.06%, respectively. The novel network exhibited high per‐class values in the confusion matrix (>90%) when trained with sufficient datasets. CONCLUSION: ResNet‐101‐segmentation demonstrated excellent performance compared with existing models (AlexNet, GoogLeNet, VGG16, and ResNet‐101) for detecting broadleaf weeds growing in alfalfa. This approach offers a promising solution to increase the accuracy of weed detection, especially in cases where weeds and crops have similar plant morphology. © 2024 Society of Chemical Industry. [ABSTRACT FROM AUTHOR]
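A hedged PyTorch sketch of fusing a classification head and a segmentation module on a shared ResNet-101 backbone follows; both head designs are assumptions, since the paper's exact fusion is not reproduced here.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet101

class ClsSegNet(nn.Module):
    def __init__(self, n_classes=5):
        super().__init__()
        backbone = resnet101(weights=None)
        # Keep everything up to the last conv stage (drop avgpool and fc).
        self.stem = nn.Sequential(*list(backbone.children())[:-2])
        self.cls_head = nn.Sequential(nn.AdaptiveAvgPool2d(1), nn.Flatten(),
                                      nn.Linear(2048, n_classes))
        self.seg_head = nn.Conv2d(2048, n_classes, 1)  # coarse mask logits

    def forward(self, x):
        f = self.stem(x)
        return self.cls_head(f), self.seg_head(f)

logits, mask = ClsSegNet()(torch.randn(1, 3, 224, 224))
```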
- Published
- 2024
- Full Text
- View/download PDF
44. STFormer: Spatio‐temporal former for hand–object interaction recognition from egocentric RGB video.
- Author
-
Liang, Jiao, Wang, Xihan, Yang, Jiayi, and Gao, Quanli
- Subjects
- *
IMAGE recognition (Computer vision) , *FEATURE extraction , *RESEARCH personnel - Abstract
In recent years, video‐based hand–object interaction has received widespread attention from researchers. However, due to the complexity and occlusion of hand movements, hand–object interaction recognition based on RGB videos remains a highly challenging task. Here, an end‐to‐end spatio‐temporal former (STFormer) network for understanding hand behaviour in interactions is proposed. The network consists of three modules: FlexiViT feature extraction, hand–object pose estimator, and interaction action classifier. The FlexiViT is used to extract multi‐scale features from each image frame. The hand–object pose estimator is designed to predict 3D hand pose keypoints and object labels for each frame. The interaction action classifier is used to predict the interaction action categories for the entire video. The experimental results demonstrate that our approach achieves competitive recognition accuracies of 94.96% and 88.84% on two datasets, namely first‐person hand action (FPHA) and 2 Hands and Objects (H2O). [ABSTRACT FROM AUTHOR]
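The three-module flow (per-frame feature extraction, per-frame hand-pose estimation, clip-level action classification) can be sketched with placeholder modules; the tiny frame encoder below stands in for FlexiViT, and all sizes are assumptions.

```python
import torch
import torch.nn as nn

class TinySTFormer(nn.Module):
    def __init__(self, d=128, n_actions=36):
        super().__init__()
        # Placeholder frame encoder standing in for FlexiViT.
        self.frame_enc = nn.Sequential(nn.Flatten(1), nn.Linear(3 * 32 * 32, d))
        self.pose_head = nn.Linear(d, 21 * 3)  # 21 hand keypoints, xyz
        layer = nn.TransformerEncoderLayer(d, nhead=4, batch_first=True)
        self.temporal = nn.TransformerEncoder(layer, num_layers=2)
        self.action_head = nn.Linear(d, n_actions)

    def forward(self, clip):                 # clip: (B, T, 3, 32, 32)
        B, T = clip.shape[:2]
        f = self.frame_enc(clip.flatten(0, 1)).view(B, T, -1)
        poses = self.pose_head(f)            # per-frame 3D hand pose
        action = self.action_head(self.temporal(f).mean(1))  # clip-level action
        return poses, action

poses, action = TinySTFormer()(torch.randn(2, 8, 3, 32, 32))
```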
- Published
- 2024
- Full Text
- View/download PDF
45. Efficient visual transformer transferring from neural ODE perspective.
- Author
-
Niu, Hao, Luo, Fengming, Yuan, Bo, Zhang, Yi, and Wang, Jianyong
- Subjects
- *
IMAGE recognition (Computer vision) , *ORDINARY differential equations , *TRANSFORMER models , *COMPUTER vision , *EULER equations - Abstract
Recently, the Vision Transformer (ViT) has revolutionized various domains in computer vision, and transferring ViT models pre‐trained on large‐scale datasets has proven to be a promising approach for downstream tasks. However, traditional transfer methods introduce numerous additional parameters in the transformer blocks, posing new challenges in learning downstream tasks. This article proposes an efficient transfer method from the perspective of neural Ordinary Differential Equations (ODEs) to address this issue. On the one hand, the residual connections in the transformer layers can be interpreted as the numerical integration of differential equations, so the transformer block can be described as two explicit Euler method equations. By dynamically learning the step size in the explicit Euler equation, a highly lightweight method for transferring the transformer block is obtained. On the other hand, a new learnable neural memory ODE block is proposed, inspired by the self‐inhibition mechanism in neural systems: it increases the diversity of the neurons' dynamical behaviours, transferring the head block efficiently while simultaneously enhancing non‐linearity. Experimental results in image classification demonstrate that the proposed approach can effectively transfer ViT models and outperform state‐of‐the‐art methods. [ABSTRACT FROM AUTHOR]
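The step-size idea is compact enough to sketch directly: treat a frozen pre-trained block's residual update x + f(x) as one explicit Euler step x + h·f(x) and learn only h during transfer. A minimal PyTorch sketch, with a toy MLP standing in for a transformer block:

```python
import torch
import torch.nn as nn

class EulerStepBlock(nn.Module):
    def __init__(self, block: nn.Module):
        super().__init__()
        self.block = block
        for p in self.block.parameters():
            p.requires_grad = False               # frozen pre-trained block
        self.h = nn.Parameter(torch.tensor(1.0))  # learnable Euler step size

    def forward(self, x):
        # x_{t+1} = x_t + h * f(x_t): the residual connection as one
        # explicit Euler integration step, with h the only trainable weight.
        return x + self.h * self.block(x)

toy_block = nn.Sequential(nn.LayerNorm(64), nn.Linear(64, 64),
                          nn.GELU(), nn.Linear(64, 64))
y = EulerStepBlock(toy_block)(torch.randn(2, 16, 64))
```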
- Published
- 2024
- Full Text
- View/download PDF
46. Improvement of ship target detection algorithm for YOLOv7‐tiny.
- Author
-
Zhang, Huixia, Yu, Haishen, Tao, Yadong, Zhu, Wenliang, and Zhang, Kaige
- Subjects
- *
IMAGE recognition (Computer vision) , *SPINE , *ALGORITHMS , *SHIPS - Abstract
To address the challenge that ships are prone to occlusion in multi‐target situations during ship target detection, leading to missed and false detections, this paper proposes an enhanced ship detection algorithm based on YOLOv7‐tiny. The proposed method incorporates several key modifications. Firstly, it introduces the Convolutional Block Attention Module in the Backbone section of the original model, emphasizing position information while attending to channel features to enhance the network's ability to extract crucial information. Secondly, it replaces standard convolution with GSConv convolution in the Neck section, preserving detailed information and reducing computational load. Subsequently, the lightweight operator Content‐Aware ReAssembly of FEatures (CARAFE) is employed to replace the original nearest‐neighbour interpolation, mitigating the loss of feature information during up‐sampling. Finally, the localization loss function SIoU Loss is used to calculate the loss, expedite training convergence, and enhance detection accuracy. The results indicate that the precision of the improved model is 91.2%, mAP@0.5 is 94.5%, and the F1‐score is 90.7%; these values are 3.7%, 5.5%, and 4.2% higher than those of the original YOLOv7‐tiny model, respectively. The improved model effectively enhances detection accuracy and, at 145.4 FPS, meets real‐time requirements. [ABSTRACT FROM AUTHOR]
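Of the listed modifications, the Convolutional Block Attention Module (CBAM) is the most self-contained; below is a minimal PyTorch sketch using the common CBAM defaults (reduction ratio 16, 7×7 spatial kernel). This reflects the published CBAM design, not necessarily the authors' exact integration.

```python
import torch
import torch.nn as nn

class CBAM(nn.Module):
    def __init__(self, c, r=16):
        super().__init__()
        self.mlp = nn.Sequential(nn.Linear(c, c // r), nn.ReLU(),
                                 nn.Linear(c // r, c))
        self.spatial = nn.Conv2d(2, 1, 7, padding=3)

    def forward(self, x):
        # Channel attention from average- and max-pooled descriptors.
        avg, mx = x.mean(dim=(2, 3)), x.amax(dim=(2, 3))
        ca = torch.sigmoid(self.mlp(avg) + self.mlp(mx))[:, :, None, None]
        x = x * ca
        # Spatial attention from channel-wise average and max maps.
        sa = torch.sigmoid(self.spatial(torch.cat(
            [x.mean(1, keepdim=True), x.amax(1, keepdim=True)], dim=1)))
        return x * sa

y = CBAM(64)(torch.randn(1, 64, 40, 40))
```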
- Published
- 2024
- Full Text
- View/download PDF
47. Interpretable vision transformer based on prototype parts for COVID‐19 detection.
- Author
-
Xu, Yang and Meng, Zuqiang
- Subjects
- *
TRANSFORMER models , *COVID-19 , *COMPUTED tomography , *PROTOTYPES , *POWER transformers , *DEEP learning , *COMPUTER-assisted image analysis (Medicine) , *NO-tillage - Abstract
Over the past few years, the COVID‐19 virus has had a significant impact on the physical and mental health of people around the world. Therefore, to distinguish COVID‐19 patients effectively, many deep learning efforts have used chest medical images to detect COVID‐19. Alongside model accuracy, interpretability is also important in work related to human health. This work introduces an interpretable vision transformer that uses the prototype method to detect patients positive for COVID‐19. The model learns the prototype features of each category based on the structural characteristics of ViT, and its predictions are obtained by comparing features against all prototypes in the designed prototype block. The proposed model was applied to two chest X‐ray datasets and one chest CT dataset, achieving classification performance of 99.3%, 96.8%, and 98.5%, respectively. Moreover, the prototype method significantly improves the interpretability of the model: its decisions can be interpreted based on prototype parts, the entire inference process can be shown in the prototype block, and the predictions can be demonstrated to be meaningful through visualization of the prototype features. [ABSTRACT FROM AUTHOR]
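A hedged sketch of a prototype block of this kind: class logits come from cosine similarities between patch tokens and learned per-class prototype vectors, so each logit is traceable to a prototype and its best-matching patch. All sizes and the scoring rule are assumptions.

```python
import torch
import torch.nn as nn

class PrototypeHead(nn.Module):
    def __init__(self, d=192, n_classes=3, protos_per_class=5):
        super().__init__()
        self.prototypes = nn.Parameter(
            torch.randn(n_classes, protos_per_class, d))

    def forward(self, tokens):                 # tokens: (B, N, d) from a ViT
        # Cosine similarity of every patch token to every prototype.
        t = nn.functional.normalize(tokens, dim=-1)
        p = nn.functional.normalize(self.prototypes, dim=-1)
        sim = torch.einsum("bnd,cpd->bncp", t, p)
        # Each prototype's evidence is its best-matching patch; a class logit
        # is the mean evidence over that class's prototypes.
        return sim.amax(dim=1).mean(dim=-1)    # (B, n_classes)

logits = PrototypeHead()(torch.randn(2, 196, 192))
```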
- Published
- 2024
- Full Text
- View/download PDF
48. TIM‐Net: A multi‐label classification network for TCM tongue images fusing global‐local features.
- Author
-
Zhang, Xinfeng, Shao, Jie, Bian, Haonan, Li, Hui, Jia, Maoshen, and Liu, Xiaomin
- Subjects
- *
TONGUE , *FEATURE extraction , *CLASSIFICATION algorithms , *CLASSIFICATION , *IMAGE recognition (Computer vision) , *GLOBAL production networks - Abstract
Combining the extracted tongue features with other medical indicators can effectively support the diagnosis of patients' diseases. Previous work usually analyzes only a single feature of the tongue body and cannot extract multiple features simultaneously. In this study, a multi‐label classification network named TIM‐Net is proposed, which integrates global and local features to achieve multi‐label intelligent diagnosis of Chinese medicine tongue images. First, a feature extraction network based on ResNet is proposed to capture the features of tongue images more thoroughly. Then, a multi‐label classification algorithm fusing global and local features is proposed, with targeted screening operations carried out on the class‐related feature maps based on global confidence. In addition, a logical masking algorithm is proposed to ensure that local features can only correct the feature labels they represent and do not interfere with other feature labels. Classification accuracy is further improved by using local feature confidence to correct the global classification results. Finally, the experimental results indicate that the classification accuracy on tongue images is progressively improved by optimizing the feature extraction network and fusing local features, and it exceeds that of other state‐of‐the‐art multi‐label classification networks. [ABSTRACT FROM AUTHOR]
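The logical-masking idea fits in a few lines: a local branch may adjust only the logits of the labels it represents, leaving the global branch's predictions untouched elsewhere. The mask layout below is an illustrative assumption.

```python
import torch

global_logits = torch.randn(4, 8)   # global predictions for 8 tongue labels
local_logits = torch.randn(4, 8)    # corrections from a local-feature branch
mask = torch.zeros(8)
mask[2:5] = 1.0                     # this branch is responsible for labels 2..4

# Masked fusion: corrected where the branch is responsible, global elsewhere.
fused = global_logits * (1 - mask) + (global_logits + local_logits) * mask
probs = torch.sigmoid(fused)        # multi-label output
```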
- Published
- 2024
- Full Text
- View/download PDF
49. MSFA: Multi‐stage feature aggregation network for multi‐label image recognition.
- Author
-
Chen, Jiale, Xu, Feng, Zeng, Tao, Li, Xin, Chen, Shangjing, and Yu, Jie
- Subjects
- *
IMAGE recognition (Computer vision) , *OBJECT recognition (Computer vision) , *COMPUTER vision - Abstract
Multi‐label image recognition (MLR) is a significant branch of image classification that aims to assign multiple categorical labels to each input. Previous research has focused on enhancing the learning of category‐related regional features; however, the potential impact of multi‐scale distributions of intra‐ and inter‐category targets on MLR tends to be neglected. Moreover, semantic consistency across categories is typically considered only on single‐scale features, resulting in suboptimal feature extraction. To address these limitations, a Multi‐stage Feature Aggregation (MSFA) network is proposed. In MSFA, a novel local feature extraction method progressively extracts category‐related high‐resolution local features in both spatial and channel dimensions. Local and global features are then fused, without additional up‐ and down‐sampling, to enrich the scale diversity of the features while incorporating refined class‐specific information. Furthermore, a hierarchical prediction scheme for MLR is proposed, which generates classification confidences corresponding to different scales under hierarchical loss supervision. Consequently, the final output of the network comes from the joint prediction of classifiers on multi‐scale features, ensuring a stronger feature extraction capability. Extensive experiments carried out on the VOC and MS‐COCO datasets verify the superiority of MSFA over existing mainstream methods. [ABSTRACT FROM AUTHOR]
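The hierarchical prediction scheme (one classifier per feature scale, joined into the final multi-label output) can be sketched as follows; feature dimensions and the averaging rule are assumptions.

```python
import torch
import torch.nn as nn

class MultiScaleHeads(nn.Module):
    def __init__(self, dims=(256, 512, 1024), n_labels=20):
        super().__init__()
        # One independent classifier per feature scale.
        self.heads = nn.ModuleList(nn.Linear(d, n_labels) for d in dims)

    def forward(self, feats):                  # one pooled vector per scale
        logits = [h(f) for h, f in zip(self.heads, feats)]
        return torch.stack(logits).mean(0)     # joint prediction over scales

feats = [torch.randn(2, d) for d in (256, 512, 1024)]
scores = torch.sigmoid(MultiScaleHeads()(feats))   # per-label confidences
```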
- Published
- 2024
- Full Text
- View/download PDF
50. SwinRes: A hybrid model that effectively diagnoses COVID‐19 through x‐ray lung images.
- Author
-
He, Xuanlong, Yang, Hong, Xu, Jipan, and Mu, Hongbo
- Subjects
- *
LUNGS , *TRANSFORMER models , *X-ray imaging , *IMAGE recognition (Computer vision) , *COVID-19 testing , *OLDER patients - Abstract
COVID‐19 has been ravaging the world for a long time, and although its effects are now often similar to those of a cold or fever, timely diagnosis of COVID‐19 in the elderly and in patients with related illnesses remains a matter of great urgency. To address this challenge, we design a model that integrates the Swin Transformer and ResNet34, combining the advantages of transformers and CNNs to diagnose COVID‐19 efficiently in elderly and vulnerable patients, and achieving excellent performance on this image classification problem. A pre‐processing method is also proposed, which increases the accuracy of the model to 99.08%. Experiments were conducted on Kaggle's publicly available three‐class and four‐class datasets; on the three main evaluation metrics of Accuracy, Precision, and Recall, the first dataset yielded 98.81%, 99.49%, and 97.99%, while the second yielded 88.82%, 88.92%, and 86.38%. These findings highlight the validity and potential of our proposed model for diagnosing the presence or absence of COVID‐19 in elderly and vulnerable patients. [ABSTRACT FROM AUTHOR]
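One plausible reading of the hybrid design, concatenating pooled features from the two streams before a shared classifier, is sketched below using torchvision's stock swin_t and resnet34; the authors' exact fusion is not specified in the abstract.

```python
import torch
import torch.nn as nn
from torchvision.models import resnet34, swin_t

class SwinResSketch(nn.Module):
    def __init__(self, n_classes=3):
        super().__init__()
        self.res = resnet34(weights=None)
        self.res.fc = nn.Identity()       # expose the 512-d pooled features
        self.swin = swin_t(weights=None)
        self.swin.head = nn.Identity()    # expose the 768-d pooled features
        self.fc = nn.Linear(512 + 768, n_classes)

    def forward(self, x):
        # Concatenate CNN and transformer features, then classify jointly.
        return self.fc(torch.cat([self.res(x), self.swin(x)], dim=1))

logits = SwinResSketch()(torch.randn(1, 3, 224, 224))
```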
- Published
- 2024
- Full Text
- View/download PDF