Descriptor: "image classification" / Publisher: mdpi ag - Searchworks@Jio Institute Digital Library Search Results

Your search keyword '"image classification"' showing total 1,020 results


1. Understanding How Image Quality Affects Transformer Neural Networks

Author: Domonkos Varga
Subjects: transformer models, image classification, noise sensitivity, computer vision, Applied mathematics. Quantitative methods, T57-57.97
Abstract: Deep learning models, particularly transformer architectures, have revolutionized various computer vision tasks, including image classification. However, their performance under different types and levels of noise remains a crucial area of investigation. In this study, we explore the noise sensitivity of prominent transformer models trained on the ImageNet dataset. We systematically evaluate 22 transformer variants, ranging from state-of-the-art large-scale models to compact versions tailored for mobile applications, under five common types of image distortions. Our findings reveal diverse sensitivities across different transformer architectures, with notable variations in performance observed under additive Gaussian noise, multiplicative Gaussian noise, Gaussian blur, salt-and-pepper noise, and JPEG compression. Interestingly, we observe a consistent robustness of transformer models to JPEG compression, with top-5 accuracies exhibiting higher resilience to noise compared to top-1 accuracies. Furthermore, our analysis highlights the vulnerability of mobile-oriented transformer variants to various noise types, underscoring the importance of noise robustness considerations in model design and deployment for real-world applications. These insights contribute to a deeper understanding of transformer model behavior under noisy conditions and have implications for improving the robustness and reliability of deep learning systems in practical scenarios.
Published: 2024

2. GamaNNet: A Novel Plant Pathologist-Level CNN Architecture for Intelligent Diagnosis

Author: Marcio Oliveira, Adunias Teixeira, Guilherme Barreto, and Cristiano Lima
Subjects: image classification, symptoms, pattern, Solanum lycopersicum, Agriculture (General), S1-972, Engineering (General). Civil engineering (General), TA1-2040
Abstract: Plant pathologies significantly jeopardise global food security, necessitating the development of prompt and precise diagnostic methods. This study employs advanced deep learning techniques to evaluate the performance of nine convolutional neural networks (CNNs) in identifying a spectrum of phytosanitary issues affecting the foliage of Solanum lycopersicum (tomato). Ten thousand RGB images of leaf tissue were subsampled in training (64%), validation (16%), and test (20%) sets to rank the most suitable CNNs in expediting the diagnosis of plant disease. The study assessed the performance of eight well-known networks under identical hyperparameter conditions. Additionally, it introduced the GamaNNet architecture, a custom-designed model optimised for superior performance on this specific type of dataset. The investigational results were most promising for the innovative GamaNNet and ResNet-152, which both exhibited a 91% accuracy rate, as evidenced by their confusion matrices, ROC curves, and AUC metrics. In comparison, LeNet-5 and ResNet-50 demonstrated lower assertiveness, attaining accuracies of 74% and 69%, respectively. GoogLeNet and Inception-v3 emerged as the frontrunners, displaying diagnostic preeminence, achieving an average F1-score of 97%. Identifying such pathologies as Early Blight, Late Blight, Corynespora Leaf Spot, and Septoria Leaf Spot posed the most significant challenge for this class of problem.
Published: 2024

3. Sustainable Machine Vision for Industry 4.0: A Comprehensive Review of Convolutional Neural Networks and Hardware Accelerators in Computer Vision

Author: Muhammad Hussain
Subjects: artificial intelligence, computer vision, hardware advancements, image classification, object detection, Electronic computers. Computer science, QA75.5-76.95
Abstract: As manifestations of Industry 4.0 become visible across various applications, one key and opportune area of development is quality inspection processes and defect detection. Over the last decade, computer vision architectures, in particular object detectors, have received increasing attention from the research community, due to their localisation advantage over image classification. However, for these architectural advancements to provide tangible solutions, they must be optimised with respect to the target hardware along with the deployment environment. To this effect, this survey provides an in-depth review of the architectural progression of image classification and object detection architectures with a focus on advancements within Artificially Intelligent accelerator hardware. This will provide readers with an understanding of the present state of architecture–hardware integration within the computer vision discipline. The review also provides examples of the industrial implementation of computer vision architectures across various domains, from the detection of fabric defects to pallet racking inspection. The survey highlights the need for representative hardware-benchmarked datasets for providing better performance comparisons along with envisioning object detection as the primary domain where more research efforts would be focused over the next decade.
Published: 2024

4. Spatiotemporal Analysis of Total Suspended Solids in Water Bodies and Mapping Mining Areas in Suriname and French Guiana

Author: Breno Mello Pereira and Felipe de Lucia Lobo
Subjects: mining, total suspended solids, environmental monitoring, remote sensing, Google Earth Engine, image classification, Mining engineering. Metallurgy, TN1-997
Abstract: Artisanal and small-scale gold mining (ASGM) has had several environmental impacts, resulting in the significant siltation of water bodies due to the deposition of sediments on riverbanks. Based on this perspective, this study aims to investigate the water bodies and regions most impacted by mining activities, especially in relation to the increase in the Total Suspended Solids (TSS) caused by ASGM, focusing on the territories of Suriname and French Guiana, over the period from 2017 to 2023, through the creation of an algorithm in Google Earth Engine. This research also aims to map and describe active mining in this region using the Classification and Regression Tree (CART) method, which achieved an overall accuracy of 82% and a kappa index of 0.77. The results reveal that from 2017 to 2024, there was an increase of 148.09 km² in mining, with an average increase in TSS of up to 167 mg/L in water bodies most affected by mining activities. Finally, the continued importance of using remote sensing technologies, such as GEE, together with innovative methodological approaches, to monitor and manage natural resources in a sustainable manner is highlighted.
Published: 2024

5. Using Segmentation to Boost Classification Performance and Explainability in CapsNets

Author: Dominik Vranay, Maroš Hliboký, László Kovács, and Peter Sinčák
Subjects: capsule network, explainability, image classification, reconstruction, segmentation, Computer engineering. Computer hardware, TK7885-7895
Abstract: In this paper, we present Combined-CapsNet (C-CapsNet), a novel approach aimed at enhancing the performance and explainability of Capsule Neural Networks (CapsNets) in image classification tasks. Our method involves the integration of segmentation masks as reconstruction targets within the CapsNet architecture. This integration helps in better feature extraction by focusing on significant image parts while reducing the number of parameters required for accurate classification. C-CapsNet combines principles from Efficient-CapsNet and the original CapsNet, introducing several novel improvements such as the use of segmentation masks to reconstruct images and a number of tweaks to the routing algorithm, which enhance both classification accuracy and interoperability. We evaluated C-CapsNet using the Oxford-IIIT Pet and SIIM-ACR Pneumothorax datasets, achieving mean F1 scores of 93% and 67%, respectively. These results demonstrate a significant performance improvement over traditional CapsNet and CNN models. The method’s effectiveness is further highlighted by its ability to produce clear and interpretable segmentation masks, which can be used to validate the network’s focus during classification tasks. Our findings suggest that C-CapsNet not only improves the accuracy of CapsNets but also enhances their explainability, making them more suitable for real-world applications, particularly in medical imaging.
Published: 2024

6. Brain Tumor Recognition Using Artificial Intelligence Neural-Networks (BRAIN): A Cost-Effective Clean-Energy Platform

Author: Muhammad S. Ghauri, Jen-Yeu Wang, Akshay J. Reddy, Talha Shabbir, Ethan Tabaie, and Javed Siddiqi
Subjects: deep learning, convolutional neural network, brain tumor, machine learning, image classification, Neurology. Diseases of the nervous system, RC346-429
Abstract: Brain tumors necessitate swift detection and classification for optimal patient outcomes. Deep learning has been extensively utilized to recognize complex tumor patterns in magnetic resonance imaging (MRI) images, aiding in tumor diagnosis, treatment, and prognostication. However, model complexity and limited generalizability with unfamiliar data hinder appropriate clinical integration. The objective of this study is to develop a clean-energy cloud-based deep learning platform to classify brain tumors. Three datasets of a total of 2611 axial MRI images were used to train our multi-layer convolutional neural network (CNN). Our platform automatically optimized every transfer learning and data augmentation feature combination to provide the highest predictive accuracy for our classification task. Our proposed system identified and classified brain tumors successfully and efficiently with an overall precision value of 96.8% [95% CI; 93.8–97.6]. Using clean energy supercomputing resources and cloud platforms cut our workflow to 103 min, $0 in total cost, and a negligible carbon footprint (0.0014 kg eq CO2). By leveraging automated optimized learning, we developed a cost-effective deep learning (DL) platform that accurately classified brain tumors from axial MRI images of different levels. Although studies have identified machine learning tools to overcome these obstacles, only some are cost-effective, generalizable, and usable regardless of experience.
Published: 2024

7. New Convolutional Neural Network and Graph Convolutional Network-Based Architecture for AI Applications in Alzheimer’s Disease and Dementia-Stage Classification

Author: Md Easin Hasan and Amy Wagler
Subjects: Alzheimer’s disease, image classification, transfer learning, convolutional neural networks, graph convolutional networks, Electronic computers. Computer science, QA75.5-76.95
Abstract: Neuroimaging experts in biotech industries can benefit from using cutting-edge artificial intelligence techniques for Alzheimer’s disease (AD)- and dementia-stage prediction, even though it is difficult to anticipate the precise stage of dementia and AD. Therefore, we propose a cutting-edge, computer-assisted method based on an advanced deep learning algorithm to differentiate between people with varying degrees of dementia, including healthy, very mild dementia, mild dementia, and moderate dementia classes. In this paper, four separate models were developed for classifying different dementia stages: convolutional neural networks (CNNs) built from scratch, pre-trained VGG16 with additional convolutional layers, graph convolutional networks (GCNs), and CNN-GCN models. The CNNs were implemented, and then the flattened layer output was fed to the GCN classifier, resulting in the proposed CNN-GCN architecture. A total of 6400 whole-brain magnetic resonance imaging scans were obtained from the Alzheimer’s Disease Neuroimaging Initiative database to train and evaluate the proposed methods. We applied the 5-fold cross-validation (CV) technique for all the models. We presented the results from the best fold out of the five folds in assessing the performance of the models developed in this study. Hence, for the best fold of the 5-fold CV, the above-mentioned models achieved an overall accuracy of 43.83%, 71.17%, 99.06%, and 100%, respectively. The CNN-GCN model, in particular, demonstrates excellent performance in classifying different stages of dementia. Understanding the stages of dementia can assist biotech industry researchers in uncovering molecular markers and pathways connected with each stage.
Published: 2024

8. Identifying Diabetic Retinopathy in the Human Eye: A Hybrid Approach Based on a Computer-Aided Diagnosis System Combined with Deep Learning

Author: Şükran Yaman Atcı, Ali Güneş, Metin Zontul, and Zafer Arslan
Subjects: diabetic retinopathy, image classification, object detection, computer-aided diagnosis, convolutional neural network (CNN), Computer applications to medicine. Medical informatics, R858-859.7
Abstract: Diagnosing and screening for diabetic retinopathy is a well-known issue in the biomedical field. A component of computer-aided diagnosis that has advanced significantly over the past few years as a result of the development and effectiveness of deep learning is the use of medical imagery from a patient’s eye to identify the damage caused to blood vessels. Issues with unbalanced datasets, incorrect annotations, a lack of sample images, and improper performance evaluation measures have negatively impacted the performance of deep learning models. Using three benchmark datasets of diabetic retinopathy, we conducted a detailed comparison study comparing various state-of-the-art approaches to address the effect caused by class imbalance, with precision scores of 93%, 89%, 81%, 76%, and 96%, respectively, for normal, mild, moderate, severe, and DR phases. The analyses of the hybrid modeling, including CNN analysis and SHAP model derivation results, are compared at the end of the paper, and ideal hybrid modeling strategies for deep learning classification models for automated DR detection are identified.
Published: 2024

9. Speech Emotion Recognition Using Transfer Learning: Integration of Advanced Speaker Embeddings and Image Recognition Models

Author: Maros Jakubec, Eva Lieskovska, Roman Jarina, Michal Spisiak, and Peter Kasak
Subjects: speech emotion recognition, IEMOCAP, CREMA-D, transfer learning, image classification, speaker embeddings, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Automatic Speech Emotion Recognition (SER) plays a vital role in making human–computer interactions more natural and effective. A significant challenge in SER development is the limited availability of diverse emotional speech datasets, which hinders the application of advanced deep learning models. Transfer learning is a machine learning technique that helps address this issue by utilizing knowledge from pre-trained models to improve performance on a new task in a target domain, even with limited data. This study investigates the use of transfer learning from various pre-trained networks, including speaker embedding models such as d-vector, x-vector, and r-vector, and image classification models like AlexNet, GoogLeNet, SqueezeNet, ResNet-18, and ResNet-50. We also propose enhanced versions of the x-vector and r-vector models incorporating Multi-Head Attention Pooling and Angular Margin Softmax, alongside other architectural improvements. Additionally, reverberation from the Room Impulse Response datasets was added to the speech utterances to diversify and augment the available data. Notably, the enhanced r-vector model achieved classification accuracies of 74.05% Unweighted Accuracy (UA) and 73.68% Weighted Accuracy (WA) on the IEMOCAP dataset, and 80.25% UA and 79.81% WA on the CREMA-D dataset, outperforming the existing state-of-the-art methods. This study shows that using cross-domain transfer learning is beneficial for low-resource emotion recognition. The enhanced models developed in other domains (for non-emotional tasks) can further improve the accuracy of SER.
Published: 2024

10. Lightweight Advanced Deep Neural Network (DNN) Model for Early-Stage Lung Cancer Detection

Author: Isha Bhatia, Aarti, Syed Immamul Ansarullah, Farhan Amin, and Amerah Alabrah
Subjects: deep learning, lung carcinoma, CT image, lung cancer, convolutional neural networks, image classification, Medicine (General), R5-920
Abstract: Background: Lung cancer, also known as lung carcinoma, has a high mortality rate; however, an early prediction helps to reduce the risk. In the current literature, various approaches have been developed for the prediction of lung carcinoma (at an early stage), but these still have various issues, such as low accuracy, high noise, low contrast, poor recognition rates, and a high false-positive rate. Thus, in this research effort, we have proposed an advanced algorithm and combined two different types of deep neural networks to make it easier to spot lung melanoma in the early phases. Methods: We have used WDSI (weakly supervised dense instance-level lung segmentation) for laborious pixel-level annotations. In addition, we suggested an SS-CL (deep continuous learning-based deep neural network) that can be applied to the labeled and unlabeled data to improve efficiency. This work intends to evaluate potential lightweight, low-memory deep neural net (DNN) designs for image processing. Results: Our experimental results show that, by combining WDSI and LSO segmentation, we can achieve super-sensitive, specific, and accurate early detection of lung cancer. For experiments, we used the lung nodule (LUNA16) dataset, which consists of the patients’ 3D CT scan images. We confirmed that our proposed model is lightweight because it uses less memory. We have compared them with state-of-the-art models named PSNR and SSIM. The efficiency is 32.8% and 0.97, respectively. The proposed lightweight deep neural network (DNN) model achieves a high accuracy of 98.2% and also removes noise more effectively. Conclusions: Our proposed approach has a lot of potential to support medical image analysis and improve the accuracy of test results, and it may also prove helpful in saving patients’ lives.
Published: 2024

11. Deep Learning Approaches for the Assessment of Germinal Matrix Hemorrhage Using Neonatal Head Ultrasound

Author: Nehad M. Ibrahim, Hadeel Alanize, Lara Alqahtani, Lama J. Alqahtani, Raghad Alabssi, Wadha Alsindi, Haila Alabssi, Afnan AlMuhanna, and Hanadi Althani
Subjects: germinal matrix hemorrhage (GMH), cranial ultrasound imaging, deep learning, YOLOv8 model, image classification, neonatal care, Chemical technology, TP1-1185
Abstract: Germinal matrix hemorrhage (GMH) is a critical condition affecting premature infants, commonly diagnosed through cranial ultrasound imaging. This study presents an advanced deep learning approach for automated GMH grading using the YOLOv8 model. By analyzing a dataset of 586 infants, we classified ultrasound images into five distinct categories: Normal, Grade 1, Grade 2, Grade 3, and Grade 4. Utilizing transfer learning and data augmentation techniques, the YOLOv8 model achieved exceptional performance, with a mean average precision (mAP50) of 0.979 and a mAP50-95 of 0.724. These results indicate that the YOLOv8 model can significantly enhance the accuracy and efficiency of GMH diagnosis, providing a valuable tool to support radiologists in clinical settings.
Published: 2024

12. Assessing Land-Cover Changes in the Natural Park ‘Fragas do Eume’ over the Last 25 Years: Insights from Remote Sensing and Machine Learning

Author: Paula Díaz-García and Adrián Regos
Subjects: Atlantic deciduous forests, Galicia, landsat satellite mission, long-term analysis, image classification, artificial intelligence, Agriculture
Abstract: The ‘Fragas do Eume’ Natural Park includes one of the best-preserved Atlantic forests in Europe. These forests are part of the Natura 2000 Network. This scientific study focuses on analysing land-cover changes in the ‘Fragas do Eume’ Natural Park (NW Spain) over a 25-year period, from 1997 to 2022, using machine learning techniques for the classification of satellite images. Several image processing operations were carried out to correct radiometry, followed by supervised classification techniques with previously defined training areas. Five multispectral indices were used to improve classification accuracy, and their correlation was evaluated. Land-cover changes were analysed, with special attention to the transitions between eucalyptus plantations and native deciduous forests. A significant increase in eucalyptus plantations (48.2%) (Eucalyptus globulus Labill.) was observed, while native deciduous forests experienced a decrease in their extent (17.6%). This transformation of the landscape affected not only these two habitats, but also cropland and scrubland areas, both of which increased. Our results suggest that the lack of effective conservation policies and the economic interest of fast-growing tree plantations could explain the loss of native deciduous forests. The results highlight the need to implement pro-active and sustainable management measures to protect these natural forest ecosystems in the ‘Fragas do Eume’ Natural Park.
Published: 2024

13. Multi-View Soft Attention-Based Model for the Classification of Lung Cancer-Associated Disabilities

Author: Jannatul Ferdous Esha, Tahmidul Islam, Md. Appel Mahmud Pranto, Abrar Siam Borno, Nuruzzaman Faruqui, Mohammad Abu Yousuf, AKM Azad, Asmaa Soliman Al-Moisheer, Naif Alotaibi, Salem A. Alyami, and Mohammad Ali Moni
Subjects: disability research, lung cancer, attention mechanism, convolutional neural networks, image classification, Medicine (General), R5-920
Abstract: Background: The detection of lung nodules at their early stages may significantly enhance the survival rate and prevent progression to severe disability caused by advanced lung cancer, but it often requires manual and laborious efforts for radiologists, with limited success. To alleviate it, we propose a Multi-View Soft Attention-Based Convolutional Neural Network (MVSA-CNN) model for multi-class lung nodular classifications in three stages (benign, primary, and metastatic). Methods: Initially, patches from each nodule are extracted into three different views, each fed to our model to classify the malignancy. A dataset, namely the Lung Image Database Consortium Image Database Resource Initiative (LIDC-IDRI), is used for training and testing. The 10-fold cross-validation approach was used on the database to assess the model’s performance. Results: The experimental results suggest that MVSA-CNN outperforms other competing methods with 97.10% accuracy, 96.31% sensitivity, and 97.45% specificity. Conclusions: We hope the highly predictive performance of MVSA-CNN in lung nodule classification from lung Computed Tomography (CT) scans may facilitate more reliable diagnosis, thereby improving outcomes for individuals with disabilities who may experience disparities in healthcare access and quality.
Published: 2024

14. Where and Why Travelers Visit? Classifying Coastal Tourism Activities Using Geotagged Image Content from Social Media Data

Author: Gang Sun Kim, Choong-Ki Kim, and Woo-Kyun Lee
Subjects: coastal tourism management, geotagged social media, spatial data analysis, image classification, tourist behavior insights, data integration, Geography (General), G1-922
Abstract: Accurate information regarding the size, activity, and distribution of coastal tourists is essential for the effective management and planning of coastal tourism. In this study, geotagged photos uploaded to social network services were classified to identify coastal tourism activities. These activities were linked with spatial-scale data on tourist numbers estimated from social media data. To classify the activities, which included recreation, appreciation, education, and other activities, an image-supervised classification model was trained using 12,229 images, and the test accuracy was found to be 0.7244. On the Flickr platform, 43% of the image data located in the coastal land of South Korea are other activities, 39% are appreciation activities, and 18% are recreation and education activities. Other activities are mainly located in urban areas with a high population density and are spatially concentrated, while appreciation activities are mainly located in the natural environment and tend to be spatially spread out. Data on tourist activity categorization through content classification, combined with traditional tourist volume estimates, can help us understand previously overlooked information and context about a space.
Published: 2024

15. Gemini-Assisted Deep Learning Classification Model for Automated Diagnosis of High-Resolution Esophageal Manometry Images

Author: Stefan Lucian Popa, Teodora Surdea-Blaga, Dan Lucian Dumitrascu, Andrei Vasile Pop, Abdulrahman Ismaiel, Liliana David, Vlad Dumitru Brata, Daria Claudia Turtoi, Giuseppe Chiarioni, Edoardo Vincenzo Savarino, Imre Zsigmond, Zoltan Czako, and Daniel Corneliu Leucuta
Subjects: Gemini, deep learning, esophageal motility disorder diagnosis, image classification, artificial intelligence, HREM, Medicine (General), R5-920
Abstract: Background/Objectives: To develop a deep learning model for esophageal motility disorder diagnosis using high-resolution manometry images with the aid of Gemini. Methods: Gemini assisted in developing this model by aiding in code writing, preprocessing, model optimization, and troubleshooting. Results: The model demonstrated an overall precision of 0.89 on the testing set, with an accuracy of 0.88, a recall of 0.88, and an F1-score of 0.885. It presented better results for multiple categories, particularly in the panesophageal pressurization category, with precision = 0.99 and recall = 0.99, yielding a balanced F1-score of 0.99. Conclusions: This study demonstrates the potential of artificial intelligence, particularly Gemini, in aiding the creation of robust deep learning models for medical image analysis, solving not just simple binary classification problems but more complex, multi-class image classification tasks.
Published: 2024

16. Recognition of Urbanized Areas in UAV-Derived Very-High-Resolution Visible-Light Imagery

Author: Edyta Puniach, Wojciech Gruszczyński, Paweł Ćwiąkała, Katarzyna Strząbała, and Elżbieta Pastucha
Subjects: image classification, neural networks, unmanned aerial vehicle (UAV), vegetation indices, visible-light imagery, Science
Abstract: This study compared classifiers that differentiate between urbanized and non-urbanized areas based on unmanned aerial vehicle (UAV)-acquired RGB imagery. The tested solutions included numerous vegetation indices (VIs) thresholding and neural networks (NNs). The analysis was conducted for two study areas for which surveys were carried out using different UAVs and cameras. The ground sampling distances for the study areas were 10 mm and 15 mm, respectively. Reference classification was performed manually, obtaining approximately 24 million classified pixels for the first area and approximately 3.8 million for the second. This research study included an analysis of the impact of the season on the threshold values for the tested VIs and the impact of image patch size provided as inputs for the NNs on classification accuracy. The results of the conducted research study indicate a higher classification accuracy using NNs (about 96%) compared with the best of the tested VIs, i.e., Excess Blue (about 87%). Due to the highly imbalanced nature of the used datasets (non-urbanized areas constitute approximately 87% of the total datasets), the Matthews correlation coefficient was also used to assess the correctness of the classification. The analysis based on statistical measures was supplemented with a qualitative assessment of the classification results, which allowed the identification of the most important sources of differences in classification between VIs thresholding and NNs.
Published: 2024

17. Image Processing for Smart Agriculture Applications Using Cloud-Fog Computing

Author: Dušan Marković, Zoran Stamenković, Borislav Đorđević, and Siniša Ranđić
Subjects: image classification, cloud-fog computing, deep learning, agriculture application, Chemical technology, TP1-1185
Abstract: The widespread use of IoT devices has led to the generation of a huge amount of data and driven the need for analytical solutions in many areas of human activities, such as the field of smart agriculture. Continuous monitoring of crop growth stages enables timely interventions, such as control of weeds and plant diseases, as well as pest control, ensuring optimal development. Decision-making systems in smart agriculture involve image analysis with the potential to increase productivity, efficiency and sustainability. By applying Convolutional Neural Networks (CNNs), state recognition and classification can be performed based on images from specific locations. Thus, we have developed a solution for early problem detection and resource management optimization. The main concept of the proposed solution relies on a direct connection between Cloud and Edge devices, which is achieved through Fog computing. The goal of our work is the creation of a deep learning model for image classification that can be optimized and adapted for implementation on devices with limited hardware resources at the level of Fog computing. This could increase the importance of image processing in the reduction of agricultural operating costs and manual labor. As a result of the off-load data processing at Edge and Fog devices, the system responsiveness can be improved, the costs associated with data transmission and storage can be reduced, and the overall system reliability and security can be increased. The proposed solution can choose classification algorithms to find a trade-off between size and accuracy of the model optimized for devices with limited hardware resources. After testing our model for tomato disease classification compiled for execution on FPGA, it was found that the decrease in test accuracy is as small as 0.83% (from 96.29% to 95.46%).
Published: 2024

18. A Collaborative Federated Learning Framework for Lung and Colon Cancer Classifications

Author: Md. Munawar Hossain, Md. Robiul Islam, Md. Faysal Ahamed, Mominul Ahsan, and Julfikar Haider
Subjects: lung cancer, colon cancer, histopathological image analysis, image classification, decentralized machine learning, federated learning, Technology
Abstract: Lung and colon cancers are common types of cancer with significant fatality rates. Early identification considerably improves the odds of survival for those suffering from these diseases. Histopathological image analysis is crucial for detecting cancer by identifying morphological anomalies in tissue samples. Regulations such as the HIPAA and GDPR impose considerable restrictions on the sharing of sensitive patient data, mostly because of privacy concerns. Federated learning (FL) is a promising technique that allows the training of strong models while maintaining data privacy. The use of a federated learning strategy has been suggested in this study to address privacy concerns in cancer categorization. To classify histopathological images of lung and colon cancers, this methodology uses local models with an Inception-V3 backbone. The global model is then updated on the basis of the local weights. The images were obtained from the LC25000 dataset, which consists of five separate classes. Separate analyses were performed for lung cancer, colon cancer, and their combined classification. The implemented model successfully classified lung cancer images into three separate classes with a classification accuracy of 99.867%. The classification of colon cancer images was achieved with 100% accuracy. More significantly, for the lung and colon cancers combined, the accuracy reached an impressive 99.720%. Compared with other current approaches, the proposed framework showed an improved performance. A heatmap, visual saliency map, and GradCAM were generated to pinpoint the crucial areas in the histopathology pictures of the test set where the models focused in particular during cancer class predictions. This approach demonstrates the potential of federated learning to enhance collaborative efforts in automated disease diagnosis through medical image analysis while ensuring patient data privacy.
Published: 2024

19. Research on the Wild Mushroom Recognition Method Based on Transformer and the Multi-Scale Feature Fusion Compact Bilinear Neural Network

Author: He Liu, Qingran Hu, and Dongyan Huang
Subjects: image classification, vision transformer, multi-scale feature fusion, compact bilinear pooling, attention mechanism, fine-grained, Agriculture (General), S1-972
Abstract: Wild mushrooms are popular for their taste and nutritional value; however, non-experts often struggle to distinguish between toxic and non-toxic species when foraging in the wild, potentially leading to poisoning incidents. To address this issue, this study proposes a compact bilinear neural network method based on Transformer and multi-scale feature fusion. The method utilizes a dual-stream structure that integrates multiple feature extractors, enhancing the comprehensiveness of image information capture. Additionally, bottleneck attention and efficient multi-scale attention modules are embedded to effectively capture multi-scale features while maintaining low computational costs. By employing a compact bilinear pooling module, the model achieves high-order feature interactions, reducing the number of parameters without compromising performance. Experimental results demonstrate that the proposed method achieves an accuracy of 98.03%, outperforming existing comparative methods. This proves the superior recognition performance of the model, making it more reliable in distinguishing wild mushrooms while capturing key information from multiple dimensions, enabling it to better handle complex scenarios. Furthermore, the development of public-facing identification tools based on this method could help reduce the risk of poisoning incidents. Building on these findings, the study suggests strengthening the research and development of digital agricultural technologies, promoting the application of intelligent recognition technologies in agriculture, and providing technical support for agricultural production and resource management through digital platforms. This would provide a theoretical foundation for the innovation of digital agriculture and promote its sustainable development.
Published: 2024

20. Brain Tumor Detection Using Magnetic Resonance Imaging and Convolutional Neural Networks

Author: Rafael Martínez-Del-Río-Ortega, Javier Civit-Masot, Francisco Luna-Perejón, and Manuel Domínguez-Morales
Subjects: brain tumors, MRI, convolutional neural networks, deep learning, image classification, medical imaging, Technology
Abstract: Early and precise detection of brain tumors is critical for improving clinical outcomes and patient quality of life. This research focused on developing an image classifier using convolutional neural networks (CNN) to detect brain tumors in magnetic resonance imaging (MRI). Brain tumors are a significant cause of morbidity and mortality worldwide, with approximately 300,000 new cases diagnosed annually. Magnetic resonance imaging (MRI) offers excellent spatial resolution and soft tissue contrast, making it indispensable for identifying brain abnormalities. However, accurate interpretation of MRI scans remains challenging, due to human subjectivity and variability in tumor appearance. This study employed CNNs, which have demonstrated exceptional performance in medical image analysis, to address these challenges. Various CNN architectures were implemented and evaluated to optimize brain tumor detection. The best model achieved an accuracy of 97.5%, sensitivity of 99.2%, and binary accuracy of 98.2%, surpassing previous studies. These results underscore the potential of deep learning techniques in clinical applications, significantly enhancing diagnostic accuracy and reliability.
Published: 2024

21. On the Impact of Discrete Atomic Compression on Image Classification by Convolutional Neural Networks

Author: Viktor Makarichev, Vladimir Lukin, and Iryna Brysina
Subjects: lossy image compression, image classification, discrete atomic compression, convolutional neural network, Electronic computers. Computer science, QA75.5-76.95
Abstract: Digital images play a particular role in a wide range of systems. Image processing, storing and transferring via networks require a lot of memory, time and traffic. Also, appropriate protection is required in the case of confidential data. Discrete atomic compression (DAC) is an approach providing image compression and encryption simultaneously. It has two processing modes: lossless and lossy. The latter one ensures a higher compression ratio in combination with inevitable quality loss that may affect decompressed image analysis, in particular, classification. In this paper, we explore the impact of distortions produced by DAC on performance of several state-of-the-art classifiers based on convolutional neural networks (CNNs). The classic, block-splitting and chroma subsampling modes of DAC are considered. It is shown that each of them produces a quite small effect on MobileNetV2, VGG16, VGG19, ResNet50, NASNetMobile and NASNetLarge models. This research shows that, using the DAC approach, memory expenses can be reduced without significant degradation of performance of the aforementioned CNN-based classifiers.
Published: 2024

22. SpemNet: A Cotton Disease and Pest Identification Method Based on Efficient Multi-Scale Attention and Stacking Patch Embedding

Author: Keyuan Qiu, Yingjie Zhang, Zekai Ren, Meng Li, Qian Wang, Yiqiang Feng, and Feng Chen
Subjects: cotton pest recognition, image classification, attention mechanism, transformer, efficient multi-scale attention, feature fusion, Science
Abstract: We propose a cotton pest and disease recognition method, SpemNet, based on efficient multi-scale attention and stacking patch embedding. By introducing the SPE module and the EMA module, we successfully solve the problems of local feature learning difficulty and insufficient multi-scale feature integration in the traditional Vision Transformer model, which significantly improves the performance and efficiency of the model. In our experiments, we comprehensively validate the SpemNet model on the CottonInsect dataset, and the results show that SpemNet performs well in the cotton pest recognition task, with significant effectiveness and superiority. The SpemNet model excels in key metrics such as precision and F1 score, demonstrating significant potential and superiority in the cotton pest and disease recognition task. This study provides an efficient and reliable solution in the field of cotton pest and disease identification, which is of great theoretical and applied significance.
Published: 2024

23. Research on Non-Destructive Quality Detection of Sunflower Seeds Based on Terahertz Imaging Technology

Author: Hongyi Ge, Chunyan Guo, Yuying Jiang, Yuan Zhang, Wenhui Zhou, and Heng Wang
Subjects: terahertz images, image classification, MobileViT-E, broken grains, deformed grains, Chemical technology, TP1-1185
Abstract: The variety and content of high-quality proteins in sunflower seeds are higher than those in other cereals. However, sunflower seeds can suffer from abnormalities, such as breakage and deformity, during planting and harvesting, which hinder the development of the sunflower seed industry. Traditional methods such as manual sensory and machine sorting are highly subjective and cannot detect the internal characteristics of sunflower seeds. The development of spectral imaging technology has facilitated the application of terahertz waves in the quality inspection of sunflower seeds, owing to its advantages of non-destructive penetration and fast imaging. This paper proposes a novel terahertz image classification model, MobileViT-E, which is trained and validated on a self-constructed dataset of sunflower seeds. The results show that the overall recognition accuracy of the proposed model can reach 96.30%, which is 4.85%, 3%, 7.84% and 1.86% higher than those of the ResNet-50, EfficientNet, MobileOne and MobileViT models, respectively. At the same time, the performance indices such as the recognition accuracy, the recall and the F1-score values are also effectively improved. Therefore, the MobileViT-E model proposed in this study can improve the classification and identification of normal, damaged and deformed sunflower seeds, and provide technical support for the non-destructive detection of sunflower seed quality.
Published: 2024

24. Development and Application of Unmanned Aerial High-Resolution Convex Grating Dispersion Hyperspectral Imager

Author: Qingsheng Xue, Xinyu Gao, Fengqin Lu, Jun Ma, Junhong Song, and Jinfeng Xu
Subjects: optical design, hyperspectral imager, convex grating, hyperspectral remote sensing, image classification, Chemical technology, TP1-1185
Abstract: This study presents the design and development of a high-resolution convex grating dispersion hyperspectral imaging system tailored for unmanned aerial vehicle (UAV) remote sensing applications. The system operates within a spectral range of 400 to 1000 nm, encompassing over 150 channels, and achieves an average spectral resolution of less than 4 nm. It features a field of view of 30°, a focal length of 20 mm, a compact volume of only 200 mm × 167 mm × 78 mm, and a total weight of less than 1.5 kg. Based on the design specifications, the system was meticulously adjusted, calibrated, and tested. Additionally, custom software for the hyperspectral system was independently developed to facilitate functions such as control parameter adjustments, real-time display, and data preprocessing of the hyperspectral camera. Subsequently, the prototype was integrated onto a drone for remote sensing observations of Spartina alterniflora at Yangkou Beach in Shouguang City, Shandong Province. Various algorithms were employed for data classification and comparison, with support vector machine (SVM) and neural network algorithms demonstrating superior classification accuracy. The experimental results indicate that the UAV-based hyperspectral imaging system exhibits high imaging quality, minimal distortion, excellent resolution, an expansive camera field of view, a broad detection range, high experimental efficiency, and remarkable capabilities for remote sensing detection.
Published: 2024

25. Detecting Adversarial Examples Using Surrogate Models

Author: Borna Feldsar, Rudolf Mayer, and Andreas Rauber
Subjects: machine learning, adversarial examples, detection, surrogate model, convolutional neural networks, image classification, Computer engineering. Computer hardware, TK7885-7895
Abstract: Deep Learning has enabled significant progress towards more accurate predictions and is increasingly integrated into our everyday lives in real-world applications; this is true especially for Convolutional Neural Networks (CNNs) in the field of image analysis. Nevertheless, it has been shown that Deep Learning is vulnerable against well-crafted, small perturbations to the input, i.e., adversarial examples. Defending against such attacks is therefore crucial to ensure the proper functioning of these models—especially when autonomous decisions are taken in safety-critical applications, such as autonomous vehicles. In this work, shallow machine learning models, such as Logistic Regression and Support Vector Machine, are utilised as surrogates of a CNN based on the assumption that they would be differently affected by the minute modifications crafted for CNNs. We develop three detection strategies for adversarial examples by analysing differences in the prediction of the surrogate and the CNN model: namely, deviation in (i) the prediction, (ii) the distance of the predictions, and (iii) the confidence of the predictions. We consider three different feature spaces: raw images, extracted features, and the activations of the CNN model. Our evaluation shows that our methods achieve state-of-the-art performance compared to other approaches, such as Feature Squeezing, MagNet, PixelDefend, and Subset Scanning, on the MNIST, Fashion-MNIST, and CIFAR-10 datasets while being robust in the sense that they do not entirely fail against selected single attacks. Further, we evaluate our defence against an adaptive attacker in a grey-box setting.
Published: 2023

26. Spectral Patterns of Pixels and Objects of the Forest Phytophysiognomies in the Anauá National Forest, Roraima State, Brazil

Author: Tiago Monteiro Condé, Niro Higuchi, Adriano José Nogueira Lima, Moacir Alberto Assis Campos, Jackelin Dias Condé, André Camargo de Oliveira, and Dirceu Lucio Carneiro de Miranda
Subjects: Amazon, GEOBIA, image classification, image segmentation, Landsat 8, Ecology, QH540-549.5
Abstract: Forest phytophysiognomies have specific spatial patterns that can be mapped or translated into spectral patterns of vegetation. Regions of spectral similarity can be classified by reference to color, tonality or intensity of brightness, reflectance, texture, size, shape, neighborhood influence, etc. We evaluated the power of accuracy of supervised classification algorithms via per-pixel (maximum likelihood) and geographic object-based image analysis (GEOBIA) for distinguishing spectral patterns of the vegetation in the northern Brazilian Amazon. A total of 280 training samples (70%) and 120 validation samples (30%) of each of the 11 vegetation cover and land-use classes (N = 4400) were classified based on differences in their visible (RGB), near-infrared (NIR), and medium infrared (SWIR 1 or MIR) Landsat 8 (OLI) bands. Classification by pixels achieved a greater accuracy (Kappa = 0.75%) than GEOBIA (Kappa = 0.72%). GEOBIA, however, offers a greater plasticity and the possibility of calibrating the spectral rules associated with vegetation indices and spatial parameters. We conclude that both methods enabled precision spectral separations (0.45–1.65 μm), contributing to the distinctions between forest phytophysiognomies and land uses—strategic factors in the planning and management of natural resources in protected areas in the Amazon region.
Published: 2023

27. Deep Learning-Based Methods for Multi-Class Rice Disease Detection Using Plant Images

Author: Yuhai Li, Xiaoyan Chen, Lina Yin, and Yue Hu
Subjects: rice diseases, multi-category, image classification, transfer learning, RegNet, Agriculture
Abstract: Rapid and accurate diagnosis of rice diseases can prevent large-scale outbreaks and reduce pesticide overuse, thereby ensuring rice yield and quality. Existing research typically focuses on a limited number of rice diseases, which makes these studies less applicable to the diverse range of diseases currently affecting rice. Consequently, these studies fail to meet the detection needs of agricultural workers. Additionally, the lack of discussion regarding advanced detection algorithms in current research makes it difficult to determine the optimal application solution. To address these limitations, this study constructs a multi-class rice disease dataset comprising eleven rice diseases and one healthy leaf class. The resulting model is more widely applicable to a variety of diseases. Additionally, we evaluated advanced detection networks and found that DenseNet emerged as the best-performing model with an accuracy of 95.7%, precision of 95.3%, recall of 94.8%, F1 score of 95.0%, and a parameter count of only 6.97 M. Considering the current interest in transfer learning, this study introduced pre-trained weights from the large-scale, multi-class ImageNet dataset into the experiments. Among the tested models, RegNet achieved the best comprehensive performance, with an accuracy of 96.8%, precision of 96.2%, recall of 95.9%, F1 score of 96.0%, and a parameter count of only 3.91 M. Based on the transfer learning-based RegNet model, we developed a rice disease identification app that provides a simple and efficient diagnosis of rice diseases.
Published: 2024

28. An Efficient Detection of the Pitaya Growth Status Based on the YOLOv8n-CBN Model

Author: Zhi Qiu, Shiyue Zhuo, Mingyan Li, Fei Huang, Deyun Mo, Xuejun Tian, and Xinyuan Tian
Subjects: YOLOv8n-CBN, pitaya, growth state, image classification, Plant culture, SB1-1110
Abstract: The pitaya is a common fruit in southern China, but the growing environment of pitayas is complex, with a high density of foliage. This intricate natural environment is a significant contributing factor to misidentification and omission in the detection of the growing state of pitayas. In this paper, the growth states of pitayas are classified into three categories: flowering, immature, and mature. In order to reduce the misidentification and omission in the recognition process, we propose a detection model based on an improvement of the network structure of YOLOv8, namely YOLOv8n-CBN. The YOLOv8n-CBN model is based on the YOLOv8n network structure, with the incorporation of a CBAM attention mechanism module, a bidirectional feature pyramid network (BiFPN), and a C2PFN integration. Additionally, the C2F module has been replaced by a C2F_DCN module containing a deformable convolution (DCNv2). The experimental results demonstrate that YOLOv8n-CBN has enhanced the precision, recall, and mean average precision of the YOLOv8n model with an IoU threshold of 0.5. The model demonstrates a 91.1% accuracy, a 3.1% improvement over the original model, and an F1 score of 87.6%, a 3.4% enhancement over the original model. In comparison to YOLOv3-tiny, YOLOv5s, and YOLOv5m, which are highly effective target detection models, the mAP@0.50–0.95 of our proposed YOLOv8n-CBN is observed to be 10.1%, 5.0%, and 1.6% higher, respectively. This demonstrates that YOLOv8n-CBN is capable of more accurately identifying and detecting the growth status of pitaya in a natural environment.
Published: 2024

29. A Modified MobileNetV3 Model Using an Attention Mechanism for Eight-Class Classification of Breast Cancer Pathological Images

Author: Chang Guo, Qingjian Zhou, Jia Jiao, Qingyang Li, and Lin Zhu
Subjects: MobileNetV3, breast cancer, image classification, attention mechanism, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Addressing the challenge of achieving precise subtype classification of breast cancer histopathology images with limited resources, a lightweight model incorporating multi-stage information fusion and an attention mechanism is proposed for this task. Using MobileNetV3 as the backbone, a multi-stage fusion strategy captures the rich image information in breast cancer histopathology images. Additionally, the selective kernel (SK) attention mechanism is introduced in the initial stages of feature extraction, while an improved squeeze-and-excitation coordinate attention (SCA) mechanism is integrated in the later stages to enhance the extraction of both underlying and semantic features. The final feature representations for subtype classification are determined based on the attention map weights computed at each stage. The experimental results demonstrate the model’s outstanding recognition performance on the BreakHis dataset, achieving subtype classification accuracies of 96.259%, 94.763%, 95.511%, and 94.015% at four different magnifications.
Published: 2024

30. Artificial Intelligence-Based Applications for Bone Fracture Detection Using Medical Images: A Systematic Review

Author: Mohammed Kutbi
Subjects: bone fracture, image classification, medical images, Medicine (General), R5-920
Abstract: Artificial intelligence (AI) is making notable advancements in the medical field, particularly in bone fracture detection. This systematic review compiles and assesses existing research on AI applications aimed at identifying bone fractures through medical imaging, encompassing studies from 2010 to 2023. It evaluates the performance of various AI models, such as convolutional neural networks (CNNs), in diagnosing bone fractures, highlighting their superior accuracy, sensitivity, and specificity compared to traditional diagnostic methods. Furthermore, the review explores the integration of advanced imaging techniques like 3D CT and MRI with AI algorithms, which has led to enhanced diagnostic accuracy and improved patient outcomes. The potential of Generative AI and Large Language Models (LLMs), such as OpenAI’s GPT, to enhance diagnostic processes through synthetic data generation, comprehensive report creation, and clinical scenario simulation is also discussed. The review underscores the transformative impact of AI on diagnostic workflows and patient care, while also identifying research gaps and suggesting future research directions to enhance data quality, model robustness, and ethical considerations.
Published: 2024

31. A Method for Enhancing the Accuracy of Pet Breeds Identification Model in Complex Environments

Author: Zhonglan Lin, Haiying Xia, Yan Liu, Yunbai Qin, and Cong Wang
Subjects: pet breeds identification, complex background, transfer learning, image classification, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Most existing studies on pet breeds classification focus on images with simple backgrounds, leading to the unsatisfactory performance of models in practical applications. This paper investigates training pet breeds classification models using complex images and constructs a dataset for identifying breeds of pet cats and dogs. We use this dataset to fine-tune three SOTA models: ResNet34, DenseNet121, and Swin Transformer. Specifically, in terms of top-1 accuracy, the performance of DenseNet is improved from 89.10% to 89.19%, while that of the Swin Transformer is increased by 1.26%, marking the most significant enhancement. The results show that training with our dataset significantly enhances the models’ classification capabilities in complex environments. Additionally, we offer a lightweight pet breeds identification model based on PBI-EdgeNeXt (Pet Breeds Identification EdgeNeXt). We utilize the PolyLoss function and Sophia optimizer for model training. Furthermore, we compare our model with five commonly used lightweight models and find that the proposed model achieves the highest top-1 accuracy of 87.12%. These results demonstrate that the model achieves high accuracy, reaching the SOTA level.
Published: 2024

32. Complex-Valued 2D-3D Hybrid Convolutional Neural Network with Attention Mechanism for PolSAR Image Classification

Author: Wenmei Li, Hao Xia, Jiadong Zhang, Yu Wang, Yan Jia, and Yuhong He
Subjects: polarimetric synthetic aperture radar (PolSAR), image classification, complex-valued convolutional neural network (CV-CNN), attention mechanism, Science
Abstract: The recently introduced complex-valued convolutional neural network (CV-CNN) has shown considerable advancements for polarimetric synthetic aperture radar (PolSAR) image classification by effectively incorporating both magnitude and phase information. However, a solitary 2D or 3D CNN encounters challenges such as insufficiently extracting scattering channel dimension features or excessive computational parameters. Moreover, these networks’ default is that all information is equally important, consuming vast resources for processing useless information. To address these issues, this study presents a new hybrid CV-CNN with the attention mechanism (CV-2D/3D-CNN-AM) to classify PolSAR ground objects, possessing both excellent computational efficiency and feature extraction capability. In the proposed framework, multi-level discriminative features are extracted from preprocessed data through hybrid networks in the complex domain, along with a special attention block to filter the feature importance from both spatial and channel dimensions. Experimental results performed on three PolSAR datasets demonstrate our present approach’s superiority over other existing ones. Furthermore, ablation experiments confirm the validity of each module, highlighting our model’s robustness and effectiveness.
Published: 2024

33. Comparative Analysis of Machine Learning Techniques and Data Sources for Dead Tree Detection: What Is the Best Way to Go?

Author: Júlia Matejčíková, Dana Vébrová, and Peter Surový
Subjects: image classification, multispectral images, dead trees, Science
Abstract: In Central Europe, the extent of bark beetle infestation in spruce stands due to prolonged high temperatures and drought has created large areas of dead trees, which are difficult to monitor by ground surveys. Remote sensing is the only possibility for the assessment of the extent of the dead tree areas. Several options exist for mapping individual dead trees, including different sources and different processing techniques. Satellite images, aerial images, and images from UAVs can be used as sources. Machine and deep learning techniques are included in the processing techniques, although models are often presented without proper realistic validation. This paper compares methods of monitoring dead tree areas using three data sources: multispectral aerial imagery, multispectral PlanetScope satellite imagery, and multispectral Sentinel-2 imagery, as well as two processing methods. The classification methods used are Random Forest (RF) and neural network (NN) in two modalities: pixel- and object-based. In total, 12 combinations are presented. The results were evaluated using two types of reference data: accuracy of model on validation data and accuracy on vector-format semi-automatic classification polygons created by a human evaluator, referred to as real Ground Truth. The aerial imagery was found to have the highest model accuracy, with the CNN model achieving up to 98% with object classification. A higher classification accuracy for satellite imagery was achieved by combining pixel classification and the RF model (87% accuracy for Sentinel-2). For PlanetScope Imagery, the best result was 89%, using a combination of CNN and object-based classifications. A comparison with the Ground Truth showed a decrease in the classification accuracy of the aerial imagery to 89% and the classification accuracy of the satellite imagery to around 70%. In conclusion, aerial imagery is the most effective tool for monitoring bark beetle calamity in terms of precision and accuracy, but satellite imagery has the advantage of fast availability and shorter data processing time, together with larger coverage areas.
Published: 2024

34. AquaVision: AI-Powered Marine Species Identification

Author: Benjamin Mifsud Scicluna, Adam Gauci, and Alan Deidun
Subjects: image classification, machine learning, convolution neural networks, citizen science, Mediterranean basin, invasive alien species, Information technology, T58.5-58.64
Abstract: This study addresses the challenge of accurately identifying fish species by using machine learning and image classification techniques. The primary aim is to develop an innovative algorithm that can dynamically identify the most common (within Maltese coastal waters) invasive Mediterranean fish species based on available images. In particular, these include Fistularia commersonii, Lobotes surinamensis, Pomadasys incisus, Siganus luridus, and Stephanolepis diaspros, which have been adopted as this study’s target species. Through the use of machine-learning models and transfer learning, the proposed solution seeks to enable precise, on-the-spot species recognition. The methodology involved collecting and organising images as well as training the models with consistent datasets to ensure comparable results. After trying a number of models, ResNet18 was found to be the most accurate and reliable, with YOLO v8 following closely behind. While the performance of YOLO was reasonably good, it exhibited less consistency in its results. These results underline the potential of the developed algorithm to significantly aid marine biology research, including citizen science initiatives, and promote environmental management efforts through accurate fish species identification.
Published: 2024

35. A Symmetric Efficient Spatial and Channel Attention (ESCA) Module Based on Convolutional Neural Networks

Author: Huaiyu Liu, Yueyuan Zhang, and Yiyang Chen
Subjects: deep learning, attention mechanisms, symmetric, computer vision, image classification, object detection, Mathematics, QA1-939
Abstract: In recent years, attention mechanisms have shown great potential in various computer vision tasks. However, most existing methods focus on developing more complex attention modules for better performance, which inevitably increases the complexity of the model. To overcome performance and complexity tradeoffs, this paper proposes efficient spatial and channel attention (ESCA), a symmetric, comprehensive, and efficient attention module. By analyzing squeeze-and-excitation (SE), convolutional block attention module (CBAM), coordinate attention (CA), and efficient channel attention (ECA) modules, we abandon the dimension-reduction operation of SE module, verify the negative impact of global max pooling (GMP) on the model, and apply a local cross-channel interaction strategy without dimension reduction to learn attention. We not only care about the channel features of the image, we also care about the spatial location of the target on the image, and we take into account the effectiveness of channel attention, so we designed the symmetric ESCA module. The ESCA module is effective, as demonstrated by its application in the ResNet-50 classification benchmark. With 26.26 M parameters and 8.545 G FLOPs, it introduces a mere 0.14% increment in FLOPs while achieving over 6.33% improvement in Top-1 accuracy and exceeding 3.25% gain in Top-5 accuracy. We perform image classification and object detection tasks on ResNet, MobileNet, YOLO, and other architectures on popular datasets such as Mini ImageNet, CIFAR-10, and VOC 2007. Experiments show that ESCA can achieve great improvement in model accuracy at a very small cost, and it performs well among similar models.
Published: 2024

36. Automatic Quality Assessment of Pork Belly via Deep Learning and Ultrasound Imaging

Author: Tianshuo Wang, Huan Yang, Chunlei Zhang, Xiaohuan Chao, Mingzheng Liu, Jiahao Chen, Shuhan Liu, and Bo Zhou
Subjects: B-ultrasound imaging, deep learning, image classification, pork belly quality, real-time recognition, Veterinary medicine, SF600-1100, Zoology, QL1-991
Abstract: Pork belly, prized for its unique flavor and texture, is often overlooked in breeding programs that prioritize lean meat production. The quality of pork belly is determined by the number and distribution of muscle and fat layers. This study aimed to assess the number of pork belly layers using deep learning techniques. Initially, semantic segmentation was considered, but the intersection over union (IoU) scores for the segmented parts were below 70%, which is insufficient for practical application. Consequently, the focus shifted to image classification methods. Based on the number of fat and muscle layers, a dataset was categorized into three groups: three layers (n = 1811), five layers (n = 1294), and seven layers (n = 879). Drawing upon established model architectures, the initial model was refined for the task of learning and predicting layer traits from B-ultrasound images of pork belly. After a thorough evaluation of various performance metrics, the ResNet18 model emerged as the most effective, achieving a remarkable training set accuracy of 99.99% and a validation set accuracy of 96.22%, with corresponding loss values of 0.1478 and 0.1976. The robustness of the model was confirmed through three interpretable analysis methods, including grad-CAM, ensuring its reliability. Furthermore, the model was successfully deployed in a local setting to process B-ultrasound video frames in real time, consistently identifying the pork belly layer count with a confidence level exceeding 70%. By employing a scoring system with 100 points as the threshold, the number of pork belly layers in vivo was categorized into superior and inferior grades. This innovative system offers immediate decision-making support for breeding determinations and presents a highly efficient and precise method for assessment of pork belly layers.
Published: 2024
Full Text: View/download PDF

37. Background-Filtering Feature-Enhanced Graph Neural Networks for Few-Shot Learning

Author: Binbin Wang, Yuemao Wang, and Yaoqun Xu
Subjects: few-shot learning, graph neural network (GNN), background filtering, feature enhancement, image classification, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: The fundamental idea behind few-shot learning is to employ sparse labeled data to effectively handle novel tasks, whereas most existing mainstream approaches rely on prior experience gained from previous situations. Nonetheless, effective knowledge transfer is sometimes hampered by constraints on new samples and barriers between classes. To address these issues, this research presents a novel background-filtering feature-enhanced graph network (BFFE-GNN) that attempts to generate relationships between graphs in order to explicitly describe and transmit inter-class relationships. To specifically address the issue of inadequate information utilization brought on by sample background interference, which is frequent in classification tasks, we employ a novel background-filtering feature-enhanced graph network. Effective data information extraction from complicated datasets is challenging due to the original module’s relatively simple network structure during the feature extraction stage and the interference of the image background. The background-filtering module was specifically introduced, which enhances spatial attention. This not only improves the quality of feature extraction but also effectively lessens the influence of picture background on classification results. In addition, we have improved the background-filtering-based feature gap calculation by implementing a feature-enhanced module. To demonstrate the adaptability of the BFFE-GNN model, we not only ran experiments on two publicly available datasets, MiniImagenet and TieredImagenet, but we also created our own Tool dataset. The method’s exceptional performance and universal applicability in the field of few-shot picture classification are clearly demonstrated by the experimental findings, which indicate that it greatly outperforms the majority of existing similar methods. This discovery establishes a strong basis for further research in the field of few-shot learning in addition to offering fresh insights into the subject.
Published: 2024
Full Text: View/download PDF

38. A Noisy Sample Selection Framework Based on a Mixup Loss and Recalibration Strategy

Author: Qian Zhang, De Yu, Xinru Zhou, Hanmeng Gong, Zheng Li, Yiming Liu, and Ruirui Shao
Subjects: deep neural networks, noisy labels, semi-supervised learning, image classification, Mathematics, QA1-939
Abstract: Deep neural networks (DNNs) have achieved breakthrough progress in various fields, largely owing to the support of large-scale datasets with manually annotated labels. However, obtaining such datasets is costly and time-consuming, making high-quality annotation a challenging task. In this work, we propose an improved noisy sample selection method, termed “sample selection framework”, based on a mixup loss and recalibration strategy (SMR). This framework enhances the robustness and generalization abilities of models. First, we introduce a robust mixup loss function to pre-train two models with identical structures separately. This approach avoids additional hyperparameter adjustments and reduces the need for prior knowledge of noise types. Additionally, we use a Gaussian Mixture Model (GMM) to divide the entire training set into labeled and unlabeled subsets, followed by robust training using semi-supervised learning (SSL) techniques. Furthermore, we propose a recalibration strategy based on cross-entropy (CE) loss to prevent the models from converging to local optima during the SSL process, thus further improving performance. Ablation experiments on CIFAR-10 with 50% symmetric noise and 40% asymmetric noise demonstrate that the two modules introduced in this paper improve the accuracy of the baseline (i.e., DivideMix) by 1.5% and 0.5%, respectively. Moreover, the experimental results on multiple benchmark datasets demonstrate that our proposed method effectively mitigates the impact of noisy labels and significantly enhances the performance of DNNs on noisy datasets. For instance, on the WebVision dataset, our method improves the top-1 accuracy by 0.7% and 2.4% compared to the baseline method.
Published: 2024
Full Text: View/download PDF

39. Industry Image Classification Based on Stochastic Configuration Networks and Multi-Scale Feature Analysis

Author: Qinxia Wang, Dandan Liu, Hao Tian, Yongpeng Qin, and Difei Zhao
Subjects: image classification, multi-scale analysis, stochastic configuration networks, feature extraction, Chemical technology, TP1-1185
Abstract: For industry image data, this paper proposes an image classification method based on stochastic configuration networks and multi-scale feature extraction. The multi-scale features are extracted from images of different scales using deep 2DSCN, and the hidden features of multiple layers are also connected together to obtain more informational features. The integrated features are fed into SCNs to learn a classifier which improves the recognition rate for different categories. In the experiments, a handwritten digit database and an industry hot-rolled steel strip database are used, and the comparison results demonstrate the proposed method can effectively improve the classification accuracy.
Published: 2024
Full Text: View/download PDF

40. Dune Morphology Classification and Dataset Construction Method Based on Unmanned Aerial Vehicle Orthoimagery

Author: Ming Li, Zekun Yang, Jiehua Yan, Haoran Li, and Wangzhong Ye
Subjects: UAV orthoimagery, dune classification, dataset, convolutional neural network, image classification, Chemical technology, TP1-1185
Abstract: Dunes are the primary geomorphological type in deserts, and the distribution of dune morphologies is of significant importance for studying regional characteristics, formation mechanisms, and evolutionary processes. Traditional dune morphology classification methods rely on visual interpretation by humans, which is not only time-consuming and inefficient but also subjective in classification judgment. These issues have impeded the intelligent development of dune morphology classification. However, convolutional neural network (CNN) models exhibit robust feature representation capabilities for images and have achieved excellent results in image classification, providing a new method for studying dune morphology classification. Therefore, this paper summarizes five typical dune morphologies in the deserts of western Inner Mongolia, which can be used to define and describe most of the dune types in Chinese deserts. Subsequently, field surveys and the experimental collection of unmanned aerial vehicle (UAV) orthoimages for different dune types were conducted. Five different types of dune morphology datasets were constructed through manual segmentation, automatic rule segmentation, random screening, and data augmentation. Finally, the classification of dune morphologies and the exploration of dataset construction methods were conducted using the VGG16 and VGG19 CNN models. The classification results of dune morphologies were comprehensively analyzed using different evaluation metrics. The experimental results indicate that when the regular segmentation scale of UAV orthoimages is 1024 × 1024 pixels with an overlap of 100 pixels, the classification accuracy, precision, recall, and F1-Score of the VGG16 model reached 97.05%, 96.91%, 96.76%, and 96.82%, respectively. The method for constructing a dune morphology dataset from automatically segmented UAV orthoimages provides a reference value for the study of large-scale dune morphology classification.
Published: 2024
Full Text: View/download PDF

41. RSWFormer: A Multi-Scale Fusion Network from Local to Global with Multiple Stages for Regional Geological Mapping

Author: Sipeng Han, Zhipeng Wan, Junfeng Deng, Congyuan Zhang, Xingwu Liu, Tong Zhu, and Junli Zhao
Subjects: geological environment remote sensing, image classification, deep learning, attention mechanism, Science
Abstract: Geological mapping involves the identification of elements such as rocks, soils, and surface water, which are fundamental tasks in Geological Environment Remote Sensing (GERS) interpretation. High-precision intelligent interpretation technology can not only reduce labor requirements and significantly improve the efficiency of geological mapping but also assist geological disaster prevention assessment and resource exploration. However, the high interclass similarity, high intraclass variability, gradational boundaries, and complex distributional characteristics of GERS elements coupled with the difficulty of manual labeling and the interference of imaging noise, all limit the accuracy of DL-based methods in wide-area GERS interpretation. We propose a Transformer-based multi-stage and multi-scale fusion network, RSWFormer (Rock–Soil–Water Network with Transformer), for geological mapping of spatially large areas. RSWFormer first uses a Multi-stage Geosemantic Hierarchical Sampling (MGHS) module to extract geological information and high-dimensional features at different scales from local to global, and then uses a Multi-scale Geological Context Enhancement (MGCE) module to fuse geological semantic information at different scales to enhance the understanding of contextual semantics. The cascade of the two modules is designed to enhance the interpretation and performance of GERS elements in geologically complex areas. The high mountainous and hilly areas located in western China were selected as the research area. A multi-source geological remote sensing dataset containing diverse GERS feature categories and complex lithological characteristics, Multi-GL9, is constructed to fill the significant gaps in the datasets required for extensive GERS. Using overall accuracy as the evaluation index, RSWFormer achieves 92.15% and 80.23% on the Gaofen-2 and Landsat-8 datasets, respectively, surpassing existing methods. Experiments show that RSWFormer has excellent performance and wide applicability in geological mapping tasks.
Published: 2024
Full Text: View/download PDF

42. A Combination of Remote Sensing Datasets for Coastal Marine Habitat Mapping Using Random Forest Algorithm in Pistolet Bay, Canada

Author: Sahel Mahdavi, Meisam Amani, Saeid Parsian, Candace MacDonald, Michael Teasdale, Justin So, Fan Zhang, and Mardi Gullage
Subjects: aquatic vegetation, LiDAR, drone, ROV, image classification, Science
Abstract: Marine ecosystems serve as vital indicators of biodiversity, providing habitats for diverse flora and fauna. Canada’s extensive coastal regions encompass a rich range of marine habitats, necessitating accurate mapping techniques utilizing advanced technologies, such as remote sensing (RS). This study focused on a study area in Pistolet Bay in Newfoundland and Labrador (NL), Canada, with an area of approximately 170 km² and depths varying between 0 and −28 m. Considering the relatively large coverage and shallow depths of water of the study area, it was decided to use airborne bathymetric Light Detection and Ranging (LiDAR) data, which used green laser pulses, to map the marine habitats in this region. Along with this LiDAR data, Remotely Operated Vehicle (ROV) footage, high-resolution multispectral drone imagery, true color Google Earth (GE) imagery, and shoreline survey data were also collected. These datasets were preprocessed and categorized into five classes of Eelgrass, Rockweed, Kelp, Other vegetation, and Non-Vegetation. A marine habitat map of the study area was generated using the features extracted from LiDAR data, such as intensity, depth, slope, and canopy height, using an object-based Random Forest (RF) algorithm. Despite multiple challenges, the resulting habitat map exhibited a commendable classification accuracy of 89%. This underscores the efficacy of the developed Artificial Intelligence (AI) model for future marine habitat mapping endeavors across the country.
Published: 2024
Full Text: View/download PDF

43. A CNN- and Self-Attention-Based Maize Growth Stage Recognition Method and Platform from UAV Orthophoto Images

Author: Xindong Ni, Faming Wang, Hao Huang, Ling Wang, Changkai Wen, and Du Chen
Subjects: maize, deep learning, image classification, CNN, self-attention mechanism, Science
Abstract: The accurate recognition of maize growth stages is crucial for effective farmland management strategies. In order to overcome the difficulty of quickly obtaining precise information about maize growth stage in complex farmland scenarios, this study proposes a Maize Hybrid Vision Transformer (MaizeHT) that combines a convolutional algorithmic structure with self-attention for maize growth stage recognition. The MaizeHT model utilizes a ResNet34 convolutional neural network to extract image features to self-attention, which are then transformed into sequence vectors (tokens) using Patch Embedding. It simultaneously inserts category information and location information as a token. A Transformer architecture with multi-head self-attention is employed to extract token features and predict maize growth stage categories using a linear layer. In addition, the MaizeHT model is standardized and encapsulated, and a prototype platform for intelligent maize growth stage recognition is developed for deployment on a website. Finally, the performance validation test of MaizeHT was carried out. To be specific, MaizeHT has an accuracy of 97.71% when the input image resolution is 224 × 224 and 98.71% when the input image resolution is 512 × 512 on the self-built dataset, the number of parameters is 15.446 M, and the floating-point operations are 4.148 G. The proposed maize growth stage recognition method could provide computational support for maize farm intelligence.
Published: 2024
Full Text: View/download PDF

44. Transfer Learning in Multimodal Sunflower Drought Stress Detection

Author: Olivera Lazić, Sandra Cvejić, Boško Dedić, Aleksandar Kupusinac, Siniša Jocić, and Dragana Miladinović
Subjects: artificial intelligence, convolutional neural networks, data augmentation, drought stress detection, image classification, sunflower, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Efficient water supply and timely detection of drought stress in crops to increase yields is an important task considering that agriculture is the primary consumer of water globally. This is particularly significant for plants such as sunflowers, which are an important source of quality edible oils, essential for human nutrition. Traditional detection methods are labor-intensive, time-consuming, and rely on advanced sensor technologies. We introduce an innovative approach based on neural networks and transfer learning for drought stress detection using a novel dataset including 209 non-invasive rhizotron images and 385 images of manually cleaned sections of sunflowers, subjected to normal watering or water stress. We used five neural network models: VGG16, VGG19, InceptionV3, DenseNet, and MobileNet, pre-trained on the ImageNet dataset, whose performance was compared to select the most efficient architecture. Accordingly, the most efficient model, MobileNet, was further refined using different data augmentation mechanisms. The introduction of targeted data augmentation and the use of grayscale images proved to be effective, demonstrating improved results, with an F1 score and an accuracy of 0.95. This approach encourages advances in water stress detection, highlighting the value of artificial intelligence in improving crop health monitoring and management for more resilient agricultural practices.
Published: 2024
Full Text: View/download PDF

45. Multi-Objective Evolutionary Neural Architecture Search with Weight-Sharing Supernet

Author: Junchao Liang, Ke Zhu, Yuan Li, Yun Li, and Yuejiao Gong
Subjects: deep learning, neural architecture search, multi-objective evolutionary algorithms, image classification, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Deep neural networks have played a crucial role in the field of deep learning, achieving significant success in practical applications. The architecture of neural networks is key to their performance. In the past few years, these architectures have been manually designed by experts with rich domain knowledge. Additionally, the optimal neural network architecture can vary depending on specific tasks and data distributions. Neural Architecture Search (NAS) is a class of techniques aimed at automatically searching for and designing neural network architectures according to the given tasks and data. Specifically, evolutionary-computation-based NAS methods are known for their strong global search capability and have aroused widespread interest in recent years. Although evolutionary-computation-based NAS has achieved success in a wide range of research and applications, it still faces bottlenecks in training and evaluating a large number of individuals during optimization. In this study, we first devise a multi-objective evolutionary NAS framework based on a weight-sharing supernet to improve the search efficiency of traditional evolutionary-computation-based NAS. This framework combines the population optimization characteristic of evolutionary algorithms with the weight-sharing ideas in one-shot models. We then design a bi-population MOEA/D algorithm based on the proposed framework to effectively solve the NAS problem. By constructing two sub-populations with different optimization objectives, the algorithm can effectively explore network architectures of various sizes in complex search spaces. An inter-population communication mechanism further enhances the algorithm’s exploratory capability, enabling it to find network architectures with uniform distribution and high diversity. Finally, we conduct performance comparison experiments on image classification datasets of different scales and complexities. Experimental results demonstrate the effectiveness of the proposed multi-objective evolutionary NAS framework and the practicality and transferability of the introduced bi-population MOEA/D-based NAS method compared to existing state-of-the-art NAS methods.
Published: 2024
Full Text: View/download PDF

46. Innovative Approaches to Clinical Diagnosis: Transfer Learning in Facial Image Classification for Celiac Disease Identification

Author: Elif Keskin Bilgiç, İnci Zaim Gökbay, and Yusuf Kayar
Subjects: celiac disease diagnosis, image classification, deep learning, transfer learning, VGG16 pre-trained model, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: Background: Celiac disease arises from gluten consumption and shares symptoms with other conditions, leading to delayed diagnoses. Untreated celiac disease heightens the risk of autoimmune disorders, neurological issues, and certain cancers like lymphoma while also impacting skin health due to intestinal disruptions. This study uses facial photos to distinguish individuals with celiac disease from those without. Surprisingly, there is a lack of research involving transfer learning for this purpose despite its benefits such as faster training, enhanced performance, and reduced overfitting. While numerous studies exist on endoscopic intestinal photo classification and a few have explored the link between facial morphology measurements and celiac disease, none have concentrated on diagnosing celiac disease through facial photo classification. Methods: This study sought to apply transfer learning techniques with VGG16 to address a gap in research by identifying distinct facial features that differentiate patients with celiac disease from healthy individuals. A dataset containing a total of 200 facial images of adult individuals with and without celiac condition was utilized. Half of the dataset had a ratio of 70% females to 30% males with celiac condition, and the rest had a ratio of 60% females to 40% males without celiac condition. Among those with celiac condition, 28 were newly diagnosed and 72 had been previously diagnosed, with 25 not adhering to a gluten-free diet and 47 partially adhering to such a diet. Results: Utilizing transfer learning, the model achieved a 73% accuracy in classifying the facial images of the patients during testing, with corresponding precision, recall, and F1 score values of 0.54, 0.56, and 0.52, respectively. The training process involved 50,178 parameters, showcasing the model’s efficacy in diagnostic image analysis. Conclusions: The model correctly classified approximately three-quarters of the test images. While this is a reasonable level of accuracy, it also suggests that there is room for improvement as the dataset contains images that are inherently difficult to classify even for humans. Increasing the proportion of newly diagnosed patients in the dataset and expanding the dataset size could notably improve the model’s efficacy. Despite being the first study in this field, further refinement holds promise for the development of a diagnostic tool for celiac disease using transfer learning in medical image analysis, addressing the lack of prior studies in this area.
Published: 2024
Full Text: View/download PDF

47. Multimodal Quanvolutional and Convolutional Neural Networks for Multi-Class Image Classification

Author: Yuri Gordienko, Yevhenii Trochun, and Sergii Stirenko
Subjects: artificial intelligence, neural network, quanvolutional neural networks, convolutional neural networks, image classification, multi-class image classification, Technology
Abstract: By utilizing hybrid quantum–classical neural networks (HNNs), this research aims to enhance the efficiency of image classification tasks. HNNs allow us to utilize quantum computing to solve machine learning problems, which can be highly power-efficient and provide significant computation speedup compared to classical operations. This is particularly relevant in sustainable applications where reducing computational resources and energy consumption is crucial. This study explores the feasibility of a novel architecture by leveraging quantum devices as the first layer of the neural network, which proved to be useful for scaling HNNs’ training process. Understanding the role of quanvolutional operations and how they interact with classical neural networks can lead to optimized model architectures that are more efficient and effective for image classification tasks. This research investigates the performance of HNNs across different datasets, including CIFAR100 and Satellite Images of Hurricane Damage by evaluating the performance of HNNs on these datasets in comparison with the performance of reference classical models. By evaluating the scalability of HNNs on diverse datasets, the study provides insights into their applicability across various real-world scenarios, which is essential for building sustainable machine learning solutions that can adapt to different environments. Leveraging transfer learning techniques with pre-trained models such as ResNet, EfficientNet, and VGG16 demonstrates the potential for HNNs to benefit from existing knowledge in classical neural networks. This approach can significantly reduce the computational cost of training HNNs from scratch while still achieving competitive performance. The feasibility study conducted in this research assesses the practicality and viability of deploying HNNs for real-world image classification tasks. By comparing the performance of HNNs with classical reference models like ResNet, EfficientNet, and VGG-16, this study provides evidence of the potential advantages of HNNs in certain scenarios. Overall, the findings of this research contribute to advancing sustainable applications of machine learning by proposing novel techniques, optimizing model architectures, and demonstrating the feasibility of adopting HNNs for real-world image classification problems. These insights can inform the development of more efficient and environmentally friendly machine learning solutions.
Published: 2024
Full Text: View/download PDF

48. LDD: High-Precision Training of Deep Spiking Neural Network Transformers Guided by an Artificial Neural Network

Author: Yuqian Liu, Chujie Zhao, Yizhou Jiang, Ying Fang, and Feng Chen
Subjects: spiking neural networks (SNNs), Transformer, distillation, image classification, Technology
Abstract: The rise of large-scale Transformers has led to challenges regarding computational costs and energy consumption. In this context, spiking neural networks (SNNs) offer potential solutions due to their energy efficiency and processing speed. However, the inaccuracy of surrogate gradients and feature space quantization pose challenges for directly training deep SNN Transformers. To tackle these challenges, we propose a method (called LDD) to align ANN and SNN features across different abstraction levels in a Transformer network. LDD incorporates structured feature knowledge from ANNs to guide SNN training, ensuring the preservation of crucial information and addressing inaccuracies in surrogate gradients through designing layer-wise distillation losses. The proposed approach outperforms existing methods on the CIFAR10 (96.1%), CIFAR100 (82.3%), and ImageNet (80.9%) datasets, and enables training of the deepest SNN Transformer network using ImageNet.
Published: 2024
Full Text: View/download PDF

49. Optimizing Convolutional Neural Networks for Image Classification on Resource-Constrained Microcontroller Units

Author: Susanne Brockmann and Tim Schlippe
Subjects: TinyML, image classification, microcontroller units, Electronic computers. Computer science, QA75.5-76.95
Abstract: Running machine learning algorithms for image classification locally on small, cheap, and low-power microcontroller units (MCUs) has advantages in terms of bandwidth, inference time, energy, reliability, and privacy for different applications. Therefore, TinyML focuses on deploying neural networks on MCUs with random access memory sizes between 2 KB and 512 KB and read-only memory storage capacities between 32 KB and 2 MB. Models designed for high-end devices are usually ported to MCUs using model scaling factors provided by the model architecture’s designers. However, our analysis shows that this naive approach of substantially scaling down convolutional neural networks (CNNs) for image classification using such default scaling factors results in suboptimal performance. Consequently, in this paper we present a systematic strategy for efficiently scaling down CNN model architectures to run on MCUs. Moreover, we present our CNN Analyzer, a dashboard-based tool for determining optimal CNN model architecture scaling factors for the downscaling strategy by gaining layer-wise insights into the model architecture scaling factors that drive model size, peak memory, and inference time. Using our strategy, we were able to introduce additional new model architecture scaling factors for MobileNet v1, MobileNet v2, MobileNet v3, and ShuffleNet v2 and to optimize these model architectures. Our best model variation outperforms the MobileNet v1 version provided in the MLPerf Tiny Benchmark on the Visual Wake Words image classification task, reducing the model size by 20.5% while increasing the accuracy by 4.0%.
Published: 2024
Full Text: View/download PDF

50. MV-MFF: Multi-View Multi-Feature Fusion Model for Pneumonia Classification

Author: Najla Alsulami, Hassan Althobaiti, and Tarik Alafif
Subjects: pneumonia, multi-view, variational autoencoder, chest X-ray, CheXpert, image classification, Medicine (General), R5-920
Abstract: Pneumonia ranks among the most prevalent lung diseases and poses a significant concern since it is one of the diseases that may lead to death around the world. Diagnosing pneumonia necessitates a chest X-ray and substantial expertise to ensure accurate assessments. Despite the critical role of lateral X-rays in providing additional diagnostic information alongside frontal X-rays, they have not been widely used. Obtaining X-rays from multiple perspectives is crucial, significantly improving the precision of disease diagnosis. In this paper, we propose a multi-view multi-feature fusion model (MV-MFF) that integrates latent representations from a variational autoencoder and a β-variational autoencoder. Our model aims to classify pneumonia presence using multi-view X-rays. Experimental results demonstrate that the MV-MFF model achieves an accuracy of 80.4% and an area under the curve of 0.775, outperforming current state-of-the-art methods. These findings underscore the efficacy of our approach in improving pneumonia diagnosis through multi-view X-ray analysis.
Published: 2024
Full Text: View/download PDF
