1,104 results for "swin transformer"
Search Results
2. PolySegNet: improving polyp segmentation through swin transformer and vision transformer fusion.
- Author
-
Lijin, P., Ullah, Mohib, Vats, Anuja, Cheikh, Faouzi Alaya, Santhosh Kumar, G., and Nair, Madhu S.
- Abstract
Colorectal cancer ranks as the second most prevalent cancer worldwide, with a high mortality rate. Colonoscopy stands as the preferred procedure for diagnosing colorectal cancer. Detecting polyps at an early stage is critical for effective prevention and diagnosis. However, challenges in colonoscopic procedures often lead medical practitioners to seek support from alternative techniques for timely polyp identification. Polyp segmentation emerges as a promising approach to identify polyps in colonoscopy images. In this paper, we propose an advanced method, PolySegNet, that leverages both Vision Transformer and Swin Transformer, coupled with a Convolutional Neural Network (CNN) decoder. The fusion of these models facilitates a comprehensive analysis of various modules in our proposed architecture. To assess the performance of PolySegNet, we evaluate it on three colonoscopy datasets, a combined dataset, and their augmented versions. The experimental results demonstrate that PolySegNet achieves competitive results in terms of polyp segmentation accuracy and efficacy, achieving a mean Dice score of 0.92 and a mean Intersection over Union (IoU) of 0.86. These metrics highlight the superior performance of PolySegNet in accurately delineating polyp boundaries compared to existing methods. PolySegNet has shown great promise in accurately and efficiently segmenting polyps in medical images. The proposed method could be the foundation for a new class of transformer-based segmentation models in medical image analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
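As a sketch of the fusion idea PolySegNet describes (two transformer encoders feeding one CNN decoder), the following minimal PyTorch module concatenates ViT and Swin feature maps along the channel dimension before a convolutional decoder upsamples them to a segmentation map. The encoder stubs, channel sizes, and decoder layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionSegNet(nn.Module):
    """Hypothetical dual-encoder (ViT + Swin) segmentation sketch with a CNN decoder."""
    def __init__(self, vit_encoder: nn.Module, swin_encoder: nn.Module,
                 vit_dim: int = 768, swin_dim: int = 768, n_classes: int = 1):
        super().__init__()
        self.vit = vit_encoder    # assumed to return (B, vit_dim, H/16, W/16)
        self.swin = swin_encoder  # assumed to return (B, swin_dim, H/16, W/16)
        self.decoder = nn.Sequential(
            nn.Conv2d(vit_dim + swin_dim, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, n_classes, 1),  # per-pixel polyp logits
        )

    def forward(self, x):
        fused = torch.cat([self.vit(x), self.swin(x)], dim=1)  # channel-wise fusion
        return self.decoder(fused)

# usage with dummy conv stubs standing in for the real encoders:
# net = FusionSegNet(nn.Conv2d(3, 768, 16, 16), nn.Conv2d(3, 768, 16, 16))
# logits = net(torch.randn(1, 3, 224, 224))  # -> (1, 1, 224, 224)
```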
3. Integrating Swin Transformer with Fuzzy Gray Wolf Optimization for MRI Brain Tumor Classification.
- Author
-
Katran, L. Fahem, AlShemmary, E. N., and Al-Jawher, W. A. M.
- Abstract
The diagnosis is influenced by the classification of brain MRIs. Classifying and analyzing structures within images can be significantly enhanced by employing the Swin Transformer. The Swin Transformer is capable of capturing long-range relationships between pixels and creating layered image representations, which enhances its capacity to accurately evaluate brain structures. In addition, it has been demonstrated to be highly efficient in terms of computational power and memory usage, rendering it an optimal choice for high-resolution image processes. It is an adaptable tool in the field of computer vision, as it can be used for tasks such as object detection and image segmentation, demonstrating its versatility. The objective of this research is to identify tumors by classifying brain MRI images. In the event that tumors are identified, the research classifies them into three types: glioma, pituitary, and meningioma. The model utilizes 30,405 MRI images collected from two sources: Kaggle and the Al-Razi Medical Center in Iraq. The Swin Transformer's performance is enhanced by integrating a feature selection mechanism based on the Gray Wolf Optimizer (GWO). The reduction in the number of features used during training, as a result of feature selection facilitated by GWO, leads to improved model efficiency and reduced data processing requirements. This simplified approach accelerates model performance by improving efficiency and memory usage. Additionally, optimal feature selection eliminates redundant features, improving the models' accuracy and their capacity to differentiate between classes in MRI images. Incorporating the Fuzzy C Means (FCM) technique into feature selection may additionally improve the performance of GWO. The FCM assists in the grouping of related features to facilitate the selection of features by GWO. This method enhances the identification of non-useful features, resulting in reduced feature conflicts and improved model accuracy. Additionally, FCM assists in the identification of cluster centers that direct the GWO during optimization, thereby reducing the necessity for feature evaluation and improving processing efficiency. The integration also expedites the optimization process and improves the models' capacity to generalize, reducing the risk of overfitting and guaranteeing stability across datasets. This method enhances diagnostic accuracy by collecting features and analyzing structures in brain images, thereby providing a more profound understanding of the data. The proposed model achieved an accuracy of 99.490% with a loss rate of 0.0222 when trained on fuzzy optimization features. For training on fuzzy features derived from fuzzy wavelet MRI images, the model reached an accuracy of 99.572% and maintained a loss rate of 0.0206. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
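The GWO-driven feature selection described above can be prototyped in a few dozen lines. The sketch below is a generic binary GWO loop with an illustrative cross-validated logistic-regression fitness; the paper's FCM-guided clustering, classifier, and hyperparameters are not reproduced, and the penalty weight and thresholds are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Cross-validated accuracy minus a small penalty on subset size."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, mask], y, cv=3).mean()
    return acc - 0.01 * mask.mean()  # discourage keeping too many features

def gwo_feature_select(X, y, n_wolves=10, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.random((n_wolves, dim))            # continuous positions in [0, 1]
    for t in range(n_iters):
        masks = pos > 0.5                        # binarize to feature subsets
        scores = [fitness(m, X, y) for m in masks]
        alpha, beta, delta = pos[np.argsort(scores)[::-1][:3]]  # three best wolves
        a = 2 - 2 * t / n_iters                  # anneal exploration -> exploitation
        new_pos = np.zeros_like(pos)
        for leader in (alpha, beta, delta):
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A, C = 2 * a * r1 - a, 2 * r2
            new_pos += leader - A * np.abs(C * leader - pos)
        pos = np.clip(new_pos / 3, 0, 1)         # average of the leader-guided moves
    masks = pos > 0.5
    return masks[int(np.argmax([fitness(m, X, y) for m in masks]))]
```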
4. Crowd behavior detection: leveraging video swin transformer for crowd size and violence level analysis.
- Author
-
Qaraqe, Marwa, Yang, Yin David, Varghese, Elizabeth B, Basaran, Emrah, and Elzein, Almiqdad
- Subjects
COLLECTIVE behavior, TRANSFORMER models, SOFTWARE development tools, STREAMING video & television, CLOSED-circuit television - Abstract
In recent years, crowd behavior detection has posed significant challenges in the realm of public safety and security, even with the advancements in surveillance technologies. The ability to perform real-time surveillance and accurately identify crowd behavior by considering factors such as crowd size and violence levels can avert potential crowd-related disasters and hazards to a considerable extent. However, most existing approaches are unable to deal with the complexities of crowd dynamics and fail to distinguish different violence levels within crowds. Moreover, the prevailing approach to crowd behavior recognition, which solely relies on the analysis of closed-circuit television (CCTV) footage and overlooks the integration of online social media video content, leads to a primarily reactive methodology. This paper proposes a crowd behavior detection framework based on the swin transformer architecture, which leverages crowd counting maps and optical flow maps to detect crowd behavior across various sizes and violence levels. To support this framework, we created a dataset of videos enabling the recognition of crowd behaviors based on size and violence levels, sourced from CCTV camera footage and online videos. Experimental analysis conducted on benchmark datasets and our proposed dataset substantiates the superiority of our proposed approach over existing state-of-the-art methods, showcasing its ability to effectively distinguish crowd behaviors concerning size and violence level. Our method's validation through Nvidia's DeepStream Software Development Kit (SDK) highlights its competitive performance and potential for real-time intelligent surveillance applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Sports-ACtrans Net: research on multimodal robotic sports action recognition driven via ST-GCN.
- Author
-
Qi Lu
- Abstract
Introduction: Accurately recognizing and understanding human motion actions presents a key challenge in the development of intelligent sports robots. Traditional methods often encounter significant drawbacks, such as high computational resource requirements and suboptimal real-time performance. To address these limitations, this study proposes a novel approach called Sports-ACtrans Net. Methods: In this approach, the Swin Transformer processes visual data to extract spatial features, while the Spatio-Temporal Graph Convolutional Network (ST-GCN) models human motion as graphs to handle skeleton data. By combining these outputs, a comprehensive representation of motion actions is created. Reinforcement learning is employed to optimize the action recognition process, framing it as a sequential decision-making problem. Deep Q-learning is utilized to learn the optimal policy, thereby enhancing the robot's ability to accurately recognize and engage in motion. Results and discussion: Experiments demonstrate significant improvements over state-of-the-art methods. This research advances the fields of neural computation, computer vision, and neuroscience, aiding in the development of intelligent robotic systems capable of understanding and participating in sports activities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Multi-label dental disorder diagnosis based on MobileNetV2 and swin transformer using bagging ensemble classifier.
- Author
-
Alsakar, Yasmin M., Elazab, Naira, Nader, Nermeen, Mohamed, Waleed, Ezzat, Mohamed, and Elmogy, Mohammed
- Abstract
Dental disorders are common worldwide, causing pain or infections and limiting mouth opening, so dental conditions impact productivity, work capability, and quality of life. Manual detection and classification of oral diseases is time-consuming and requires dentists' evaluation and examination. A dental disease detection and classification system based on machine learning and deep learning will aid in early dental disease diagnosis. Hence, this paper proposes a new diagnosis system for dental diseases using X-ray imaging. The framework includes a robust pre-processing phase that uses image normalization and adaptive histogram equalization to improve image quality and reduce variation. A dual-stream approach is used for feature extraction, utilizing the advantages of the Swin Transformer for capturing long-range dependencies and global context and MobileNetV2 for effective local feature extraction. A thorough representation of dental anomalies is produced by fusing the extracted features. Finally, a bagging ensemble classifier is utilized to obtain reliable and broadly applicable classification results. We evaluate our model on a benchmark dental radiography dataset. The experimental results and comparisons show the superiority of the proposed system, with 95.7% precision, 95.4% sensitivity, 95.7% specificity, a 95.5% Dice similarity coefficient, and 95.6% accuracy. The results demonstrate the effectiveness of our hybrid model integrating the MobileNetV2 and Swin Transformer architectures, outperforming state-of-the-art techniques in classifying dental diseases using dental panoramic X-ray imaging. This framework presents a promising method for robustly and accurately diagnosing dental diseases automatically, which may help dentists plan treatments and identify dental diseases early on. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
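The dual-stream fusion plus bagging stage described above can be approximated as follows; the random feature arrays, their dimensions, and the default bagged decision trees are stand-ins, since the paper's exact extractor outputs and ensemble settings are not given here.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# stand-in feature matrices; in the paper these come from the two backbones
swin_feats = np.random.rand(500, 768)     # e.g. Swin Transformer embeddings
mobile_feats = np.random.rand(500, 1280)  # e.g. MobileNetV2 embeddings
labels = np.random.randint(0, 2, size=500)

fused = np.concatenate([swin_feats, mobile_feats], axis=1)  # dual-stream fusion
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels,
                                          test_size=0.2, random_state=0)

# bagging over default decision trees; the paper's base learner is not specified here
clf = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```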
7. Cloud Detection Using a UNet3+ Model with a Hybrid Swin Transformer and EfficientNet (UNet3+STE) for Very-High-Resolution Satellite Imagery.
- Author
-
Choi, Jaewan, Seo, Doochun, Jung, Jinha, Han, Youkyung, Oh, Jaehong, and Lee, Changno
- Abstract
Cloud regions presented in imagery must be extracted and recognized in order to generate satellite imagery as analysis-ready data (ARD). In this manuscript, we propose a new deep learning model to detect cloud areas in very-high-resolution (VHR) satellite imagery by fusing two deep learning architectures. The proposed UNet3+ model with a hybrid Swin Transformer and EfficientNet (UNet3+STE) was based on the structure of UNet3+, with the encoder sequentially combining EfficientNet based on mobile inverted bottleneck convolution (MBConv) and the Swin Transformer. By sequentially utilizing convolutional neural networks (CNNs) and transformer layers, the proposed algorithm aimed to extract the local and global information of cloud regions effectively. In addition, the decoder used MBConv to restore the spatial information of the feature map extracted by the encoder and adopted the deep supervision strategy of UNet3+ to enhance the model's performance. The proposed model was trained using the open dataset derived from KOMPSAT-3 and 3A satellite imagery, and a comparative evaluation with state-of-the-art (SOTA) methods was conducted on fourteen test datasets at the product level. The experimental results confirmed that the proposed UNet3+STE model outperformed the SOTA methods and demonstrated the most stable precision, recall, and F1 score values with fewer parameters and lower complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Removing random noise and improving the resolution of seismic data using deep‐learning transformers.
- Author
-
Sun, Qifeng, Feng, Yali, Du, Qizhen, and Gong, Faming
- Subjects
TRANSFORMER models, FEATURE extraction, DATA extraction, DEEP learning, NOISE - Abstract
Post‐stack data are susceptible to noise interference and have low resolution, which impacts the accuracy and efficiency of subsequent seismic data interpretation. To address this issue, we propose a deep learning approach called Seis‐SUnet, which achieves simultaneous random noise suppression and super‐resolution reconstruction of seismic data. First, the Conv‐Swin‐Block is designed to utilize ordinary convolution and the Swin transformer to capture the long‐distance dependencies in the spatial location of seismic data, enabling the network to comprehensively comprehend the overall structure of seismic data. Second, to address the problem of weakening of the effective signal during network mapping, we use a hybrid training strategy of L1 loss, edge loss and multi‐scale structural similarity loss. The edge loss function directs the network training to focus more on the high‐frequency information at the edges of seismic data by amplifying the weight. Additionally, verification on synthetic and field seismic datasets confirms that Seis‐SUnet can effectively improve the signal‐to‐noise ratio and resolution of seismic data. Comparisons with traditional methods and two deep learning reconstruction methods demonstrate that Seis‐SUnet excels in removing random noise and preserving the continuity of rock layers and faults, as well as exhibiting strong robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
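The hybrid training strategy described above combines L1, edge, and multi-scale SSIM terms. A rough sketch of the first two terms on single-channel seismic sections follows; the MS-SSIM term (available, e.g., in the pytorch_msssim package) is omitted for brevity, and the weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for single-channel (B, 1, H, W) inputs
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def edge_map(x):
    """Gradient magnitude of a single-channel image batch."""
    gx = F.conv2d(x, SOBEL_X.to(x.device), padding=1)
    gy = F.conv2d(x, SOBEL_Y.to(x.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def hybrid_loss(pred, target, w_l1=1.0, w_edge=0.5):
    l1 = F.l1_loss(pred, target)
    # amplify high-frequency information at the edges of the seismic section
    edge = F.l1_loss(edge_map(pred), edge_map(target))
    return w_l1 * l1 + w_edge * edge  # + an MS-SSIM term in the paper's strategy
```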
9. RSTSRN: Recursive Swin Transformer Super-Resolution Network for Mars Images.
- Author
-
Wu, Fanlu, Jiang, Xiaonan, Fu, Tianjiao, Fu, Yao, Xu, Dongdong, and Zhao, Chunlei
- Abstract
High-resolution optical images will provide planetary geology researchers with finer and more microscopic image data information. In order to maximize scientific output, it is necessary to further increase the resolution of acquired images, so image super-resolution (SR) reconstruction techniques have become the best choice. Aiming at the problems of large parameter quantity and high computational complexity in current deep learning-based image SR reconstruction methods, we propose a novel Recursive Swin Transformer Super-Resolution Network (RSTSRN) for image SR. The RSTSRN improves upon the LapSRN, which we use as our backbone architecture. A Residual Swin Transformer Block (RSTB) is used for more efficient residual learning; it consists of stacked Swin Transformer Blocks (STBs) with a residual connection. Moreover, the idea of parameter sharing was introduced to reduce the number of parameters, and a multi-scale training strategy was designed to accelerate convergence speed. Experimental results show that the proposed RSTSRN achieves superior performance on 2×, 4× and 8× SR tasks compared to state-of-the-art methods with similar parameter counts, with particularly strong superiority on high-magnification SR tasks. Compared to the LapSRN network, for 2×, 4× and 8× Mars image SR tasks, the RSTSRN network increased PSNR values by 0.35 dB, 0.88 dB and 1.22 dB, and SSIM values by 0.0048, 0.0114 and 0.0311, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
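The parameter-sharing idea mentioned above, one block reused recursively so that effective depth grows without adding weights, can be sketched generically as follows; the module is a stand-in, not the paper's RSTB implementation.

```python
import torch.nn as nn

class RecursiveResidualStage(nn.Module):
    """Apply one shared block several times with a residual connection per step."""
    def __init__(self, block: nn.Module, n_recursions: int = 4):
        super().__init__()
        self.block = block              # a single set of weights...
        self.n_recursions = n_recursions

    def forward(self, x):
        out = x
        for _ in range(self.n_recursions):  # ...reused at every recursion
            out = out + self.block(out)     # residual learning at each step
        return out
```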
10. An improved YOLOv7 model based on Swin Transformer and Trident Pyramid Networks for accurate tomato detection.
- Author
-
Guoxu Liu, Yonghui Zhang, Jun Liu, Deyong Liu, Chunlei Chen, Yujie Li, Xiujie Zhang, and Touko Mbouembe, Philippe Lyonel
- Subjects
TRANSFORMER models ,FRUIT harvesting ,FRUIT ,PYRAMIDS ,COMMERCIALIZATION - Abstract
Accurate fruit detection is crucial for automated fruit picking. However, real-world scenarios, influenced by complex environmental factors such as illumination variations, occlusion, and overlap, pose significant challenges to accurate fruit detection. These challenges subsequently impact the commercialization of fruit harvesting robots. A tomato detection model named YOLO-SwinTF, based on YOLOv7, is proposed to address these challenges. Integrating Swin Transformer (ST) blocks into the backbone network enables the model to capture global information by modeling long-range visual dependencies. Trident Pyramid Networks (TPN) are introduced to overcome the limitations of PANet's focus on communication-based processing. TPN incorporates multiple self-processing (SP) modules within existing top-down and bottom-up architectures, allowing feature maps to generate new findings for communication. In addition, Focaler-IoU is introduced to reconstruct the original intersection-over-union (IoU) loss to allow the loss function to adjust its focus based on the distribution of difficult and easy samples. The proposed model is evaluated on a tomato dataset, and the experimental results demonstrated that the proposed model's detection recall, precision, F1 score, and AP reach 96.27%, 96.17%, 96.22%, and 98.67%, respectively. These represent improvements of 1.64%, 0.92%, 1.28%, and 0.88% compared to the original YOLOv7 model. When compared to other state-of-the-art detection methods, this approach achieves superior performance in terms of accuracy while maintaining comparable detection speed. In addition, the proposed model exhibits strong robustness under various lighting and occlusion conditions, demonstrating its significant potential in tomato detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
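Focaler-IoU, as cited above, linearly remaps the plain IoU onto an interval [d, u] so the loss focuses on easy or hard samples depending on the interval chosen. A hedged sketch follows; the interval endpoints are assumptions, not necessarily the paper's settings.

```python
import torch

def box_iou(a, b):
    """Element-wise IoU of (N, 4) boxes given as (x1, y1, x2, y2)."""
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)

def focaler_iou_loss(pred, target, d=0.0, u=0.95):
    iou = box_iou(pred, target)
    iou_focaler = ((iou - d) / (u - d)).clamp(0.0, 1.0)  # linear remapping on [d, u]
    return (1.0 - iou_focaler).mean()
```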
11. Anomaly Detection in Embryo Development and Morphology Using Medical Computer Vision-Aided Swin Transformer with Boosted Dipper-Throated Optimization Algorithm.
- Author
-
Mazroa, Alanoud Al, Maashi, Mashael, Said, Yahia, Maray, Mohammed, Alzahrani, Ahmad A., Alkharashi, Abdulwhab, and Al-Sharafi, Ali M.
- Abstract
Infertility affects a significant number of humans. A supported reproduction technology was verified to ease infertility problems. In vitro fertilization (IVF) is one of the best choices, and its success relies on the preference for a higher-quality embryo for transmission. These have been normally completed physically by testing embryos in a microscope. The traditional morphological calculation of embryos shows predictable disadvantages, including effort- and time-consuming and expected risks of bias related to individual estimations completed by specific embryologists. Different computer vision (CV) and artificial intelligence (AI) techniques and devices have been recently applied in fertility hospitals to improve efficacy. AI addresses the imitation of intellectual performance and the capability of technologies to simulate cognitive learning, thinking, and problem-solving typically related to humans. Deep learning (DL) and machine learning (ML) are advanced AI algorithms in various fields and are considered the main algorithms for future human assistant technology. This study presents an Embryo Development and Morphology Using a Computer Vision-Aided Swin Transformer with a Boosted Dipper-Throated Optimization (EDMCV-STBDTO) technique. The EDMCV-STBDTO technique aims to accurately and efficiently detect embryo development, which is critical for improving fertility treatments and advancing developmental biology using medical CV techniques. Primarily, the EDMCV-STBDTO method performs image preprocessing using a bilateral filter (BF) model to remove the noise. Next, the swin transformer method is implemented for the feature extraction technique. The EDMCV-STBDTO model employs the variational autoencoder (VAE) method to classify human embryo development. Finally, the hyperparameter selection of the VAE method is implemented using the boosted dipper-throated optimization (BDTO) technique. The efficiency of the EDMCV-STBDTO method is validated by comprehensive studies using a benchmark dataset. The experimental result shows that the EDMCV-STBDTO method performs better than the recent techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Identification of Anomalies in Lung and Colon Cancer Using Computer Vision-Based Swin Transformer with Ensemble Model on Histopathological Images.
- Author
-
Alsulami, Abdulkream A., Albarakati, Aishah, AL-Ghamdi, Abdullah AL-Malaise, and Ragab, Mahmoud
- Abstract
Lung and colon cancer (LCC) is a dominant life-threatening disease that needs timely attention and precise diagnosis for efficient treatment. The conventional diagnostic techniques for LCC regularly encounter constraints in terms of efficiency and accuracy, thus causing challenges in primary recognition and treatment. Early diagnosis of the disease can immensely reduce the probability of death. In medical practice, the histopathological study of the tissue samples generally uses a classical model. Still, the automated devices that exploit artificial intelligence (AI) techniques produce efficient results in disease diagnosis. In histopathology, both machine learning (ML) and deep learning (DL) approaches can be deployed owing to their latent ability in analyzing and predicting physically accurate molecular phenotypes and microsatellite uncertainty. In this background, this study presents a novel technique called Lung and Colon Cancer using a Swin Transformer with an Ensemble Model on the Histopathological Images (LCCST-EMHI). The proposed LCCST-EMHI method focuses on designing a DL model for the diagnosis and classification of the LCC using histopathological images (HI). In order to achieve this, the LCCST-EMHI model utilizes the bilateral filtering (BF) technique to get rid of the noise. Further, the Swin Transformer (ST) model is also employed for the purpose of feature extraction. For the LCC detection and classification process, an ensemble deep learning classifier is used with three techniques: bidirectional long short-term memory with multi-head attention (BiLSTM-MHA), Double Deep Q-Network (DDQN), and sparse stacked autoencoder (SSAE). Eventually, the hyperparameter selection of the three DL models can be implemented utilizing the walrus optimization algorithm (WaOA) method. In order to illustrate the promising performance of the LCCST-EMHI approach, an extensive range of simulation analyses was conducted on a benchmark dataset. The experimentation results demonstrated the promising performance of the LCCST-EMHI approach over other recent methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. MS-UNet: Multi-Scale Nested UNet for Medical Image Segmentation with Few Training Data Based on an ELoss and Adaptive Denoising Method †.
- Author
-
Chen, Haoyuan, Han, Yufei, Yao, Linwei, Wu, Xin, Li, Kuan, and Yin, Jianping
- Subjects
TRANSFORMER models, IMAGE processing, DIAGNOSTIC imaging, NETWORK performance - Abstract
Traditional U-shape segmentation models can achieve excellent performance with an elegant structure. However, the single-layer decoder structure of U-Net or SwinUnet is too "thin" to exploit enough information, resulting in large semantic differences between the encoder and decoder parts. Things get worse in the field of medical image processing, where annotated data are more difficult to obtain than in other tasks. Based on this observation, we propose a U-like model named MS-UNet with a plug-and-play adaptive denoising module and ELoss for the medical image segmentation task in this study. Instead of the single-layer U-Net decoder structure used in Swin-UNet and TransUNet, we specifically designed a multi-scale nested decoder based on the Swin Transformer for U-Net. The proposed multi-scale nested decoder structure allows the feature mapping between the decoder and encoder to be semantically closer, thus enabling the network to learn more detailed features. In addition, ELoss could improve the attention of the model to the segmentation edges, and the plug-and-play adaptive denoising module could prevent the model from learning the wrong features without losing detailed information. The experimental results show that MS-UNet could effectively improve network performance with more efficient feature learning capability and exhibit more advanced performance, especially in the extreme case with a small amount of training data. Furthermore, the proposed ELoss and denoising module not only significantly enhance the segmentation performance of MS-UNet but can also be applied individually to other models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Hybrid Swin-CSRNet: A Novel and Efficient Fish Counting Network in Aquaculture.
- Author
-
Liu, Jintao, Tolón-Becerra, Alfredo, Bienvenido-Barcena, José Fernando, Yang, Xinting, Zhu, Kaijie, and Zhou, Chao
- Abstract
Real-time estimation of fish biomass plays a crucial role in real-world fishery production, as it helps formulate feeding strategies and other management decisions. In this paper, a dense fish counting network called Swin-CSRNet is proposed. Specifically, the VGG16 layer in the front-end is replaced with the Swin transformer to extract image features more efficiently. Additionally, a squeeze-and-excitation (SE) module is introduced to enhance feature representation by dynamically adjusting the importance of each channel through "squeeze" and "excitation", making the extracted features more focused and effective. Finally, a multi-scale fusion (MSF) module is added after the back-end to fully utilize the multi-scale feature information, enhancing the model's ability to capture multi-scale details. The experiment demonstrates that Swin-CSRNet achieved excellent results, with MAE, RMSE, MAPE, and correlation coefficient R² values of 11.22, 15.32, 5.18%, and 0.954, respectively. Meanwhile, compared to the original network, the parameter size and computational complexity of Swin-CSRNet were reduced by 70.17% and 79.05%, respectively. Therefore, the proposed method not only counts the number of fish with higher speed and accuracy but also contributes to advancing the automation of aquaculture. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
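The squeeze-and-excitation module credited above with reweighting channel importance is a standard block; a minimal PyTorch version for reference (the reduction ratio of 16 is the common default, not necessarily the paper's value).

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel attention: global-pool 'squeeze', two-layer 'excitation' gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: one value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # excitation: rescale each channel by its learned importance
```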
15. BS-YOLOV8: an intelligent detection model for bearing pin support-piece states of high-rise building machine.
- Author
-
Pan, Xi, Zhao, Tingsheng, Li, Xuxiang, and Jiang, Xiaohui
- Subjects
TRANSFORMER models, SKYSCRAPERS, LEAK detection, INTELLIGENT buildings, ERROR rates, INTRUSION detection systems (Computer security), PROBLEM solving, IMAGE encryption - Abstract
As the main support part of the working platform of a high-rise building machine, the bearing pin support (BPS) plays a crucial role in the safety and stability of the platform; the conventional inspection method suffers from low detection efficiency, low accuracy, and high cost. To improve the accuracy and robustness of detection under weak light, this paper proposes BS-YOLOV8, an intelligent detection algorithm for BPS-piece states. To improve feature map utilization and reduce the model's missed-detection and false-detection rates, the Swin transformer is used to improve the YOLOV8 backbone network. In addition, the BiFormer attention mechanism is used to weight the feature maps, addressing the loss of feature information across different feature layers and under weak lighting conditions, and the Scylla-IOU loss function is used instead of the original localization loss function to guide the model toward predicted bounding boxes closer to the real target bounding boxes. Finally, the BS-YOLOV8 algorithm is compared with classical algorithms on the self-constructed dataset of this study. The results show that the mAP0.5, mAP0.5:0.95, and FPS values of the BS-YOLOV8 algorithm reach 97.9%, 96.3% and 40 under normal lighting, and the mAP0.5 value reaches 87.6% under low light, which effectively solves the problems of low detection efficiency and poor detection under low-light conditions and is superior compared to other algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. An improved model based on YOLOX for detection of tea sprouts in natural environment.
- Author
-
Li, Xiutong, Liu, Ruixin, Li, Yuxin, Li, Zhilin, Yan, Peng, Yu, Mei, Dong, Xuan, Yan, Jianwei, and Xie, Benliang
- Abstract
The tea industry occupies a pivotal and important position in China's import and export trade commodities. With the improvement of people's quality of life, the demand for famous tea sprouts is increasing. However, manual picking is inefficient and costly. Although mechanical picking can pick tea sprouts efficiently, it lacks selectivity, which leads to an increase in the workload of post-processing and screening of superior tea leaves. To address this, this paper establishes a dataset of tea sprouts in natural environments and proposes an improved YOLOX tea sprout detection model, YOLOX-ST, based on the Swin Transformer. The model employs the Swin Transformer as the backbone network to enhance overall detection accuracy. Additionally, it introduces the CBAM attention mechanism to address missed detections and false detections in complex environments. Furthermore, a small target detection layer is incorporated to resolve the problem of incomplete information about tea sprout features learned from the deep feature map. To address the sample imbalance, we introduce the EIoU loss function and apply Focal Loss to the confidence level. The experimental results demonstrate that the proposed model achieves an accuracy of 95.45%, which is 5.73% higher than the original YOLOX model. Moreover, it outperforms other YOLO series models in terms of accuracy while achieving a faster detection speed, reaching 93.2 FPS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Towards improved fundus disease detection using Swin Transformers.
- Author
-
Jawad, M Abdul, Khursheed, Farida, Nawaz, Shah, and Mir, A. H.
- Subjects
TRANSFORMER models, MACULAR degeneration, COMPUTER-aided diagnosis, MACHINE learning, EQUILIBRIUM testing, DEEP learning - Abstract
Ocular diseases can have debilitating consequences on visual acuity if left untreated, necessitating early and accurate diagnosis to improve patients' quality of life. Although the contemporary clinical prognosis involving fundus screening is a cost-effective method for detecting ocular abnormalities, it is time-intensive due to the limited availability of resources and expert ophthalmologists. While computer-aided detection, including traditional machine learning and deep learning, has been employed for enhanced prognosis from fundus images, conventional deep learning models often face challenges due to limited global modeling ability, inducing bias and suboptimal performance on unbalanced datasets. Presently, most studies on ocular disease detection focus on cataract detection or diabetic retinopathy severity prediction, leaving a myriad of vision-impairing conditions unexplored. Minimal research has been conducted utilizing deep models for identifying diverse ocular abnormalities from fundus images, with limited success. The study leveraged the capabilities of four Swin Transformer models (Swin-T, Swin-S, Swin-B, and Swin-L) for detecting various significant ocular diseases (including Cataracts, Hypertensive Retinopathy, Diabetic Retinopathy, Myopia, and Age-Related Macular Degeneration) from fundus images of the ODIR dataset. Swin Transformer models, confining self-attention to local windows while enabling cross-window interactions, demonstrated superior performance and computational efficiency. Upon assessment across three specific ODIR test sets, utilizing metrics such as AUC, F1-score, Kappa score, and a composite metric representing an average of these three (referred to as the final score), all Swin models exhibited performance metric scores superior to those documented in contemporary studies. The Swin-L model, in particular, achieved final scores of 0.8501, 0.8211, and 0.8616 on the Off-site, On-site, and Balanced ODIR test sets, respectively. An external validation on a Retina dataset further substantiated the generalizability of Swin models, with the models reporting final scores of 0.9058 (Swin-T), 0.92907 (Swin-S), 0.95917 (Swin-B), and 0.97042 (Swin-L). The results, corroborated by statistical analysis, underline the consistent and stable performance of Swin models across varied datasets, emphasizing their potential as reliable tools for multi-ocular disease detection from fundus images, thereby aiding in the early diagnosis and intervention of ocular abnormalities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
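For reference, the local-window mechanism the abstract credits for Swin's efficiency amounts to reshaping the token grid into non-overlapping windows before attention; a generic sketch (not the study's training code) follows.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """(B, H, W, C) -> (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

# self-attention is then computed independently inside each window; shifting the
# grid by window_size // 2 (via torch.roll) in alternate blocks restores
# cross-window interaction
tokens = torch.randn(2, 56, 56, 96)    # e.g. a Swin-T stage-1 feature map
windows = window_partition(tokens, 7)  # -> (2 * 64, 7, 7, 96)
```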
18. Retinex decomposition based low‐light image enhancement by integrating Swin transformer and U‐Net‐like architecture.
- Author
-
Wang, Zexin, Qingge, Letu, Pan, Qingyi, and Yang, Pei
- Subjects
TRANSFORMER models, IMAGE intensifiers, VISUAL perception, REFLECTANCE, TEST methods - Abstract
Low‐light images are captured in environments with minimal lighting, such as nighttime or underwater conditions. These images often suffer from issues like low brightness, poor contrast, lack of detail, and overall darkness, significantly impairing human visual perception and subsequent high‐level visual tasks. Enhancing low‐light images holds great practical significance. Among the various existing methods for Low‐Light Image Enhancement (LLIE), those based on the Retinex theory have gained significant attention. However, despite considerable efforts in prior research, the challenge of Retinex decomposition remains unresolved. In this study, an LLIE network based on the Retinex theory is proposed, which addresses these challenges by integrating attention mechanisms and a U‐Net‐like architecture. The proposed model comprises three modules: the Decomposition module (DECM), the Reflectance Recovery module (REFM), and the Illumination Enhancement module (ILEM). Its objective is to decompose low‐light images based on the Retinex theory and enhance the decomposed reflectance and illumination maps using attention mechanisms and a U‐Net‐like architecture. We conducted extensive experiments on several widely used public datasets. The qualitative results demonstrate that the approach produces enhanced images with superior visual quality compared to the existing methods on all test datasets, especially for some extremely dark images. Furthermore, the quantitative evaluation results based on metrics PSNR, SSIM, LPIPS, BRISQUE, and MUSIQ show the proposed model achieves superior performance, with PSNR and BRISQUE significantly outperforming the baseline approaches, where (PSNR, mean BRISQUE) values of the proposed method and the second best results are (17.14, 17.72) and (16.44, 19.65). Additionally, further experimental results such as ablation studies indicate the effectiveness of the proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
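The Retinex assumption underlying the model above is I = R ⊙ L (an image is the pixel-wise product of reflectance and illumination). For intuition only, a classical single-scale Retinex estimate of log-reflectance on a grayscale array is sketched below; the paper learns the decomposition with a network instead.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img: np.ndarray, sigma: float = 80.0) -> np.ndarray:
    """Classical SSR: log R = log I - log L, with L estimated by Gaussian blur."""
    img = img.astype(np.float64) + 1.0          # avoid log(0)
    illumination = gaussian_filter(img, sigma)  # smooth estimate of L
    return np.log(img) - np.log(illumination)   # log-reflectance estimate
```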
19. Residual swin transformer for classifying the types of cotton pests in complex background.
- Author
-
Ting Zhang, Jikui Zhu, Fengkui Zhang, Shijie Zhao, Wei Liu, Ruohong He, Hongqiang Dong, Qingqing Hong, Changwei Tan, and Ping Li
- Subjects
TRANSFORMER models ,COTTON growing ,COTTON aphid ,DATA augmentation ,DEEP learning ,COTTON - Abstract
Background: Cotton pests have a major impact on cotton quality and yield during cotton production and cultivation. With the rapid development of agricultural intelligence, the accurate classification of cotton pests is a key factor in realizing the precise application of medicines by utilize unmanned aerial vehicles (UAVs), large application devices and other equipment. Methods: In this study, a cotton insect pest classificationmodel based on improved Swin Transformer is proposed. The model introduces the residual module, skip connection, into Swin Transformer to improve the problem that pest features are easily confused in complex backgrounds leading to poor classification accuracy, and to enhance the recognition of cotton pests. In this study, 2705 leaf images of cotton insect pests (including three insect pests, cotton aphids, cotton mirids and cotton leaf mites) were collected in the field, and after image preprocessing and data augmentation operations, model training was performed. Results: The test results proved that the accuracy of the improvedmodel compared to the original model increased from 94.6% to 97.4%, and the prediction time for a single image was 0.00434s. The improved Swin Transformer model was compared with seven kinds of classification models (VGG11, VGG11-bn, Resnet18, MobilenetV2, VIT, Swin Transformer small, and Swin Transformer base), and the model accuracy was increased respectively by 0.5%, 4.7%, 2.2%, 2.5%, 6.3%, 7.9%, 8.0%. Discussion: Therefore, this study demonstrates that the improved Swin Transformer model significantly improves the accuracy and efficiency of cotton pest detection compared with other classification models, and can be deployed on edge devices such as utilize unmanned aerial vehicles (UAVs), thus providing an important technological support and theoretical basis for cotton pest control and precision drug application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. MSCAC: A Multi-Scale Swin–CNN Framework for Progressive Remote Sensing Scene Classification.
- Author
-
Solomon, A. Arun and Agnes, S. Akila
- Subjects
TRANSFORMER models, REMOTE sensing, TERRAIN mapping, COMPUTER performance, LAND cover - Abstract
Recent advancements in deep learning have significantly improved the performance of remote sensing scene classification, a critical task in remote sensing applications. This study presents a new aerial scene classification model, the Multi-Scale Swin–CNN Aerial Classifier (MSCAC), which employs the Swin Transformer, an advanced architecture that has demonstrated exceptional performance in a range of computer vision applications. The Swin Transformer leverages shifted window mechanisms to efficiently model long-range dependencies and local features in images, making it particularly suitable for the complex and varied textures in aerial imagery. The model is designed to capture intricate spatial hierarchies and diverse scene characteristics at multiple scales. A framework is developed that integrates the Swin Transformer with a multi-scale strategy, enabling the extraction of robust features from aerial images of different resolutions and contexts. This approach allows the model to effectively learn from both global structures and fine-grained details, which is crucial for accurate scene classification. The model's performance is evaluated on several benchmark datasets, including UC-Merced, WHU-RS19, RSSCN7, and AID, where it demonstrates a superior or comparable accuracy to state-of-the-art models. The MSCAC model's adaptability to varying amounts of training data and its ability to improve with increased data make it a promising tool for real-world remote sensing applications. This study highlights the potential of integrating advanced deep-learning architectures like the Swin Transformer into aerial scene classification, paving the way for more sophisticated and accurate remote sensing systems. The findings suggest that the proposed model has significant potential for various remote sensing applications, including land cover mapping, urban planning, and environmental monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Enhancing Melanoma Diagnosis with Advanced Deep Learning Models Focusing on Vision Transformer, Swin Transformer, and ConvNeXt.
- Author
-
Aksoy, Serra, Demircioglu, Pinar, and Bogrekci, Ismail
- Subjects
TRANSFORMER models, CONVOLUTIONAL neural networks, DEEP learning, COMPUTER-assisted image analysis (Medicine), MELANOMA diagnosis, MELANOMA - Abstract
Skin tumors, especially melanoma, which is highly aggressive and progresses quickly to other sites, are an issue in various parts of the world. However, the only way to save lives is to detect melanoma at its initial stages. This study explores the application of advanced deep learning models for classifying benign and malignant melanoma using dermoscopic images. The aim of the study is to enhance the accuracy and efficiency of melanoma diagnosis with the ConvNeXt, Vision Transformer (ViT) Base-16, and Swin Transformer V2 Small (Swin V2 S) deep learning models. The ConvNeXt model, which integrates principles of both convolutional neural networks and transformers, demonstrated superior performance, with balanced precision and recall metrics. The dataset, sourced from Kaggle, comprises 13,900 uniformly sized images, preprocessed to standardize the inputs for the models. Experimental results revealed that ConvNeXt achieved the highest diagnostic accuracy among the tested models, reaching 91.5%, with balanced precision and recall rates of 90.45% and 92.8% for benign cases, and 92.61% and 90.2% for malignant cases, respectively. The F1-scores for ConvNeXt were 91.61% for benign cases and 91.39% for malignant cases. This research points out the potential of hybrid deep learning architectures in medical image analysis, particularly for early melanoma detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
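The F1-scores quoted above follow directly from the precision/recall pairs via F1 = 2PR/(P + R); a quick check:

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(round(f1(0.9045, 0.928) * 100, 2))  # benign: 91.61
print(round(f1(0.9261, 0.902) * 100, 2))  # malignant: 91.39
```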
22. Context Aggregation Network for Remote Sensing Image Semantic Segmentation.
- Author
-
Zhang, Changxing, Bai, Xiangyu, Wang, Dapeng, and Zhou, KeXin
- Subjects
TRANSFORMER models, REMOTE sensing, IMAGE segmentation, SWIMMING - Abstract
In recent years, remote sensing technology has been widely applied in various industries, and semantic segmentation of remote sensing images has attracted much attention. Due to the complexity and special characteristics of remote sensing images, multi-scale object detection and accurate object localization are important challenges in remote sensing image semantic segmentation. Therefore, this paper proposes a context aggregation network (CANet). The design of CANet is influenced by advanced technologies such as attention mechanisms and feature fusion and enhancement. This network first introduces a nested dilated residual module (NDRM), which can fully utilize the features extracted by the backbone network. Then, an improved integrated successive dilation module (IISD) is proposed to effectively aggregate contextual information at a series of scales. Next, a Swin Transformer module is embedded to provide global contextual information. Finally, a multi-resolution fusion module (MRFM) is proposed, allowing the comprehensive fusion of feature layers from different stages of the encoder and preserving more semantic and detailed information. The experimental results show that CANet outperforms other advanced models on the Potsdam and Vaihingen datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Swin-chart: An efficient approach for chart classification.
- Author
-
Dhote, Anurag, Javed, Mohammed, and Doermann, David S.
- Subjects
TRANSFORMER models, CLASSIFICATION, DEEP learning, SCIENCE education - Published
- 2024
- Full Text
- View/download PDF
24. Refined Intelligent Landslide Identification Based on Multi-Source Information Fusion.
- Author
-
Wang, Xiao, Wang, Di, Liu, Chenghao, Zhang, Mengmeng, Xu, Luting, Sun, Tiegang, Li, Weile, Cheng, Sizhi, and Dong, Jianhui
- Subjects
TRANSFORMER models, EMERGENCY management, DATABASES, LANDSLIDES, REMOTE sensing - Abstract
Landslides are most severe in the mountainous regions of southwestern China. While landslide identification provides a foundation for disaster prevention operations, methods for utilizing multi-source data and deep learning techniques to improve the efficiency and accuracy of landslide identification in complex environments are still a focus of research and a difficult issue in landslide research. In this study, we address the above problems and construct a landslide identification model based on the shifted window (Swin) transformer. We chose Ya'an, which has a complex terrain and experiences frequent landslides, as the study area. Our model, which fuses features from different remote sensing data sources and introduces a loss function that better learns the boundary information of the target, is compared with the pyramid scene parsing network (PSPNet), the unified perception parsing network (UPerNet), and DeepLab_V3+ models in order to explore the learning potential of the model and test the models' resilience in an open-source landslide database. The results show that in the Ya'an landslide database, compared with the above benchmark networks (UPerNet, PSPNet, and DeepLab_v3+), the Swin Transformer-based optimization model improves overall accuracies by 1.7%, 2.1%, and 1.5%, respectively; the F1_score is improved by 14.5%, 16.2%, and 12.4%; and the intersection over union (IoU) is improved by 16.9%, 18.5%, and 14.6%, respectively. The performance of the optimized model is excellent. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Personality prediction via multi-task transformer architecture combined with image aesthetics.
- Author
-
Bajestani, Shahryar Salmani, Khalilzadeh, Mohammad Mahdi, Azarnoosh, Mahdi, and Kobravi, Hamid Reza
- Subjects
TRANSFORMER models, PERSONALITY, DATABASES, STATISTICAL correlation, SOCIAL media - Abstract
Social media has found its path into the daily lives of people. Users communicate in several ways, among which liking and sharing images stands out. Each image shared by a user can be analyzed from aesthetic and personality-trait viewpoints. In recent studies, it has been proved that personality traits impact personalized image aesthetics assessment. In this article, the same pattern was studied from the opposite perspective: we evaluated the impact of image aesthetics on personality traits to check whether a relation exists between them in this direction. Hence, in a two-stage architecture, we have leveraged image aesthetics to predict the personality traits of users. The first stage includes a multi-task deep learning paradigm that consists of an encoder/decoder in which the core of the network is a Swin Transformer. The second stage combines image aesthetics and personality traits with an attention mechanism for personality trait prediction. The results showed that the proposed method achieved an average Spearman Rank Order Correlation Coefficient (SROCC) of 0.776 in image aesthetics on the Flickr-AES database and an average SROCC of 0.6730 on the PsychoFlickr database, which outperformed related SOTA (State of the Art) studies. The average accuracy performance of the first stage was boosted by 7.02 per cent in the second stage, considering the influence of image aesthetics on personality trait prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
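SROCC, the evaluation metric quoted above, is the Pearson correlation computed on rank values; scipy computes it directly. The score arrays below are made-up illustrations.

```python
from scipy.stats import spearmanr

predicted = [3.1, 4.5, 2.2, 4.9, 3.8]     # a model's aesthetic/trait scores
ground_truth = [3.0, 4.7, 2.5, 4.6, 3.9]  # corresponding human ratings
rho, _ = spearmanr(predicted, ground_truth)
print(rho)  # 1.0 here, since the two rankings agree exactly
```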
26. Road Surface Condition Monitoring in Extreme Weather Using a Feature-Learning Enhanced Mask–RCNN.
- Author
-
Bai, Zhiyuan, Wang, Yue, Zhang, Ancai, Wei, Hao, and Pan, Guangyuan
- Subjects
EXTREME weather, PAVEMENTS, CONVOLUTIONAL neural networks, ROAD maintenance, WEATHER - Abstract
Road surface condition (RSC) is an important indicator in road safety studies, enabling transportation departments to employ it for conducting surveys, inspections, cleaning, and maintenance, ultimately contributing to improved performance in road upkeep. However, traditional recognition methods can be easily affected when extreme weather frequently occurs such as winter seasonal changes. To achieve real-time and automatic RSC monitoring, this paper proposes an improved Mask–region-based convolutional neural network (RCNN) based on Swin Transformer-PAFPN and a dynamic head detection network. Meanwhile, transfer learning is used to reduce training time, and data enhancement and multiscale training are applied to achieve better performance. The experimental results show that the proposed model achieves an outstanding mean average precision at 0.5 (mAP@0.5) score of 89.8 under favorable weather conditions characterized by clear visibility, surpassing other popular methods. Notably, the proposed model exhibits lower parameters and GigaFLOPS (GFLOPs) (72.41 and 158.35, respectively) compared to other popular methods, thus demanding fewer computational resources. Furthermore, in challenging weather conditions characterized by poor visibility, such as foggy and nighttime scenarios, the proposed model achieves mAP@0.5 scores of 78.50 and 82.40, respectively. These scores not only outperform those of other popular methods but also underscore the robustness of the proposed model in extreme weather conditions. This exceptional performance demonstrates the proposed model's effectiveness in addressing complex road conditions under various meteorological circumstances, providing reliable technical support for practical traffic monitoring and road maintenance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Performance of vision transformer and swin transformer models for lemon quality classification in fruit juice factories.
- Author
-
Dümen, Sezer, Kavalcı Yılmaz, Esra, Adem, Kemal, and Avaroglu, Erdinç
- Subjects
TRANSFORMER models, MACHINE learning, DEEP learning, FRUIT juices, ARTIFICIAL intelligence - Abstract
Assessing the quality of agricultural products holds vital significance in enhancing production efficiency and market viability. The adoption of artificial intelligence (AI) has notably surged for this purpose, employing deep learning and machine learning techniques to process and classify agricultural product images, adhering to defined standards. This study focuses on the lemon dataset, encompassing 'good' and 'bad' quality classes, and begins by augmenting data through rescaling, random zoom, flip, and rotation methods. Subsequently, employing eight diverse deep learning approaches and two transformer methods for classification, the study culminated in the ViT method achieving an unprecedented 99.84% accuracy, 99.95% recall, and 99.66% precision, marking the highest accuracy documented. These findings strongly advocate for the efficacy of the ViT method in successfully classifying lemon quality, spotlighting its potential impact on agricultural quality assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Swin Transformer lightweight: an efficient strategy that combines weight sharing, distillation and pruning.
- Author
-
HAN Bo, ZHOU Shun, FAN Jianhua, WEI Xianglin, HU Yongyang, and ZHU Yanping
- Abstract
Swin Transformer, as a layered visual transformer with shifted windows, has attracted extensive attention in the field of computer vision due to its exceptional modeling capabilities. However, its high computational complexity limits its applicability on devices with constrained computational resources. To address this issue, a pruning compression method was proposed, integrating weight sharing and distillation. Initially, weight sharing was implemented across layers, and transformation layers were added to introduce weight transformation, thereby enhancing diversity. Subsequently, a parameter dependency mapping graph for the transformation blocks was constructed and analyzed, and a grouping matrix F was built to record the dependency relationships among all parameters and identify parameters for simultaneous pruning. Finally, distillation was employed to restore the model's performance. Experiments conducted on the ImageNet-Tiny-200 public dataset demonstrate that, with a 32% reduction in model computational complexity, the proposed method degrades performance by as little as approximately 3%. It provides a solution for deploying high-performance artificial intelligence models in environments with limited computational resources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
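The distillation step used above to recover accuracy after pruning is typically the standard soft-target formulation: the pruned student matches the temperature-softened logits of the unpruned teacher. A sketch follows; the temperature and weighting are illustrative assumptions, not the paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of soft-target KL term and hard cross-entropy term."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)  # rescale gradient magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard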
29. A lightweight convolutional swin transformer with cutmix augmentation and CBAM attention for compound emotion recognition.
- Author
-
Nidhi and Verma, Bindu
- Subjects
EMOTION recognition, TRANSFORMER models, DATA augmentation, EMOTIONS, COMPUTATIONAL complexity - Abstract
Facial emotion recognition has become a complicated task due to individual variations in facial characteristics, as well as racial and cultural variances. Different psychological studies show that there are complex expressions beyond the basic emotions which are made up of two basic emotions, like "Happily Disgusted", "Happily Surprised", "Sadly Surprised", etc. Compound emotion recognition is challenging because very few compound emotion datasets are publicly available, and those that exist are imbalanced. In this paper, we have proposed LSwin-CBAM for the classification of compound emotions. To address the problem of the imbalanced dataset, the proposed model exploits the CutMix augmentation technique for data augmentation. It also incorporates the CBAM attention mechanism to emphasize the relevant features in an image, and a Swin Transformer with fewer Swin Transformer blocks, which leads to lower computational complexity in terms of trainable parameters and improves the overall classification accuracy as well. The experimental results of LSwin-CBAM on the RAF-DB and EmotioNet datasets show that the proposed transformer-based network can recognize compound emotions well. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
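CutMix, as used above against class imbalance, pastes a random patch from one image into another and mixes the labels in proportion to the patch area; a minimal version is sketched below (alpha and the rest of the training loop are up to the user).

```python
import numpy as np
import torch

def cutmix(images, labels, alpha=1.0):
    """Return mixed images, both label sets, and the mixing coefficient lam."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0))          # partner image for each sample
    H, W = images.shape[-2:]
    rh, rw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip([cy - rh // 2, cy + rh // 2], 0, H)
    x1, x2 = np.clip([cx - rw // 2, cx + rw // 2], 0, W)
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    lam = 1 - (y2 - y1) * (x2 - x1) / (H * W)      # actual pasted-area fraction
    return images, labels, labels[perm], lam       # loss = lam*L(a) + (1-lam)*L(b)
```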
30. VDCrackGAN: A Generative Adversarial Network with Transformer for Pavement Crack Data Augmentation.
- Author
-
Yu, Gui, Zhou, Xinglin, and Chen, Xiaolan
- Subjects
TRANSFORMER models, GENERATIVE adversarial networks, DATA augmentation, CRACKING of pavements - Abstract
Addressing the challenge of limited samples arising from the difficulty and high cost of pavement crack image collection and labeling, along with the inadequate ability of traditional data augmentation methods to enhance the sample feature space, we propose VDCrackGAN, a generative adversarial network combining VAE and DCGAN, specifically tailored for pavement crack data augmentation. Furthermore, spectral normalization is incorporated to enhance the stability of network training, and the Swin Transformer self-attention mechanism is integrated into the network to further improve the quality of crack generation. Experimental outcomes reveal that in comparison to the baseline DCGAN, VDCrackGAN achieves notable improvements of 13.6% and 26.4% in the Inception Score (IS) and Fréchet Inception Distance (FID) metrics, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
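Spectral normalization, credited above with stabilizing GAN training, is a one-line wrapper in PyTorch; a discriminator-layer sketch (the layer shape is illustrative):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# each forward pass divides the conv weight by an estimate of its largest
# singular value, constraining the discriminator's Lipschitz constant
disc_layer = spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1))
```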
31. Adaptive Attention-Enhanced Yolo for Wall Crack Detection.
- Author
-
Chen, Ying, Wu, Wangyu, and Li, Junxia
- Subjects
TRANSFORMER models, FEATURE extraction, NECK, ATTENTION, FORECASTING, DEEP learning - Abstract
With the advancement of social life, the aging of building walls has become an unavoidable phenomenon. Due to the limited efficiency of manually detecting cracks, it is especially necessary to explore intelligent detection techniques. Currently, deep learning has garnered growing attention in crack detection, leading to the development of numerous feature learning methods. Although the technology in this area has been progressing, it still faces problems such as insufficient feature extraction and instability of prediction results. To address the shortcomings in the current research, this paper proposes a new Adaptive Attention-Enhanced Yolo. The method employs a Swin Transformer-based Cross-Stage Partial Bottleneck with a three-convolution structure, introduces an adaptive sensory field module in the neck network, and processes the features through a multi-head attention structure during the prediction process. The introduction of these modules greatly improves the performance of the model, thus effectively improving the precision of crack detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Tomato Leaf Disease Classification by Combining EfficientNetv2 and a Swin Transformer.
- Author
-
Sun, Yubing, Ning, Lixin, Zhao, Bin, and Yan, Jun
- Subjects
TRANSFORMER models, CONVOLUTIONAL neural networks, PLANT identification, NOSOLOGY, TOMATOES - Abstract
Recently, convolutional neural networks (CNNs) and self-attention mechanisms have been widely applied in plant disease identification tasks, yielding significant successes. However, most research models for tomato leaf disease recognition rely solely on traditional convolutional models or Transformer architectures and fail to capture local and global features simultaneously. This limitation can bias the model's focus and consequently impact the accuracy of disease recognition. Models capable of extracting local features while attending to global information have therefore emerged as a novel research direction. To address these challenges, we propose an Eff-Swin model that integrates the enhanced features of the EfficientNetV2 and Swin Transformer networks, aiming to harness the local feature extraction capability of CNNs and the global modeling ability of Transformers. Comparative experiments demonstrate that the enhanced model achieves a further increase in accuracy, reaching 99.70% on the tomato leaf disease dataset, which is 0.49~3.68% higher than individual network models and 0.8~1.15% higher than existing state-of-the-art combined approaches. The results show that integrating attention mechanisms into convolutional models can significantly enhance the accuracy of tomato leaf disease recognition and highlight the great potential of the Eff-Swin backbone with self-attention in plant disease identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
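The core idea of Eff-Swin, fusing a CNN's local features with a Transformer's global features, can be summarized in a few lines. This sketch assumes each backbone has been reduced to a feature extractor returning a (B, dim) embedding; it is a generic fusion pattern, not the paper's exact wiring:

```python
import torch
import torch.nn as nn

class DualBackboneClassifier(nn.Module):
    """Concatenate pooled CNN and Transformer embeddings, then classify."""
    def __init__(self, cnn, transformer, cnn_dim, trans_dim, num_classes):
        super().__init__()
        self.cnn, self.transformer = cnn, transformer
        self.head = nn.Linear(cnn_dim + trans_dim, num_classes)

    def forward(self, x):
        local_feat = self.cnn(x)           # local texture cues (CNN)
        global_feat = self.transformer(x)  # long-range context (Swin/ViT)
        return self.head(torch.cat([local_feat, global_feat], dim=1))
```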
33. CSI-Net: CNN Swin Transformer Integrated Network for Infrared Small Target Detection.
- Author
-
Choi, Lammi, Chung, Won Young, and Park, Chan Gook
- Abstract
In the realm of infrared (IR) small target detection, accurately pinpointing blurry and low-contrast targets is immensely challenging due to the intricate features of IR images. To tackle this, we introduce CSI-Net, a novel network architecture merging a CNN and a Swin Transformer. CSI-Net features a hybrid encoder design, blending the encoder-decoder layout of UNet with a Swin Transformer executed in parallel with the CNN. This amalgamation enables the network to capture local features and long-distance dependencies, enhancing its ability to accurately identify small targets. Leveraging the hierarchical features of the Swin Transformer, CSI-Net adeptly grasps contextual information crucial for small target detection. Moreover, CSI-Net employs full-scale skip connections across encoder-decoder and decoder-decoder paths, integrating multiscale CNN and Swin Transformer features to improve gradient propagation. Experimental results validate the superiority of the proposed method over traditional CNN and Transformer methods. On NUAA-SIRST, metrics such as mIoU (0.7483), detection probability (0.9734), and false alarm rate (0.101 × 10⁻⁵) demonstrate significant improvement. Similarly, on NUDT-SIRST, mIoU (0.8887), detection probability (0.9894), and false alarm rate (0.431 × 10⁻⁵) show notable enhancement. The performance of the network scales with dataset size, and its robustness is affirmed by the area under the ROC curve (AUC). Additionally, an ablation study validates the efficacy of the hybrid encoder: varying the presence of the parallel Swin Transformer module (PSM) reveals that its application enhances small target detection performance. The comprehensive evaluation shows that the Swin Transformer-enhanced UNet architecture effectively tackles the challenges of IR small target detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. A novel Swin transformer approach utilizing residual multi-layer perceptron for diagnosing brain tumors in MRI images.
- Author
-
Pacal, Ishak
- Abstract
Serious consequences due to brain tumors necessitate a timely and accurate diagnosis. However, obstacles such as suboptimal imaging quality, issues with data integrity, varying tumor types and stages, and potential errors in interpretation hinder the achievement of precise and prompt diagnoses. The rapid identification of brain tumors plays a pivotal role in ensuring patient safety. Deep learning-based systems hold promise in aiding radiologists to make diagnoses swiftly and accurately. In this study, we present an advanced deep learning approach based on the Swin Transformer. The proposed method introduces a novel Hybrid Shifted Windows Multi-Head Self-Attention module (HSW-MSA) along with a rescaled model. This enhancement aims to improve classification accuracy, reduce memory usage, and simplify training complexity. The Residual-based MLP (ResMLP) replaces the traditional MLP in the Swin Transformer, thereby improving accuracy, training speed, and parameter efficiency. We evaluate the Proposed-Swin model on a publicly available brain MRI dataset with four classes, using only test data. Model performance is enhanced through the application of transfer learning and data augmentation techniques for efficient and robust training. The Proposed-Swin model achieves a remarkable accuracy of 99.92%, surpassing previous research and deep learning models. This underscores the effectiveness of the Swin Transformer with HSW-MSA and ResMLP improvements in brain tumor diagnosis. This method introduces an innovative diagnostic approach using HSW-MSA and ResMLP in the Swin Transformer, offering potential support to radiologists in timely and accurate brain tumor diagnosis, ultimately improving patient outcomes and reducing risks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
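The abstract's ResMLP modification, replacing the Swin Transformer's plain MLP with a residual one, can be sketched as below. The pre-norm layout and expansion factor are plausible assumptions, not the paper's exact block:

```python
import torch.nn as nn

class ResidualMLP(nn.Module):
    """MLP with a shortcut connection: the residual path eases
    optimization relative to a plain feed-forward block."""
    def __init__(self, dim, hidden_mult=4, drop=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * hidden_mult),
            nn.GELU(),
            nn.Dropout(drop),
            nn.Linear(dim * hidden_mult, dim),
        )

    def forward(self, x):
        return x + self.net(x)
```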
35. Deep Learning for Multilabel Classification of Coral Reef Conditions in the Indo‐Pacific Using Underwater Photo Transect Method.
- Author
-
Shao, Xinlei, Chen, Hongruixuan, Magson, Kirsty, Wang, Jiaqi, Song, Jian, Chen, Jundong, and Sasaki, Jun
- Subjects
CORAL reef conservation, CORAL reefs & islands, TRANSFORMER models, TRANSECT method, ENVIRONMENTAL monitoring - Abstract
Because coral reef ecosystems face threats from human activities and climate change, coral reef conservation programmes are implemented worldwide. Monitoring coral health provides references for guiding conservation activities. However, current labour‐intensive methods result in a backlog of unsorted images, highlighting the need for automated classification. Few studies have simultaneously utilized accurate labels along with updated algorithms and datasets. This study aimed to create a dataset representing common coral reef conditions and associated stressors in the Indo‐Pacific. Concurrently, it assessed existing classification algorithms and proposed a new multilabel method for automatically detecting coral reef conditions and extracting ecological information. A dataset containing over 20,000 high‐resolution coral images of different health conditions and stressors was constructed based on the field survey. Seven representative deep learning architectures were tested on this dataset, and their performance was quantitatively evaluated using the F1 metric and the match ratio. Based on this evaluation, a new method utilizing the ensemble learning approach was proposed. The proposed method accurately classified coral reef conditions as healthy, compromised, dead and rubble; it also identified corresponding stressors, including competition, disease, predation and physical issues. This method can help develop the coral image archive, guide conservation activities and provide references for decision‐making for reef managers and conservationists. The proposed ensemble learning approach outperforms others on the dataset, showing state‐of‐the‐art (SOTA) performance. Future research should improve its generalizability and accuracy to support global coral reef conservation efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
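Multilabel ensembling of the kind described, where several networks jointly predict conditions and stressors, commonly averages per-label scores. A minimal PyTorch sketch (models returning per-label logits is an assumption):

```python
import torch

@torch.no_grad()
def ensemble_multilabel(models, images, threshold=0.5):
    """Soft-voting for multilabel output: average each model's per-label
    sigmoid scores, then threshold to get independent binary labels
    (e.g. healthy / compromised / dead / rubble plus stressors)."""
    probs = torch.stack([m(images).sigmoid() for m in models]).mean(dim=0)
    return (probs > threshold).int(), probs
```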
36. Multi-label dental disorder diagnosis based on MobileNetV2 and swin transformer using bagging ensemble classifier
- Author
-
Yasmin M. Alsakar, Naira Elazab, Nermeen Nader, Waleed Mohamed, Mohamed Ezzat, and Mohammed Elmogy
- Subjects
Dentistry, MobileNetV2, Swin transformer, Annotation, Deep learning, Feature extraction, Medicine, Science - Abstract
Abstract Dental disorders are common worldwide, causing pain or infections and limiting mouth opening, so dental conditions impact productivity, work capability, and quality of life. Manual detection and classification of oral diseases is time-consuming and requires dentists’ evaluation and examination. A dental disease detection and classification system based on machine learning and deep learning will aid in early dental disease diagnosis. Hence, this paper proposes a new diagnosis system for dental diseases using X-ray imaging. The framework includes a robust pre-processing phase that uses image normalization and adaptive histogram equalization to improve image quality and reduce variation. A dual-stream approach is used for feature extraction, utilizing the advantages of the Swin Transformer for capturing long-range dependencies and global context and MobileNetV2 for effective local feature extraction. A thorough representation of dental anomalies is produced by fusing the extracted features. Finally, a bagging ensemble classifier is utilized to obtain reliable and broadly applicable classification results. We evaluate our model on a benchmark dental radiography dataset. The experimental results and comparisons show the superiority of the proposed system, with 95.7% precision, 95.4% sensitivity, 95.7% specificity, 95.5% Dice similarity coefficient, and 95.6% accuracy. The results demonstrate the effectiveness of our hybrid model integrating the MobileNetV2 and Swin Transformer architectures, outperforming state-of-the-art techniques in classifying dental diseases using dental panoramic X-ray imaging. This framework presents a promising method for robustly and accurately diagnosing dental diseases automatically, which may help dentists plan treatments and identify dental diseases early on.
- Published
- 2024
- Full Text
- View/download PDF
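The final bagging stage can be illustrated with scikit-learn, fitting bootstrap replicas on fused deep features. The feature dimensions and labels below are random stand-ins for illustration, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1280 + 768))   # stand-in fused MobileNetV2+Swin features
y = rng.integers(0, 4, size=200)         # hypothetical disorder labels

# Each base estimator (a decision tree by default) is fit on a bootstrap
# sample; predictions are aggregated by voting, which reduces variance.
model = BaggingClassifier(n_estimators=25, random_state=0).fit(X, y)
print(model.predict(X[:5]))
```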
37. DS-TransFusion: Automatic retinal vessel segmentation based on an improved Swin Transformer
- Author
-
Benchen YANG, Jianyu WANG, and Haibo JIN
- Subjects
retinal vascular segmentation, fundus images, multi scale attention, feature fusion, swin transformer, Mining engineering. Metallurgy, TN1-997, Environmental engineering, TA170-171 - Abstract
Retinal vascular segmentation holds significant value in medical research, playing an indispensable role in facilitating the screening of various diseases, such as diabetes, hypertension, and glaucoma. However, most current retinal vessel segmentation methods rely mainly on convolutional neural networks, which present limitations when dealing with long-term dependencies and global context connections. These limitations often result in poor segmentation of small blood vessels and low contrast between the ends of fundus blood vessel branches and the background. To tackle these challenges, this paper proposes a new retinal blood vessel segmentation model, namely Dual Swin Transformer Fusion (DS-TransFusion). This model uses a two-scale encoder subnetwork based on a Swin Transformer, which is able to find correspondence and align features from heterogeneous inputs. Given an input retinal image, the model first splits it into two nonoverlapping blocks of different sizes, which are then fed into the two branches of the encoder to extract coarse-grained and fine-grained features of the retinal blood vessels. At the skip connections, DS-TransFusion introduces the Transformer interactive fusion attention (TIFA) module. The core of this module is a multiscale attention (MA) mechanism that facilitates efficient interaction between multiscale features. It integrates features from the two branches at different scales, achieves effective feature fusion, enriches cross-view context modeling and semantic dependency, and captures long-term correlations between data from different image views, which in turn enhances segmentation performance. In addition, to integrate multiscale representation in the hierarchical backbone, DS-TransFusion introduces an MA module between the encoder and decoder. This module learns the feature dependencies across different scales, collects the global correspondence of multiscale feature representations, and further optimizes the segmentation effect of the model. DS-TransFusion performed impressively on the public datasets STARE, CHASEDB1, and DRIVE, with accuracies of 96.50%, 97.22%, and 97.80%, and sensitivities of 84.10%, 84.55%, and 83.17%, respectively. Experimental results show that DS-TransFusion can effectively improve the accuracy of retinal blood vessel segmentation and accurately segment small blood vessels. Overall, DS-TransFusion, as a Swin Transformer-based retinal vessel segmentation model, achieves remarkable results on the problems of unclear small-vessel segmentation and global context connection. Experimental results on several public datasets validate the effectiveness and superiority of this method, suggesting its potential to provide more accurate retinal vascular segmentation results for auxiliary screening of various diseases.
- Published
- 2024
- Full Text
- View/download PDF
38. Ensemble of vision transformer architectures for efficient Alzheimer’s Disease classification
- Author
-
Noushath Shaffi, Vimbi Viswan, and Mufti Mahmud
- Subjects
Vision transformer, Convolutional neural networks, Machine learning models, Alzheimer’s Disease, Swin transformer, Data efficient image transformers, Computer applications to medicine. Medical informatics, R858-859.7, Computer software, QA76.75-76.765 - Abstract
Abstract Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become the new state of the art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer’s Disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested using two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models’ efficacy under imbalanced and data-scarce conditions. The ensemble of VTs saw an improvement of around 2% compared to individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated an overall performance gain of 4.14% and 4.72% accuracy over the ML and CNN algorithms, respectively. The study has also identified specific limitations and proposes avenues for future research. The codes used in the study are made publicly available.
- Published
- 2024
- Full Text
- View/download PDF
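Hard versus soft voting, the two ensembling modes the framework uses, differ only in whether class probabilities or discrete predictions are aggregated. A compact sketch:

```python
import torch

@torch.no_grad()
def vote(models, x, mode="soft"):
    """Soft voting averages softmax scores; hard voting takes the
    majority of per-model argmax predictions."""
    logits = torch.stack([m(x) for m in models])       # (M, B, C)
    if mode == "soft":
        return logits.softmax(dim=-1).mean(dim=0).argmax(dim=-1)
    preds = logits.argmax(dim=-1)                      # (M, B)
    return preds.mode(dim=0).values                    # majority label
```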
39. SwinVNETR: Swin V-net Transformer with non-local block for volumetric MRI Brain Tumor Segmentation
- Author
-
Maria Nancy A and K. Sathyarajasekaran
- Subjects
Deep learning, Swin Transformer, brain tumour segmentation, non-local block, explainable AI, Grad-CAM, Control engineering systems. Automatic machinery (General), TJ212-225, Automation, T59.5 - Abstract
Brain Tumor Segmentation (BTS) and classification are important and growing research fields. Magnetic resonance imaging (MRI) is commonly used in the diagnosis of brain tumours owing to its low radiation exposure and high image quality. How to quickly and precisely segment MRI scans of brain tumours is a current subject in the field of medical imaging. Unfortunately, most existing brain tumour segmentation algorithms use inadequate 2D image segmentation methods and fail to capture the spatial correlation between features. In this study, we propose SwinVNETR, a Swin V-Net Transformer-based segmentation architecture with a non-local block. This model was trained using the Brain Tumor Segmentation Challenge (BraTS) 2021 dataset. The Dice similarity coefficients for the enhanced tumour (ET), whole tumour (WT), and tumour core (TC) are 0.84, 0.91, and 0.87, respectively. By leveraging this methodology, we can segment brain tumours more accurately than before. In conclusion, we present the findings of our model through the Grad-CAM methodology, an eXplainable Artificial Intelligence (XAI) technique utilized to elucidate the insights derived from the model, which aids understanding and can help doctors better diagnose and treat patients with brain tumours.
- Published
- 2024
- Full Text
- View/download PDF
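The Dice similarity coefficient reported for ET/WT/TC is computed per binary mask; a reference implementation under the usual conventions:

```python
import torch

def dice_coefficient(pred, target, eps=1e-6):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps guards
    against empty masks."""
    pred, target = pred.float().flatten(), target.float().flatten()
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)
```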
40. Improved YOLOv5m model based on Swin Transformer, K-means++, and Efficient Intersection over Union (EIoU) loss function for cocoa tree (Theobroma cacao) disease detection
- Author
-
Benedicta Nana Esi Nyarko, Wu Bin Wu, Zhou Jinzhi Zhou, and Mwanaharusi Mohd Juma Mohd
- Subjects
cocoa tree disease, eiou, k-means++, plant disease detection, swin transformer, Plant culture, SB1-1110 - Abstract
The cocoa tree is prone to diverse diseases such as stem borer, stem canker, swollen shoot, and root rot disease, which impede high yield. Early disease detection is a critical component of the diverse management processes implemented throughout the life cycle of cocoa plants; consequently, several studies on the application of detection techniques to recognize diseases have been proposed. This study proposes a YOLOv5m network for cocoa tree disease detection. The development of cocoa disease detection systems will aid farmers in early identification, prompt response, and efficient management of related cocoa tree diseases, which will ultimately increase yield and sustainability. To improve the performance of the YOLOv5m network, a Swin Transformer (Swin-T) was added to the backbone network to capture global information and improve cocoa tree disease detection accuracy, the K-means++ algorithm was added to improve the choice of initial clustering locations, and Efficient Intersection over Union (EIoU) loss was used as the bounding-box regression loss function to speed up the bounding-box regression rate, resulting in higher precision of the YOLOv5m network. The experimental assessment showed that the proposed YOLOv5m (Swin-T, K-means++, EIoU) achieved 96% precision, an mAP of 92%, and a recall of 94%. Compared to the original YOLOv5m, precision improved by 5%, mAP by 6%, and recall by 5%. Compared with the conventional YOLOv5m, the proposed method showed improved performance and better accuracy, with high detection speed and compactness. This improvement offers a useful and effective method for detecting diseases related to cocoa trees.
- Published
- 2024
- Full Text
- View/download PDF
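The K-means++ anchor step amounts to clustering labeled box sizes with a spread-out initialization. A scikit-learn sketch with made-up box dimensions (YOLOv5's nine anchors assumed):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
boxes = rng.random((500, 2)) * 300        # stand-in (width, height) pairs, px

# init="k-means++" seeds well-separated initial centers, the modification
# the abstract applies to anchor selection.
km = KMeans(n_clusters=9, init="k-means++", n_init=10, random_state=0).fit(boxes)
centers = km.cluster_centers_
print(centers[np.argsort(centers.prod(axis=1))])   # anchors sorted by area
```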
41. Retinex decomposition based low‐light image enhancement by integrating Swin transformer and U‐Net‐like architecture
- Author
-
Zexin Wang, Letu Qingge, Qingyi Pan, and Pei Yang
- Subjects
low‐light image enhancement, residual connection, swin transformer, U‐Net, Photography, TR1-1050, Computer software, QA76.75-76.765 - Abstract
Abstract Low‐light images are captured in environments with minimal lighting, such as nighttime or underwater conditions. These images often suffer from issues like low brightness, poor contrast, lack of detail, and overall darkness, significantly impairing human visual perception and subsequent high‐level visual tasks. Enhancing low‐light images holds great practical significance. Among the various existing methods for Low‐Light Image Enhancement (LLIE), those based on the Retinex theory have gained significant attention. However, despite considerable efforts in prior research, the challenge of Retinex decomposition remains unresolved. In this study, an LLIE network based on the Retinex theory is proposed, which addresses these challenges by integrating attention mechanisms and a U‐Net‐like architecture. The proposed model comprises three modules: the Decomposition module (DECM), the Reflectance Recovery module (REFM), and the Illumination Enhancement module (ILEM). Its objective is to decompose low‐light images based on the Retinex theory and enhance the decomposed reflectance and illumination maps using attention mechanisms and a U‐Net‐like architecture. We conducted extensive experiments on several widely used public datasets. The qualitative results demonstrate that the approach produces enhanced images with superior visual quality compared to the existing methods on all test datasets, especially for some extremely dark images. Furthermore, the quantitative evaluation results based on metrics PSNR, SSIM, LPIPS, BRISQUE, and MUSIQ show the proposed model achieves superior performance, with PSNR and BRISQUE significantly outperforming the baseline approaches, where (PSNR, mean BRISQUE) values of the proposed method and the second best results are (17.14, 17.72) and (16.44, 19.65). Additionally, further experimental results such as ablation studies indicate the effectiveness of the proposed model.
- Published
- 2024
- Full Text
- View/download PDF
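Retinex-based enhancers share one constraint: the input should factor into reflectance and illumination, I = R ⊙ L. A minimal reconstruction loss expressing that constraint (the tensor shapes and the L1 choice are assumptions, not DECM's actual objective):

```python
import torch
import torch.nn.functional as F

def retinex_reconstruction_loss(low, reflectance, illumination):
    """Penalize deviation from I = R * L (element-wise product);
    reflectance is (B, 3, H, W), illumination (B, 1, H, W) broadcasts."""
    recon = reflectance * illumination
    return F.l1_loss(recon, low)
```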
42. MoE-NuSeg: Enhancing nuclei segmentation in histology images with a two-stage Mixture of Experts network
- Author
-
Xuening Wu, Yiqing Shen, Qing Zhao, Yanlan Kang, and Wenqiang Zhang
- Subjects
Nuclei segmentation, Histology image, Swin transformer, Mixture of Experts (MoEs), Engineering (General). Civil engineering (General), TA1-2040 - Abstract
Accurate nuclei segmentation is essential for extracting quantitative information from histology images to support disease diagnosis and treatment decisions. However, precise segmentation is challenging due to the presence of clustered nuclei, varied morphologies, and the need to capture global spatial correlations. While state-of-the-art Transformer-based models employ tri-decoder architectures to decouple the segmentation task into nuclei, edges, and cluster edges segmentation, their complexity and long inference times hinder clinical integration. To address this, we introduce MoE-NuSeg, a novel Mixture of Experts (MoE) network that consolidates the tri-decoder into a single decoder. MoE-NuSeg employs three specialized experts for nuclei segmentation, edge delineation, and cluster edge detection, thereby mirroring the functionality of tri-decoders while surpassing their performance and reducing parameters by sharing attention heads. We propose a two-stage training strategy: the first stage independently trains the three experts, and the second stage fine-tunes their interactions to dynamically allocate the contributions of each expert using a learnable attention-based gating network. Evaluations across three datasets demonstrate that MoE-NuSeg outperforms the state-of-the-art methods, achieving an average increase of 0.99% in Dice coefficient, 1.14% in IoU and 0.92% in F1 Score, while reducing parameters by 30.1% and FLOPs by 40.2%. The code is available at https://github.com/deep-geo/MoE-NuSeg.
- Published
- 2025
- Full Text
- View/download PDF
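The learnable gate that mixes the three experts can be sketched as a small network producing softmax weights; the pooling-based gate below is one plausible form, not the paper's exact module:

```python
import torch
import torch.nn as nn

class ExpertGate(nn.Module):
    """Mix three expert maps (nuclei, edges, cluster edges) with
    input-dependent softmax weights."""
    def __init__(self, in_ch):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, 3, kernel_size=1))

    def forward(self, feat, experts):          # experts: 3 maps of (B, 1, H, W)
        w = self.gate(feat).softmax(dim=1)     # (B, 3, 1, 1) mixture weights
        stacked = torch.cat(experts, dim=1)    # (B, 3, H, W)
        return (w * stacked).sum(dim=1, keepdim=True)
```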
43. Enhancing Melanoma Diagnosis with Advanced Deep Learning Models Focusing on Vision Transformer, Swin Transformer, and ConvNeXt
- Author
-
Serra Aksoy, Pinar Demircioglu, and Ismail Bogrekci
- Subjects
ViT, Swin Transformer, ConvNeXt, benign and malignant tumors, medical imaging, Dermatology, RL1-803 - Abstract
Skin tumors, especially melanoma, which is highly aggressive and metastasizes quickly to other sites, are a problem in many parts of the world, and detecting them at an early stage is the key to saving lives. This study explores the application of advanced deep learning models for classifying benign and malignant melanoma using dermoscopic images. The aim of the study is to enhance the accuracy and efficiency of melanoma diagnosis with the ConvNeXt, Vision Transformer (ViT) Base-16, and Swin Transformer V2 Small (Swin V2 S) deep learning models. The ConvNeXt model, which integrates principles of both convolutional neural networks and transformers, demonstrated superior performance, with balanced precision and recall metrics. The dataset, sourced from Kaggle, comprises 13,900 uniformly sized images, preprocessed to standardize the inputs for the models. Experimental results revealed that ConvNeXt achieved the highest diagnostic accuracy among the tested models, with an accuracy of 91.5%, precision and recall of 90.45% and 92.8% for benign cases, and 92.61% and 90.2% for malignant cases, respectively. The F1-scores for ConvNeXt were 91.61% for benign cases and 91.39% for malignant cases. This research points out the potential of hybrid deep learning architectures in medical image analysis, particularly for early melanoma detection.
- Published
- 2024
- Full Text
- View/download PDF
44. MSCAC: A Multi-Scale Swin–CNN Framework for Progressive Remote Sensing Scene Classification
- Author
-
A. Arun Solomon and S. Akila Agnes
- Subjects
aerial scene classification, Swin Transformer, deep learning in remote sensing, geospatial analysis, computer vision for aerial imagery, terrain mapping, Geography (General), G1-922 - Abstract
Recent advancements in deep learning have significantly improved the performance of remote sensing scene classification, a critical task in remote sensing applications. This study presents a new aerial scene classification model, the Multi-Scale Swin–CNN Aerial Classifier (MSCAC), which employs the Swin Transformer, an advanced architecture that has demonstrated exceptional performance in a range of computer vision applications. The Swin Transformer leverages shifted window mechanisms to efficiently model long-range dependencies and local features in images, making it particularly suitable for the complex and varied textures in aerial imagery. The model is designed to capture intricate spatial hierarchies and diverse scene characteristics at multiple scales. A framework is developed that integrates the Swin Transformer with a multi-scale strategy, enabling the extraction of robust features from aerial images of different resolutions and contexts. This approach allows the model to effectively learn from both global structures and fine-grained details, which is crucial for accurate scene classification. The model’s performance is evaluated on several benchmark datasets, including UC-Merced, WHU-RS19, RSSCN7, and AID, where it demonstrates a superior or comparable accuracy to state-of-the-art models. The MSCAC model’s adaptability to varying amounts of training data and its ability to improve with increased data make it a promising tool for real-world remote sensing applications. This study highlights the potential of integrating advanced deep-learning architectures like the Swin Transformer into aerial scene classification, paving the way for more sophisticated and accurate remote sensing systems. The findings suggest that the proposed model has significant potential for various remote sensing applications, including land cover mapping, urban planning, and environmental monitoring.
- Published
- 2024
- Full Text
- View/download PDF
45. Dual-scale shifted window attention network for medical image segmentation
- Author
-
De-wei Han, Xiao-lei Yin, Jian Xu, Kang Li, Jun-jie Li, Lu Wang, and Zhao-yuan Ma
- Subjects
Swin Transformer, Dual-scale shifted window attention, Medical image segmentation, Medicine, Science - Abstract
Abstract Swin Transformer is an important work among the attempts to reduce the computational complexity of Transformers while maintaining their excellent performance in computer vision. Window-based patch self-attention can use the local connectivity of the image features, and the shifted window-based patch self-attention enables the communication of information between different patches across the entire image. Through in-depth research on the effects of different shifted window sizes on patch information communication efficiency, this article proposes a Dual-Scale Transformer with a double-sized shifted window attention method. The proposed method surpasses CNN-based methods such as U-Net, AttenU-Net, ResU-Net, and CE-Net by a considerable margin (approximately a 3%–6% increase), and outperforms the Transformer-based single-scale Swin Transformer (SwinT) (approximately a 1% increase) on the Kvasir-SEG, ISIC2017, MICCAI EndoVisSub-Instrument, and CadVesSet datasets. The experimental results verify that the proposed dual-scale shifted window attention benefits the communication of patch information and can enhance the segmentation results to the state of the art. We also implement an ablation study on the effect of the shifted window size on information flow efficiency and verify that the dual-scale shifted window attention is the optimized network design. Our study highlights the significant impact of network structure design on visual performance, providing valuable insights for the design of networks based on Transformer architectures.
- Published
- 2024
- Full Text
- View/download PDF
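The shifted-window mechanism the paper varies is a cyclic roll followed by window partitioning; two different shift/window sizes give the dual-scale variant. A sketch assuming a channels-last (B, H, W, C) map with H and W divisible by the window size:

```python
import torch

def shift_and_partition(x, shift, window):
    """Cyclically shift the map, then split it into non-overlapping
    (window x window) token groups for windowed self-attention."""
    b, h, w, c = x.shape
    x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    x = x.view(b, h // window, window, w // window, window, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, c)
```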
46. Brain Age Estimation from Overnight Sleep Electroencephalography with Multi-Flow Sequence Learning
- Author
-
Zhang D, She Y, Sun J, Cui Y, Yang X, Zeng X, and Qin W
- Subjects
brain age, sleep polysomnography, electroencephalography, deep learning, swin transformer, Psychiatry, RC435-571, Neurophysiology and neuropsychology, QP351-495 - Abstract
Purpose: This study aims to improve brain age estimation by developing a novel deep learning model utilizing overnight electroencephalography (EEG) data. Methods: We address limitations in current brain age prediction methods by proposing a model trained and evaluated on multiple cohort data, covering a broad age range. The model employs a one-dimensional Swin Transformer to efficiently extract complex patterns from sleep EEG signals and a convolutional neural network with attentional mechanisms to summarize sleep structural features. A multi-flow learning-based framework attentively merges these two features, employing sleep structural information to direct and augment the EEG features. A post-prediction model is designed to integrate the age-related features throughout the night. Furthermore, we propose a DecadeCE loss function to address the problem of an uneven age distribution. Results: We utilized 18,767 polysomnograms (PSGs) from 13,616 subjects to develop and evaluate the proposed model. The model achieves a mean absolute error (MAE) of 4.19 years and a correlation of 0.97 on the mixed-cohort test set, and an MAE of 6.18 years and a correlation of 0.78 on an independent test set. Our brain age estimation work reduced the error by more than 1 year compared to other studies that also used EEG, reaching the level of neuroimaging-based methods. The estimated brain age index demonstrated longitudinal sensitivity and exhibited a significant increase of 1.27 years in individuals with psychiatric or neurological disorders relative to healthy individuals. Conclusion: The multi-flow deep learning model proposed in this study, based on overnight EEG, represents a more accurate approach for estimating brain age. The utilization of overnight sleep EEG for the prediction of brain age is both cost-effective and adept at capturing dynamic changes. These findings demonstrate the potential of EEG in predicting brain age, presenting a noninvasive and accessible method for assessing brain aging.
- Published
- 2024
47. A dual-track feature fusion model utilizing Group Shuffle Residual DeformNet and swin transformer for the classification of grape leaf diseases
- Author
-
R. Karthik, Gadige Vishnu Vardhan, Shreyansh Khaitan, R. N. R. Harisankar, R. Menaka, Sindhia Lingaswamy, and Daehan Won
- Subjects
Grape leaf disease, Deep learning, Dual-track network, Swin transformer, Triplet attention, Medicine, Science - Abstract
Abstract Grape cultivation is important globally, contributing to the agricultural economy and providing diverse grape-based products. However, the susceptibility of grapes to disease poses a significant threat to yield and quality. Traditional disease identification methods demand expert knowledge, which limits scalability and efficiency. To address these limitations, our research aims to design an automated deep learning approach for grape leaf disease detection. This research introduces a novel dual-track network for classifying grape leaf diseases, employing a combination of the Swin Transformer and Group Shuffle Residual DeformNet (GSRDN) tracks. The Swin Transformer track exploits shifted window techniques to construct hierarchical feature maps, enhancing global feature extraction. Simultaneously, the GSRDN track combines a Group Shuffle Depthwise Residual block and a Deformable Convolution block to extract local features with reduced computational complexity. The features from both tracks are concatenated and processed through Triplet Attention for cross-dimensional interaction. The proposed model achieved an accuracy of 98.6%, with precision, recall, and F1-score of 98.7%, 98.59%, and 98.64%, respectively, as validated on a grape leaf disease dataset drawn from the PlantVillage dataset, demonstrating its potential for efficient grape disease classification.
- Published
- 2024
- Full Text
- View/download PDF
48. Field cabbage detection and positioning system based on improved YOLOv8n
- Author
-
Ping Jiang, Aolin Qi, Jiao Zhong, Yahui Luo, Wenwu Hu, Yixin Shi, and Tianyu Liu
- Subjects
Cabbage, Object detection, YOLOv8n, Swin transformer, Large kernel convolutions, Plant culture, SB1-1110, Biology (General), QH301-705.5 - Abstract
Abstract Background Pesticide efficacy directly affects crop yield and quality, making targeted spraying a more environmentally friendly and effective method of pesticide application. Common targeted cabbage spraying methods often involve object detection networks. However, complex natural and lighting conditions pose challenges in the accurate detection and positioning of cabbage. Results In this study, a cabbage detection algorithm based on the YOLOv8n neural network (YOLOv8-cabbage) combined with a positioning system constructed using a Realsense depth camera is proposed. Initially, four of the currently available high-performance object detection models were compared, and YOLOv8n was selected as the transfer learning model for field cabbage detection. Data augmentation and expansion methods were applied to extensively train the model, a large kernel convolution method was proposed to improve the bottleneck section, the Swin transformer module was combined with the convolutional neural network (CNN) to expand the perceptual field of feature extraction and improve edge detection effectiveness, and a nonlocal attention mechanism was added to enhance feature extraction. Ablation experiments were conducted on the same dataset under the same experimental conditions, and the improved model increased the mean average precision (mAP) from 88.8% to 93.9%. Subsequently, depth maps and colour maps were aligned pixelwise to obtain the three-dimensional coordinates of the cabbages via coordinate system conversion. The positioning error of the three-dimensional coordinate cabbage identification and positioning system was (11.2 mm, 10.225 mm, 25.3 mm), which meets the usage requirements. Conclusions We have achieved accurate cabbage positioning. The object detection system proposed here can detect cabbage in real time in complex field environments, providing technical support for targeted spraying applications and positioning.
- Published
- 2024
- Full Text
- View/download PDF
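The depth-camera positioning step is a standard pinhole back-projection once the depth and colour frames are aligned. A sketch with invented intrinsics (fx, fy, cx, cy come from the camera's calibration):

```python
def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a detected cabbage centre (u, v) and its depth into
    camera-frame (x, y, z) coordinates via the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# Example with made-up intrinsics for a 640x480 stream:
print(pixel_to_camera_xyz(350, 260, 0.85, fx=615.0, fy=615.0, cx=320.0, cy=240.0))
```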
49. SwinDFU-Net: Deep learning transformer network for infection identification in diabetic foot ulcer.
- Author
-
M.G, Sumithra and Venkatesan, Chandran
- Subjects
*CONVOLUTIONAL neural networks, *TRANSFORMER models, *DIABETIC foot, *DEEP learning, *IMAGE recognition (Computer vision) - Abstract
The identification of infection in diabetic foot ulcers (DFUs) is challenging due to variability within classes, visual similarity between classes, reduced contrast with healthy skin, and the presence of artifacts. Existing studies focus on visual characteristics and tissue classification rather than infection detection, which is critical for assessing DFUs and predicting amputation risk. To address these challenges, this study proposes a deep learning model using a hybrid CNN and Swin Transformer architecture for infection classification in DFU images. The aim is to leverage end-to-end mapping without prior knowledge, integrating local and global feature extraction to improve detection accuracy. The model employs the Grad-CAM technique to visualize the decision-making process of the CNN and Transformer blocks. The DFUC Challenge dataset is used for training and evaluation, emphasizing the model’s ability to accurately classify DFU images into infected and non-infected categories. The model achieves high performance metrics: sensitivity (95.98%), specificity (97.08%), accuracy (96.52%), and Matthews Correlation Coefficient (0.93). These results indicate the model’s effectiveness in quickly diagnosing DFU infections, highlighting its potential as a valuable tool for medical professionals. The hybrid CNN and Swin Transformer architecture effectively combines strengths from both models, enabling accurate classification of DFU images as infected or non-infected, even in complex scenarios. The use of Grad-CAM provides insights into the model’s decision process, aiding in identifying infected regions within DFU images. This approach shows promise for enhancing clinical assessment and management of DFU infections. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
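Grad-CAM, the visualization technique mentioned, weights a layer's activations by the spatially pooled gradient of the target class score. A minimal hook-based PyTorch sketch (assumes a single-image batch and a classifier returning logits):

```python
import torch

def grad_cam(model, layer, image, class_idx):
    """Return a normalized (1, H, W) class-activation map for `layer`."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove()
    h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = torch.relu((weights * acts["a"]).sum(dim=1))   # weighted activations
    return cam / (cam.max() + 1e-8)
```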
50. ACSwinNet: A Deep Learning-Based Rigid Registration Method for Head-Neck CT-CBCT Images in Image-Guided Radiotherapy.
- Author
-
Peng, Kuankuan, Zhou, Danyu, Sun, Kaiwen, Wang, Junfeng, Deng, Jianchun, and Gong, Shihua
- Subjects
*CONE beam computed tomography, *TRANSFORMER models, *IMAGE-guided radiation therapy, *COMPUTED tomography, *NECK tumors - Abstract
Accurate and precise rigid registration between head-neck computed tomography (CT) and cone-beam computed tomography (CBCT) images is crucial for correcting setup errors in image-guided radiotherapy (IGRT) for head and neck tumors. However, conventional registration methods that treat the head and neck as a single entity may not achieve the necessary accuracy for the head region, which is particularly sensitive to radiation in radiotherapy. We propose ACSwinNet, a deep learning-based method for head-neck CT-CBCT rigid registration, which aims to enhance the registration precision in the head region. Our approach integrates an anatomical constraint encoder with anatomical segmentations of tissues and organs to enhance the accuracy of rigid registration in the head region. We also employ a Swin Transformer-based network for registration in cases with large initial misalignment and a perceptual similarity metric network to address intensity discrepancies and artifacts between the CT and CBCT images. We validate the proposed method using a head-neck CT-CBCT dataset acquired from clinical patients. Compared with the conventional rigid method, our method exhibits lower target registration error (TRE) for landmarks in the head region (reduced from 2.14 ± 0.45 mm to 1.82 ± 0.39 mm), higher dice similarity coefficient (DSC) (increased from 0.743 ± 0.051 to 0.755 ± 0.053), and higher structural similarity index (increased from 0.854 ± 0.044 to 0.870 ± 0.043). Our proposed method effectively addresses the challenge of low registration accuracy in the head region, which has been a limitation of conventional methods. This demonstrates significant potential in improving the accuracy of IGRT for head and neck tumors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
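Target registration error, the headline metric here, is simply the distance between corresponding landmarks after registration; the reported "mean ± std mm" figures follow from:

```python
import numpy as np

def target_registration_error(fixed_pts, registered_pts):
    """Per-landmark Euclidean distances (mm) between corresponding
    points; returns (mean, std) as reported in the abstract."""
    d = np.linalg.norm(fixed_pts - registered_pts, axis=1)
    return d.mean(), d.std()
```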