1,104 results for "swin transformer"
Search Results
2. PolySegNet: improving polyp segmentation through swin transformer and vision transformer fusion.
- Author
-
Lijin, P., Ullah, Mohib, Vats, Anuja, Cheikh, Faouzi Alaya, Santhosh Kumar, G., and Nair, Madhu S.
- Abstract
Colorectal cancer ranks as the second most prevalent cancer worldwide, with a high mortality rate. Colonoscopy stands as the preferred procedure for diagnosing colorectal cancer. Detecting polyps at an early stage is critical for effective prevention and diagnosis. However, challenges in colonoscopic procedures often lead medical practitioners to seek support from alternative techniques for timely polyp identification. Polyp segmentation emerges as a promising approach to identify polyps in colonoscopy images. In this paper, we propose an advanced method, PolySegNet, that leverages both Vision Transformer and Swin Transformer, coupled with a Convolutional Neural Network (CNN) decoder. The fusion of these models facilitates a comprehensive analysis of various modules in our proposed architecture. To assess the performance of PolySegNet, we evaluate it on three colonoscopy datasets, a combined dataset, and their augmented versions. The experimental results demonstrate that PolySegNet achieves competitive results in terms of polyp segmentation accuracy and efficacy, achieving a mean Dice score of 0.92 and a mean Intersection over Union (IoU) of 0.86. These metrics highlight the superior performance of PolySegNet in accurately delineating polyp boundaries compared to existing methods. PolySegNet has shown great promise in accurately and efficiently segmenting polyps in medical images. The proposed method could be the foundation for a new class of transformer-based segmentation models in medical image analysis. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
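As a sketch of the fusion idea PolySegNet describes (two transformer encoders feeding one CNN decoder), the following minimal PyTorch module concatenates ViT and Swin feature maps along the channel dimension before a convolutional decoder upsamples them to a segmentation map. The encoder stubs, channel sizes, and decoder layout are illustrative assumptions, not the authors' implementation.

```python
import torch
import torch.nn as nn

class FusionSegNet(nn.Module):
    """Hypothetical dual-encoder (ViT + Swin) segmentation sketch with a CNN decoder."""
    def __init__(self, vit_encoder: nn.Module, swin_encoder: nn.Module,
                 vit_dim: int = 768, swin_dim: int = 768, n_classes: int = 1):
        super().__init__()
        self.vit = vit_encoder    # assumed to return (B, vit_dim, H/16, W/16)
        self.swin = swin_encoder  # assumed to return (B, swin_dim, H/16, W/16)
        self.decoder = nn.Sequential(
            nn.Conv2d(vit_dim + swin_dim, 256, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(256, 64, 3, padding=1),
            nn.ReLU(inplace=True),
            nn.Upsample(scale_factor=4, mode="bilinear", align_corners=False),
            nn.Conv2d(64, n_classes, 1),  # per-pixel polyp logits
        )

    def forward(self, x):
        fused = torch.cat([self.vit(x), self.swin(x)], dim=1)  # channel-wise fusion
        return self.decoder(fused)

# usage with dummy conv stubs standing in for the real encoders:
# net = FusionSegNet(nn.Conv2d(3, 768, 16, 16), nn.Conv2d(3, 768, 16, 16))
# logits = net(torch.randn(1, 3, 224, 224))  # -> (1, 1, 224, 224)
```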
3. Integrating Swin Transformer with Fuzzy Gray Wolf Optimization for MRI Brain Tumor Classification.
- Author
-
Katran, L. Fahem, AlShemmary, E. N., and Al-Jawher, W. A. M.
- Abstract
The diagnosis is influenced by the classification of brain MRIs. Classifying and analyzing structures within images can be significantly enhanced by employing the Swin Transformer. The Swin Transformer is capable of capturing long-range relationships between pixels and creating layered image representations, which enhances its capacity to accurately evaluate brain structures. In addition, it has been demonstrated to be highly efficient in terms of computational power and memory usage, rendering it an optimal choice for high-resolution image processes. It is an adaptable tool in the field of computer vision, as it can be used for tasks such as object detection and image segmentation, demonstrating its versatility. The objective of this research is to identify tumors by classifying brain MRI images. In the event that tumors are identified, the research classifies them into three types: glioma, pituitary, and meningioma. The model utilizes 30,405 MRI images collected from two sources: Kaggle and the Al-Razi Medical Center in Iraq. The Swin Transformer's performance is enhanced by integrating a feature selection mechanism based on the Gray Wolf Optimizer (GWO). The reduction in the number of features used during training, as a result of feature selection facilitated by GWO, leads to improved model efficiency and reduced data processing requirements. This simplified approach accelerates model performance by improving efficiency and memory usage. Additionally, optimal feature selection eliminates redundant features, improving the models' accuracy and their capacity to differentiate between classes in MRI images. Incorporating the Fuzzy C Means (FCM) technique into feature selection may additionally improve the performance of GWO. The FCM assists in the grouping of related features to facilitate the selection of features by GWO. This method enhances the identification of non-useful features, resulting in reduced feature conflicts and improved model accuracy. Additionally, FCM assists in the identification of cluster centers that direct the GWO during optimization, thereby reducing the necessity for feature evaluation and improving processing efficiency. The integration also expedites the optimization process and improves the models' capacity to generalize, reducing the risk of overfitting and guaranteeing stability across datasets. This method enhances diagnostic accuracy by collecting features and analyzing structures in brain images, thereby providing a more profound understanding of the data. The proposed model achieved an accuracy of 99.490% with a loss rate of 0.0222 when trained on fuzzy optimization features. For training on fuzzy features derived from fuzzy wavelet MRI images, the model reached an accuracy of 99.572% and maintained a loss rate of 0.0206. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
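The GWO-driven feature selection described above can be prototyped in a few dozen lines. The sketch below is a generic binary GWO loop with an illustrative cross-validated logistic-regression fitness; the paper's FCM-guided clustering, classifier, and hyperparameters are not reproduced, and the penalty weight and thresholds are assumptions.

```python
import numpy as np
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import cross_val_score

def fitness(mask, X, y):
    """Cross-validated accuracy minus a small penalty on subset size."""
    if mask.sum() == 0:
        return 0.0
    acc = cross_val_score(LogisticRegression(max_iter=1000),
                          X[:, mask], y, cv=3).mean()
    return acc - 0.01 * mask.mean()  # discourage keeping too many features

def gwo_feature_select(X, y, n_wolves=10, n_iters=30, seed=0):
    rng = np.random.default_rng(seed)
    dim = X.shape[1]
    pos = rng.random((n_wolves, dim))            # continuous positions in [0, 1]
    for t in range(n_iters):
        masks = pos > 0.5                        # binarize to feature subsets
        scores = [fitness(m, X, y) for m in masks]
        alpha, beta, delta = pos[np.argsort(scores)[::-1][:3]]  # three best wolves
        a = 2 - 2 * t / n_iters                  # anneal exploration -> exploitation
        new_pos = np.zeros_like(pos)
        for leader in (alpha, beta, delta):
            r1 = rng.random((n_wolves, dim))
            r2 = rng.random((n_wolves, dim))
            A, C = 2 * a * r1 - a, 2 * r2
            new_pos += leader - A * np.abs(C * leader - pos)
        pos = np.clip(new_pos / 3, 0, 1)         # average of the leader-guided moves
    masks = pos > 0.5
    return masks[int(np.argmax([fitness(m, X, y) for m in masks]))]
```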
4. Crowd behavior detection: leveraging video swin transformer for crowd size and violence level analysis.
- Author
-
Qaraqe, Marwa, Yang, Yin David, Varghese, Elizabeth B, Basaran, Emrah, and Elzein, Almiqdad
- Subjects
COLLECTIVE behavior, TRANSFORMER models, SOFTWARE development tools, STREAMING video & television, CLOSED-circuit television - Abstract
In recent years, crowd behavior detection has posed significant challenges in the realm of public safety and security, even with the advancements in surveillance technologies. The ability to perform real-time surveillance and accurately identify crowd behavior by considering factors such as crowd size and violence levels can avert potential crowd-related disasters and hazards to a considerable extent. However, most existing approaches are unable to deal with the complexities of crowd dynamics and fail to distinguish different violence levels within crowds. Moreover, the prevailing approach to crowd behavior recognition, which solely relies on the analysis of closed-circuit television (CCTV) footage and overlooks the integration of online social media video content, leads to a primarily reactive methodology. This paper proposes a crowd behavior detection framework based on the swin transformer architecture, which leverages crowd counting maps and optical flow maps to detect crowd behavior across various sizes and violence levels. To support this framework, we created a dataset of videos enabling the recognition of crowd behaviors based on size and violence levels, sourced from CCTV camera footage and online videos. Experimental analysis conducted on benchmark datasets and our proposed dataset substantiates the superiority of our proposed approach over existing state-of-the-art methods, showcasing its ability to effectively distinguish crowd behaviors concerning size and violence level. Our method's validation through Nvidia's DeepStream Software Development Kit (SDK) highlights its competitive performance and potential for real-time intelligent surveillance applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
5. Sports-ACtrans Net: research on multimodal robotic sports action recognition driven via ST-GCN.
- Author
-
Qi Lu
- Abstract
Introduction: Accurately recognizing and understanding human motion actions presents a key challenge in the development of intelligent sports robots. Traditional methods often encounter significant drawbacks, such as high computational resource requirements and suboptimal real-time performance. To address these limitations, this study proposes a novel approach called Sports-ACtrans Net. Methods: In this approach, the Swin Transformer processes visual data to extract spatial features, while the Spatio-Temporal Graph Convolutional Network (ST-GCN) models human motion as graphs to handle skeleton data. By combining these outputs, a comprehensive representation of motion actions is created. Reinforcement learning is employed to optimize the action recognition process, framing it as a sequential decision-making problem. Deep Q-learning is utilized to learn the optimal policy, thereby enhancing the robot's ability to accurately recognize and engage in motion. Results and discussion: Experiments demonstrate significant improvements over state-of-the-art methods. This research advances the fields of neural computation, computer vision, and neuroscience, aiding in the development of intelligent robotic systems capable of understanding and participating in sports activities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
6. Multi-label dental disorder diagnosis based on MobileNetV2 and swin transformer using bagging ensemble classifier.
- Author
-
Alsakar, Yasmin M., Elazab, Naira, Nader, Nermeen, Mohamed, Waleed, Ezzat, Mohamed, and Elmogy, Mohammed
- Abstract
Dental disorders are common worldwide, causing pain or infections and limiting mouth opening, so dental conditions impact productivity, work capability, and quality of life. Manual detection and classification of oral diseases is time-consuming and requires dentists' evaluation and examination. A dental disease detection and classification system based on machine learning and deep learning will aid in early dental disease diagnosis. Hence, this paper proposes a new diagnosis system for dental diseases using X-ray imaging. The framework includes a robust pre-processing phase that uses image normalization and adaptive histogram equalization to improve image quality and reduce variation. A dual-stream approach is used for feature extraction, utilizing the advantages of the Swin Transformer for capturing long-range dependencies and global context and MobileNetV2 for effective local feature extraction. A thorough representation of dental anomalies is produced by fusing the extracted features. Finally, a bagging ensemble classifier is utilized to obtain reliable and broadly applicable classification results. We evaluate our model on a benchmark dental radiography dataset. The experimental results and comparisons show the superiority of the proposed system, with 95.7% precision, 95.4% sensitivity, 95.7% specificity, a 95.5% Dice similarity coefficient, and 95.6% accuracy. The results demonstrate the effectiveness of our hybrid model integrating the MobileNetV2 and Swin Transformer architectures, outperforming state-of-the-art techniques in classifying dental diseases using dental panoramic X-ray imaging. This framework presents a promising method for robustly and accurately diagnosing dental diseases automatically, which may help dentists plan treatments and identify dental diseases early on. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
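The dual-stream fusion plus bagging stage described above can be approximated as follows; the random feature arrays, their dimensions, and the default bagged decision trees are stand-ins, since the paper's exact extractor outputs and ensemble settings are not given here.

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier
from sklearn.model_selection import train_test_split

# stand-in feature matrices; in the paper these come from the two backbones
swin_feats = np.random.rand(500, 768)     # e.g. Swin Transformer embeddings
mobile_feats = np.random.rand(500, 1280)  # e.g. MobileNetV2 embeddings
labels = np.random.randint(0, 2, size=500)

fused = np.concatenate([swin_feats, mobile_feats], axis=1)  # dual-stream fusion
X_tr, X_te, y_tr, y_te = train_test_split(fused, labels,
                                          test_size=0.2, random_state=0)

# bagging over default decision trees; the paper's base learner is not specified here
clf = BaggingClassifier(n_estimators=50, random_state=0).fit(X_tr, y_tr)
print("held-out accuracy:", clf.score(X_te, y_te))
```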
7. Cloud Detection Using a UNet3+ Model with a Hybrid Swin Transformer and EfficientNet (UNet3+STE) for Very-High-Resolution Satellite Imagery.
- Author
-
Choi, Jaewan, Seo, Doochun, Jung, Jinha, Han, Youkyung, Oh, Jaehong, and Lee, Changno
- Abstract
Cloud regions presented in imagery must be extracted and recognized in order to generate satellite imagery as analysis-ready data (ARD). In this manuscript, we propose a new deep learning model to detect cloud areas in very-high-resolution (VHR) satellite imagery by fusing two deep learning architectures. The proposed UNet3+ model with a hybrid Swin Transformer and EfficientNet (UNet3+STE) was based on the structure of UNet3+, with the encoder sequentially combining EfficientNet based on mobile inverted bottleneck convolution (MBConv) and the Swin Transformer. By sequentially utilizing convolutional neural networks (CNNs) and transformer layers, the proposed algorithm aimed to extract the local and global information of cloud regions effectively. In addition, the decoder used MBConv to restore the spatial information of the feature map extracted by the encoder and adopted the deep supervision strategy of UNet3+ to enhance the model's performance. The proposed model was trained using the open dataset derived from KOMPSAT-3 and 3A satellite imagery, and a comparative evaluation with state-of-the-art (SOTA) methods was conducted on fourteen test datasets at the product level. The experimental results confirmed that the proposed UNet3+STE model outperformed the SOTA methods and demonstrated the most stable precision, recall, and F1 score values with fewer parameters and lower complexity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
8. Removing random noise and improving the resolution of seismic data using deep‐learning transformers.
- Author
-
Sun, Qifeng, Feng, Yali, Du, Qizhen, and Gong, Faming
- Subjects
TRANSFORMER models, FEATURE extraction, DATA extraction, DEEP learning, NOISE - Abstract
Post‐stack data are susceptible to noise interference and have low resolution, which impacts the accuracy and efficiency of subsequent seismic data interpretation. To address this issue, we propose a deep learning approach called Seis‐SUnet, which achieves simultaneous random noise suppression and super‐resolution reconstruction of seismic data. First, the Conv‐Swin‐Block is designed to utilize ordinary convolution and the Swin transformer to capture the long‐distance dependencies in the spatial location of seismic data, enabling the network to comprehensively comprehend the overall structure of seismic data. Second, to address the problem of weakening of the effective signal during network mapping, we use a hybrid training strategy of L1 loss, edge loss and multi‐scale structural similarity loss. The edge loss function directs the network training to focus more on the high‐frequency information at the edges of seismic data by amplifying the weight. Additionally, verification on synthetic and field seismic datasets confirms that Seis‐SUnet can effectively improve the signal‐to‐noise ratio and resolution of seismic data. Comparisons with traditional methods and two deep learning reconstruction methods demonstrate that Seis‐SUnet excels in removing random noise and preserving the continuity of rock layers and faults, as well as exhibiting strong robustness. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
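The hybrid training strategy described above combines L1, edge, and multi-scale SSIM terms. A rough sketch of the first two terms on single-channel seismic sections follows; the MS-SSIM term (available, e.g., in the pytorch_msssim package) is omitted for brevity, and the weights are illustrative assumptions.

```python
import torch
import torch.nn.functional as F

# Sobel kernels for single-channel (B, 1, H, W) inputs
SOBEL_X = torch.tensor([[-1., 0., 1.], [-2., 0., 2.], [-1., 0., 1.]]).view(1, 1, 3, 3)
SOBEL_Y = SOBEL_X.transpose(2, 3)

def edge_map(x):
    """Gradient magnitude of a single-channel image batch."""
    gx = F.conv2d(x, SOBEL_X.to(x.device), padding=1)
    gy = F.conv2d(x, SOBEL_Y.to(x.device), padding=1)
    return torch.sqrt(gx ** 2 + gy ** 2 + 1e-8)

def hybrid_loss(pred, target, w_l1=1.0, w_edge=0.5):
    l1 = F.l1_loss(pred, target)
    # amplify high-frequency information at the edges of the seismic section
    edge = F.l1_loss(edge_map(pred), edge_map(target))
    return w_l1 * l1 + w_edge * edge  # + an MS-SSIM term in the paper's strategy
```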
9. RSTSRN: Recursive Swin Transformer Super-Resolution Network for Mars Images.
- Author
-
Wu, Fanlu, Jiang, Xiaonan, Fu, Tianjiao, Fu, Yao, Xu, Dongdong, and Zhao, Chunlei
- Abstract
High-resolution optical images will provide planetary geology researchers with finer and more microscopic image data information. In order to maximize scientific output, it is necessary to further increase the resolution of acquired images, so image super-resolution (SR) reconstruction techniques have become the best choice. Aiming at the problems of large parameter quantity and high computational complexity in current deep learning-based image SR reconstruction methods, we propose a novel Recursive Swin Transformer Super-Resolution Network (RSTSRN) for image SR. The RSTSRN improves upon the LapSRN, which we use as our backbone architecture. A Residual Swin Transformer Block (RSTB) is used for more efficient residual learning; it consists of stacked Swin Transformer Blocks (STBs) with a residual connection. Moreover, the idea of parameter sharing was introduced to reduce the number of parameters, and a multi-scale training strategy was designed to accelerate convergence speed. Experimental results show that the proposed RSTSRN achieves superior performance on 2×, 4× and 8× SR tasks compared to state-of-the-art methods with similar parameter counts, with particularly strong superiority on high-magnification SR tasks. Compared to the LapSRN network, for 2×, 4× and 8× Mars image SR tasks, the RSTSRN network increased PSNR values by 0.35 dB, 0.88 dB and 1.22 dB, and SSIM values by 0.0048, 0.0114 and 0.0311, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
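The parameter-sharing idea mentioned above, one block reused recursively so that effective depth grows without adding weights, can be sketched generically as follows; the module is a stand-in, not the paper's RSTB implementation.

```python
import torch.nn as nn

class RecursiveResidualStage(nn.Module):
    """Apply one shared block several times with a residual connection per step."""
    def __init__(self, block: nn.Module, n_recursions: int = 4):
        super().__init__()
        self.block = block              # a single set of weights...
        self.n_recursions = n_recursions

    def forward(self, x):
        out = x
        for _ in range(self.n_recursions):  # ...reused at every recursion
            out = out + self.block(out)     # residual learning at each step
        return out
```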
10. An improved YOLOv7 model based on Swin Transformer and Trident Pyramid Networks for accurate tomato detection.
- Author
-
Guoxu Liu, Yonghui Zhang, Jun Liu, Deyong Liu, Chunlei Chen, Yujie Li, Xiujie Zhang, and Touko Mbouembe, Philippe Lyonel
- Subjects
TRANSFORMER models ,FRUIT harvesting ,FRUIT ,PYRAMIDS ,COMMERCIALIZATION - Abstract
Accurate fruit detection is crucial for automated fruit picking. However, real-world scenarios, influenced by complex environmental factors such as illumination variations, occlusion, and overlap, pose significant challenges to accurate fruit detection. These challenges subsequently impact the commercialization of fruit harvesting robots. A tomato detection model named YOLO-SwinTF, based on YOLOv7, is proposed to address these challenges. Integrating Swin Transformer (ST) blocks into the backbone network enables the model to capture global information by modeling long-range visual dependencies. Trident Pyramid Networks (TPN) are introduced to overcome the limitations of PANet's focus on communication-based processing. TPN incorporates multiple self-processing (SP) modules within existing top-down and bottom-up architectures, allowing feature maps to generate new findings for communication. In addition, Focaler-IoU is introduced to reconstruct the original intersection-over-union (IoU) loss to allow the loss function to adjust its focus based on the distribution of difficult and easy samples. The proposed model is evaluated on a tomato dataset, and the experimental results demonstrated that the proposed model's detection recall, precision, F1 score, and AP reach 96.27%, 96.17%, 96.22%, and 98.67%, respectively. These represent improvements of 1.64%, 0.92%, 1.28%, and 0.88% compared to the original YOLOv7 model. When compared to other state-of-the-art detection methods, this approach achieves superior performance in terms of accuracy while maintaining comparable detection speed. In addition, the proposed model exhibits strong robustness under various lighting and occlusion conditions, demonstrating its significant potential in tomato detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
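Focaler-IoU, as cited above, linearly remaps the plain IoU onto an interval [d, u] so the loss focuses on easy or hard samples depending on the interval chosen. A hedged sketch follows; the interval endpoints are assumptions, not necessarily the paper's settings.

```python
import torch

def box_iou(a, b):
    """Element-wise IoU of (N, 4) boxes given as (x1, y1, x2, y2)."""
    lt = torch.max(a[:, :2], b[:, :2])
    rb = torch.min(a[:, 2:], b[:, 2:])
    inter = (rb - lt).clamp(min=0).prod(dim=1)
    area_a = (a[:, 2] - a[:, 0]) * (a[:, 3] - a[:, 1])
    area_b = (b[:, 2] - b[:, 0]) * (b[:, 3] - b[:, 1])
    return inter / (area_a + area_b - inter + 1e-7)

def focaler_iou_loss(pred, target, d=0.0, u=0.95):
    iou = box_iou(pred, target)
    iou_focaler = ((iou - d) / (u - d)).clamp(0.0, 1.0)  # linear remapping on [d, u]
    return (1.0 - iou_focaler).mean()
```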
11. Anomaly Detection in Embryo Development and Morphology Using Medical Computer Vision-Aided Swin Transformer with Boosted Dipper-Throated Optimization Algorithm.
- Author
-
Mazroa, Alanoud Al, Maashi, Mashael, Said, Yahia, Maray, Mohammed, Alzahrani, Ahmad A., Alkharashi, Abdulwhab, and Al-Sharafi, Ali M.
- Abstract
Infertility affects a significant number of humans. A supported reproduction technology was verified to ease infertility problems. In vitro fertilization (IVF) is one of the best choices, and its success relies on the preference for a higher-quality embryo for transmission. These have been normally completed physically by testing embryos in a microscope. The traditional morphological calculation of embryos shows predictable disadvantages, including effort- and time-consuming and expected risks of bias related to individual estimations completed by specific embryologists. Different computer vision (CV) and artificial intelligence (AI) techniques and devices have been recently applied in fertility hospitals to improve efficacy. AI addresses the imitation of intellectual performance and the capability of technologies to simulate cognitive learning, thinking, and problem-solving typically related to humans. Deep learning (DL) and machine learning (ML) are advanced AI algorithms in various fields and are considered the main algorithms for future human assistant technology. This study presents an Embryo Development and Morphology Using a Computer Vision-Aided Swin Transformer with a Boosted Dipper-Throated Optimization (EDMCV-STBDTO) technique. The EDMCV-STBDTO technique aims to accurately and efficiently detect embryo development, which is critical for improving fertility treatments and advancing developmental biology using medical CV techniques. Primarily, the EDMCV-STBDTO method performs image preprocessing using a bilateral filter (BF) model to remove the noise. Next, the swin transformer method is implemented for the feature extraction technique. The EDMCV-STBDTO model employs the variational autoencoder (VAE) method to classify human embryo development. Finally, the hyperparameter selection of the VAE method is implemented using the boosted dipper-throated optimization (BDTO) technique. The efficiency of the EDMCV-STBDTO method is validated by comprehensive studies using a benchmark dataset. The experimental result shows that the EDMCV-STBDTO method performs better than the recent techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
12. Identification of Anomalies in Lung and Colon Cancer Using Computer Vision-Based Swin Transformer with Ensemble Model on Histopathological Images.
- Author
-
Alsulami, Abdulkream A., Albarakati, Aishah, AL-Ghamdi, Abdullah AL-Malaise, and Ragab, Mahmoud
- Abstract
Lung and colon cancer (LCC) is a dominant life-threatening disease that needs timely attention and precise diagnosis for efficient treatment. The conventional diagnostic techniques for LCC regularly encounter constraints in terms of efficiency and accuracy, thus causing challenges in primary recognition and treatment. Early diagnosis of the disease can immensely reduce the probability of death. In medical practice, the histopathological study of the tissue samples generally uses a classical model. Still, the automated devices that exploit artificial intelligence (AI) techniques produce efficient results in disease diagnosis. In histopathology, both machine learning (ML) and deep learning (DL) approaches can be deployed owing to their latent ability in analyzing and predicting physically accurate molecular phenotypes and microsatellite uncertainty. In this background, this study presents a novel technique called Lung and Colon Cancer using a Swin Transformer with an Ensemble Model on the Histopathological Images (LCCST-EMHI). The proposed LCCST-EMHI method focuses on designing a DL model for the diagnosis and classification of the LCC using histopathological images (HI). In order to achieve this, the LCCST-EMHI model utilizes the bilateral filtering (BF) technique to get rid of the noise. Further, the Swin Transformer (ST) model is also employed for the purpose of feature extraction. For the LCC detection and classification process, an ensemble deep learning classifier is used with three techniques: bidirectional long short-term memory with multi-head attention (BiLSTM-MHA), Double Deep Q-Network (DDQN), and sparse stacked autoencoder (SSAE). Eventually, the hyperparameter selection of the three DL models can be implemented utilizing the walrus optimization algorithm (WaOA) method. In order to illustrate the promising performance of the LCCST-EMHI approach, an extensive range of simulation analyses was conducted on a benchmark dataset. The experimentation results demonstrated the promising performance of the LCCST-EMHI approach over other recent methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
13. MS-UNet: Multi-Scale Nested UNet for Medical Image Segmentation with Few Training Data Based on an ELoss and Adaptive Denoising Method †.
- Author
-
Chen, Haoyuan, Han, Yufei, Yao, Linwei, Wu, Xin, Li, Kuan, and Yin, Jianping
- Subjects
TRANSFORMER models, IMAGE processing, DIAGNOSTIC imaging, NETWORK performance - Abstract
Traditional U-shape segmentation models can achieve excellent performance with an elegant structure. However, the single-layer decoder structure of U-Net or SwinUnet is too "thin" to exploit enough information, resulting in large semantic differences between the encoder and decoder parts. Things get worse in the field of medical image processing, where annotated data are more difficult to obtain than in other tasks. Based on this observation, we propose a U-like model named MS-UNet with a plug-and-play adaptive denoising module and ELoss for the medical image segmentation task in this study. Instead of the single-layer U-Net decoder structure used in Swin-UNet and TransUNet, we specifically designed a multi-scale nested decoder based on the Swin Transformer for U-Net. The proposed multi-scale nested decoder structure allows the feature mapping between the decoder and encoder to be semantically closer, thus enabling the network to learn more detailed features. In addition, ELoss could improve the attention of the model to the segmentation edges, and the plug-and-play adaptive denoising module could prevent the model from learning the wrong features without losing detailed information. The experimental results show that MS-UNet could effectively improve network performance with more efficient feature learning capability and exhibit more advanced performance, especially in the extreme case with a small amount of training data. Furthermore, the proposed ELoss and denoising module not only significantly enhance the segmentation performance of MS-UNet but can also be applied individually to other models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
14. Hybrid Swin-CSRNet: A Novel and Efficient Fish Counting Network in Aquaculture.
- Author
-
Liu, Jintao, Tolón-Becerra, Alfredo, Bienvenido-Barcena, José Fernando, Yang, Xinting, Zhu, Kaijie, and Zhou, Chao
- Abstract
Real-time estimation of fish biomass plays a crucial role in real-world fishery production, as it helps formulate feeding strategies and other management decisions. In this paper, a dense fish counting network called Swin-CSRNet is proposed. Specifically, the VGG16 layer in the front-end is replaced with the Swin transformer to extract image features more efficiently. Additionally, a squeeze-and-excitation (SE) module is introduced to enhance feature representation by dynamically adjusting the importance of each channel through "squeeze" and "excitation", making the extracted features more focused and effective. Finally, a multi-scale fusion (MSF) module is added after the back-end to fully utilize the multi-scale feature information, enhancing the model's ability to capture multi-scale details. The experiment demonstrates that Swin-CSRNet achieved excellent results, with MAE, RMSE, MAPE, and correlation coefficient R² values of 11.22, 15.32, 5.18%, and 0.954, respectively. Meanwhile, compared to the original network, the parameter size and computational complexity of Swin-CSRNet were reduced by 70.17% and 79.05%, respectively. Therefore, the proposed method not only counts the number of fish with higher speed and accuracy but also contributes to advancing the automation of aquaculture. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
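The squeeze-and-excitation module credited above with reweighting channel importance is a standard block; a minimal PyTorch version for reference (the reduction ratio of 16 is the common default, not necessarily the paper's value).

```python
import torch.nn as nn

class SEBlock(nn.Module):
    """Channel attention: global-pool 'squeeze', two-layer 'excitation' gate."""
    def __init__(self, channels: int, reduction: int = 16):
        super().__init__()
        self.pool = nn.AdaptiveAvgPool2d(1)  # squeeze: one value per channel
        self.fc = nn.Sequential(
            nn.Linear(channels, channels // reduction), nn.ReLU(inplace=True),
            nn.Linear(channels // reduction, channels), nn.Sigmoid(),
        )

    def forward(self, x):
        b, c, _, _ = x.shape
        w = self.fc(self.pool(x).view(b, c)).view(b, c, 1, 1)
        return x * w  # excitation: rescale each channel by its learned importance
```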
15. BS-YOLOV8: an intelligent detection model for bearing pin support-piece states of high-rise building machine.
- Author
-
Pan, Xi, Zhao, Tingsheng, Li, Xuxiang, and Jiang, Xiaohui
- Subjects
TRANSFORMER models, SKYSCRAPERS, LEAK detection, INTELLIGENT buildings, ERROR rates, INTRUSION detection systems (Computer security), PROBLEM solving, IMAGE encryption - Abstract
As the main support part of the working platform of a high-rise building machine, the bearing pin support (BPS) plays a crucial role in the safety and stability of the platform; the conventional inspection method suffers from low detection efficiency, low accuracy, and high cost. To improve the accuracy and robustness of detection under weak light, this paper proposes BS-YOLOV8, an intelligent detection algorithm for BPS-piece states. To improve feature map utilization and reduce the model's missed-detection and false-detection rates, the Swin transformer is used to improve the YOLOV8 backbone network. In addition, the BiFormer attention mechanism is used to weight the feature maps, addressing the loss of feature information across different feature layers and under weak lighting conditions, and the Scylla-IOU loss function is used instead of the original localization loss function to guide the model toward predicted bounding boxes closer to the real target bounding boxes. Finally, the BS-YOLOV8 algorithm is compared with classical algorithms on the self-constructed dataset of this study. The results show that the mAP0.5, mAP0.5:0.95, and FPS values of the BS-YOLOV8 algorithm reach 97.9%, 96.3% and 40 under normal lighting, and the mAP0.5 value reaches 87.6% under low light, which effectively solves the problems of low detection efficiency and poor detection under low-light conditions and is superior compared to other algorithms. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
16. An improved model based on YOLOX for detection of tea sprouts in natural environment.
- Author
-
Li, Xiutong, Liu, Ruixin, Li, Yuxin, Li, Zhilin, Yan, Peng, Yu, Mei, Dong, Xuan, Yan, Jianwei, and Xie, Benliang
- Abstract
The tea industry occupies a pivotal and important position in China's import and export trade commodities. With the improvement of people's quality of life, the demand for famous tea sprouts is increasing. However, manual picking is inefficient and costly. Although mechanical picking can pick tea sprouts efficiently, it lacks selectivity, which leads to an increase in the workload of post-processing and screening of superior tea leaves. To address this, this paper establishes a dataset of tea sprouts in natural environments and proposes an improved YOLOX tea sprout detection model, YOLOX-ST, based on the Swin Transformer. The model employs the Swin Transformer as the backbone network to enhance overall detection accuracy. Additionally, it introduces the CBAM attention mechanism to address missed detections and false detections in complex environments. Furthermore, a small target detection layer is incorporated to resolve the problem of incomplete information about tea sprout features learned from the deep feature map. To address the sample imbalance, we introduce the EIoU loss function and apply Focal Loss to the confidence level. The experimental results demonstrate that the proposed model achieves an accuracy of 95.45%, which is 5.73% higher than the original YOLOX model. Moreover, it outperforms other YOLO series models in terms of accuracy while achieving a faster detection speed, reaching 93.2 FPS. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
17. Towards improved fundus disease detection using Swin Transformers.
- Author
-
Jawad, M Abdul, Khursheed, Farida, Nawaz, Shah, and Mir, A. H.
- Subjects
TRANSFORMER models, MACULAR degeneration, COMPUTER-aided diagnosis, MACHINE learning, EQUILIBRIUM testing, DEEP learning - Abstract
Ocular diseases can have debilitating consequences on visual acuity if left untreated, necessitating early and accurate diagnosis to improve patients' quality of life. Although the contemporary clinical prognosis involving fundus screening is a cost-effective method for detecting ocular abnormalities, it is time-intensive due to the limited availability of resources and expert ophthalmologists. While computer-aided detection, including traditional machine learning and deep learning, has been employed for enhanced prognosis from fundus images, conventional deep learning models often face challenges due to limited global modeling ability, inducing bias and suboptimal performance on unbalanced datasets. Presently, most studies on ocular disease detection focus on cataract detection or diabetic retinopathy severity prediction, leaving a myriad of vision-impairing conditions unexplored. Minimal research has been conducted utilizing deep models for identifying diverse ocular abnormalities from fundus images, with limited success. The study leveraged the capabilities of four Swin Transformer models (Swin-T, Swin-S, Swin-B, and Swin-L) for detecting various significant ocular diseases (including Cataracts, Hypertensive Retinopathy, Diabetic Retinopathy, Myopia, and Age-Related Macular Degeneration) from fundus images of the ODIR dataset. Swin Transformer models, confining self-attention to local windows while enabling cross-window interactions, demonstrated superior performance and computational efficiency. Upon assessment across three specific ODIR test sets, utilizing metrics such as AUC, F1-score, Kappa score, and a composite metric representing an average of these three (referred to as the final score), all Swin models exhibited performance metric scores superior to those documented in contemporary studies. The Swin-L model, in particular, achieved final scores of 0.8501, 0.8211, and 0.8616 on the Off-site, On-site, and Balanced ODIR test sets, respectively. An external validation on a Retina dataset further substantiated the generalizability of Swin models, with the models reporting final scores of 0.9058 (Swin-T), 0.92907 (Swin-S), 0.95917 (Swin-B), and 0.97042 (Swin-L). The results, corroborated by statistical analysis, underline the consistent and stable performance of Swin models across varied datasets, emphasizing their potential as reliable tools for multi-ocular disease detection from fundus images, thereby aiding in the early diagnosis and intervention of ocular abnormalities. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
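For reference, the local-window mechanism the abstract credits for Swin's efficiency amounts to reshaping the token grid into non-overlapping windows before attention; a generic sketch (not the study's training code) follows.

```python
import torch

def window_partition(x: torch.Tensor, window_size: int) -> torch.Tensor:
    """(B, H, W, C) -> (num_windows * B, window_size, window_size, C)."""
    B, H, W, C = x.shape
    x = x.view(B, H // window_size, window_size, W // window_size, window_size, C)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window_size, window_size, C)

# self-attention is then computed independently inside each window; shifting the
# grid by window_size // 2 (via torch.roll) in alternate blocks restores
# cross-window interaction
tokens = torch.randn(2, 56, 56, 96)    # e.g. a Swin-T stage-1 feature map
windows = window_partition(tokens, 7)  # -> (2 * 64, 7, 7, 96)
```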
18. Retinex decomposition based low‐light image enhancement by integrating Swin transformer and U‐Net‐like architecture.
- Author
-
Wang, Zexin, Qingge, Letu, Pan, Qingyi, and Yang, Pei
- Subjects
TRANSFORMER models, IMAGE intensifiers, VISUAL perception, REFLECTANCE, TEST methods - Abstract
Low‐light images are captured in environments with minimal lighting, such as nighttime or underwater conditions. These images often suffer from issues like low brightness, poor contrast, lack of detail, and overall darkness, significantly impairing human visual perception and subsequent high‐level visual tasks. Enhancing low‐light images holds great practical significance. Among the various existing methods for Low‐Light Image Enhancement (LLIE), those based on the Retinex theory have gained significant attention. However, despite considerable efforts in prior research, the challenge of Retinex decomposition remains unresolved. In this study, an LLIE network based on the Retinex theory is proposed, which addresses these challenges by integrating attention mechanisms and a U‐Net‐like architecture. The proposed model comprises three modules: the Decomposition module (DECM), the Reflectance Recovery module (REFM), and the Illumination Enhancement module (ILEM). Its objective is to decompose low‐light images based on the Retinex theory and enhance the decomposed reflectance and illumination maps using attention mechanisms and a U‐Net‐like architecture. We conducted extensive experiments on several widely used public datasets. The qualitative results demonstrate that the approach produces enhanced images with superior visual quality compared to the existing methods on all test datasets, especially for some extremely dark images. Furthermore, the quantitative evaluation results based on metrics PSNR, SSIM, LPIPS, BRISQUE, and MUSIQ show the proposed model achieves superior performance, with PSNR and BRISQUE significantly outperforming the baseline approaches, where (PSNR, mean BRISQUE) values of the proposed method and the second best results are (17.14, 17.72) and (16.44, 19.65). Additionally, further experimental results such as ablation studies indicate the effectiveness of the proposed model. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
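The Retinex assumption underlying the model above is I = R ⊙ L (an image is the pixel-wise product of reflectance and illumination). For intuition only, a classical single-scale Retinex estimate of log-reflectance on a grayscale array is sketched below; the paper learns the decomposition with a network instead.

```python
import numpy as np
from scipy.ndimage import gaussian_filter

def single_scale_retinex(img: np.ndarray, sigma: float = 80.0) -> np.ndarray:
    """Classical SSR: log R = log I - log L, with L estimated by Gaussian blur."""
    img = img.astype(np.float64) + 1.0          # avoid log(0)
    illumination = gaussian_filter(img, sigma)  # smooth estimate of L
    return np.log(img) - np.log(illumination)   # log-reflectance estimate
```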
19. Residual swin transformer for classifying the types of cotton pests in complex background.
- Author
-
Ting Zhang, Jikui Zhu, Fengkui Zhang, Shijie Zhao, Wei Liu, Ruohong He, Hongqiang Dong, Qingqing Hong, Changwei Tan, and Ping Li
- Subjects
TRANSFORMER models ,COTTON growing ,COTTON aphid ,DATA augmentation ,DEEP learning ,COTTON - Abstract
Background: Cotton pests have a major impact on cotton quality and yield during cotton production and cultivation. With the rapid development of agricultural intelligence, the accurate classification of cotton pests is a key factor in realizing the precise application of medicines by utilize unmanned aerial vehicles (UAVs), large application devices and other equipment. Methods: In this study, a cotton insect pest classificationmodel based on improved Swin Transformer is proposed. The model introduces the residual module, skip connection, into Swin Transformer to improve the problem that pest features are easily confused in complex backgrounds leading to poor classification accuracy, and to enhance the recognition of cotton pests. In this study, 2705 leaf images of cotton insect pests (including three insect pests, cotton aphids, cotton mirids and cotton leaf mites) were collected in the field, and after image preprocessing and data augmentation operations, model training was performed. Results: The test results proved that the accuracy of the improvedmodel compared to the original model increased from 94.6% to 97.4%, and the prediction time for a single image was 0.00434s. The improved Swin Transformer model was compared with seven kinds of classification models (VGG11, VGG11-bn, Resnet18, MobilenetV2, VIT, Swin Transformer small, and Swin Transformer base), and the model accuracy was increased respectively by 0.5%, 4.7%, 2.2%, 2.5%, 6.3%, 7.9%, 8.0%. Discussion: Therefore, this study demonstrates that the improved Swin Transformer model significantly improves the accuracy and efficiency of cotton pest detection compared with other classification models, and can be deployed on edge devices such as utilize unmanned aerial vehicles (UAVs), thus providing an important technological support and theoretical basis for cotton pest control and precision drug application. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. MSCAC: A Multi-Scale Swin–CNN Framework for Progressive Remote Sensing Scene Classification.
- Author
-
Solomon, A. Arun and Agnes, S. Akila
- Subjects
TRANSFORMER models, REMOTE sensing, TERRAIN mapping, COMPUTER performance, LAND cover - Abstract
Recent advancements in deep learning have significantly improved the performance of remote sensing scene classification, a critical task in remote sensing applications. This study presents a new aerial scene classification model, the Multi-Scale Swin–CNN Aerial Classifier (MSCAC), which employs the Swin Transformer, an advanced architecture that has demonstrated exceptional performance in a range of computer vision applications. The Swin Transformer leverages shifted window mechanisms to efficiently model long-range dependencies and local features in images, making it particularly suitable for the complex and varied textures in aerial imagery. The model is designed to capture intricate spatial hierarchies and diverse scene characteristics at multiple scales. A framework is developed that integrates the Swin Transformer with a multi-scale strategy, enabling the extraction of robust features from aerial images of different resolutions and contexts. This approach allows the model to effectively learn from both global structures and fine-grained details, which is crucial for accurate scene classification. The model's performance is evaluated on several benchmark datasets, including UC-Merced, WHU-RS19, RSSCN7, and AID, where it demonstrates a superior or comparable accuracy to state-of-the-art models. The MSCAC model's adaptability to varying amounts of training data and its ability to improve with increased data make it a promising tool for real-world remote sensing applications. This study highlights the potential of integrating advanced deep-learning architectures like the Swin Transformer into aerial scene classification, paving the way for more sophisticated and accurate remote sensing systems. The findings suggest that the proposed model has significant potential for various remote sensing applications, including land cover mapping, urban planning, and environmental monitoring. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
21. Enhancing Melanoma Diagnosis with Advanced Deep Learning Models Focusing on Vision Transformer, Swin Transformer, and ConvNeXt.
- Author
-
Aksoy, Serra, Demircioglu, Pinar, and Bogrekci, Ismail
- Subjects
TRANSFORMER models, CONVOLUTIONAL neural networks, DEEP learning, COMPUTER-assisted image analysis (Medicine), MELANOMA diagnosis, MELANOMA - Abstract
Skin tumors, especially melanoma, which is highly aggressive and progresses quickly to other sites, are an issue in various parts of the world. However, the only way to save lives is to detect melanoma at its initial stages. This study explores the application of advanced deep learning models for classifying benign and malignant melanoma using dermoscopic images. The aim of the study is to enhance the accuracy and efficiency of melanoma diagnosis with the ConvNeXt, Vision Transformer (ViT) Base-16, and Swin Transformer V2 Small (Swin V2 S) deep learning models. The ConvNeXt model, which integrates principles of both convolutional neural networks and transformers, demonstrated superior performance, with balanced precision and recall metrics. The dataset, sourced from Kaggle, comprises 13,900 uniformly sized images, preprocessed to standardize the inputs for the models. Experimental results revealed that ConvNeXt achieved the highest diagnostic accuracy among the tested models, reaching 91.5%, with balanced precision and recall rates of 90.45% and 92.8% for benign cases, and 92.61% and 90.2% for malignant cases, respectively. The F1-scores for ConvNeXt were 91.61% for benign cases and 91.39% for malignant cases. This research points out the potential of hybrid deep learning architectures in medical image analysis, particularly for early melanoma detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
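The F1-scores quoted above follow directly from the precision/recall pairs via F1 = 2PR/(P + R); a quick check:

```python
def f1(p, r):
    """Harmonic mean of precision and recall."""
    return 2 * p * r / (p + r)

print(round(f1(0.9045, 0.928) * 100, 2))  # benign: 91.61
print(round(f1(0.9261, 0.902) * 100, 2))  # malignant: 91.39
```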
22. Context Aggregation Network for Remote Sensing Image Semantic Segmentation.
- Author
-
Zhang, Changxing, Bai, Xiangyu, Wang, Dapeng, and Zhou, KeXin
- Subjects
TRANSFORMER models, REMOTE sensing, IMAGE segmentation, SWIMMING - Abstract
In recent years, remote sensing technology has been widely applied in various industries, and semantic segmentation of remote sensing images has attracted much attention. Due to the complexity and special characteristics of remote sensing images, multi-scale object detection and accurate object localization are important challenges in remote sensing image semantic segmentation. Therefore, this paper proposes a context aggregation network (CANet). The design of CANet is influenced by advanced technologies such as attention mechanisms and feature fusion and enhancement. This network first introduces a nested dilated residual module (NDRM), which can fully utilize the features extracted by the backbone network. Then, an improved integrated successive dilation module (IISD) is proposed to effectively aggregate contextual information at a series of scales. Next, a Swin Transformer module is embedded to provide global contextual information. Finally, a multi-resolution fusion module (MRFM) is proposed, allowing the comprehensive fusion of feature layers from different stages of the encoder and preserving more semantic and detailed information. The experimental results show that CANet outperforms other advanced models on the Potsdam and Vaihingen datasets. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
23. Swin-chart: An efficient approach for chart classification.
- Author
-
Dhote, Anurag, Javed, Mohammed, and Doermann, David S.
- Subjects
TRANSFORMER models, CLASSIFICATION, DEEP learning, SCIENCE education - Published
- 2024
- Full Text
- View/download PDF
24. Refined Intelligent Landslide Identification Based on Multi-Source Information Fusion.
- Author
-
Wang, Xiao, Wang, Di, Liu, Chenghao, Zhang, Mengmeng, Xu, Luting, Sun, Tiegang, Li, Weile, Cheng, Sizhi, and Dong, Jianhui
- Subjects
TRANSFORMER models, EMERGENCY management, DATABASES, LANDSLIDES, REMOTE sensing - Abstract
Landslides are most severe in the mountainous regions of southwestern China. While landslide identification provides a foundation for disaster prevention operations, methods for utilizing multi-source data and deep learning techniques to improve the efficiency and accuracy of landslide identification in complex environments are still a focus of research and a difficult issue in landslide research. In this study, we address the above problems and construct a landslide identification model based on the shifted window (Swin) transformer. We chose Ya'an, which has a complex terrain and experiences frequent landslides, as the study area. Our model, which fuses features from different remote sensing data sources and introduces a loss function that better learns the boundary information of the target, is compared with the pyramid scene parsing network (PSPNet), the unified perception parsing network (UPerNet), and DeepLab_V3+ models in order to explore the learning potential of the model and test the models' resilience in an open-source landslide database. The results show that in the Ya'an landslide database, compared with the above benchmark networks (UPerNet, PSPNet, and DeepLab_v3+), the Swin Transformer-based optimization model improves overall accuracies by 1.7%, 2.1%, and 1.5%, respectively; the F1_score is improved by 14.5%, 16.2%, and 12.4%; and the intersection over union (IoU) is improved by 16.9%, 18.5%, and 14.6%, respectively. The performance of the optimized model is excellent. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Personality prediction via multi-task transformer architecture combined with image aesthetics.
- Author
-
Bajestani, Shahryar Salmani, Khalilzadeh, Mohammad Mahdi, Azarnoosh, Mahdi, and Kobravi, Hamid Reza
- Subjects
TRANSFORMER models, PERSONALITY, DATABASES, STATISTICAL correlation, SOCIAL media - Abstract
Social media has found its path into the daily lives of people. Users communicate in several ways, among which liking and sharing images stands out. Each image shared by a user can be analyzed from aesthetic and personality-trait viewpoints. In recent studies, it has been proved that personality traits impact personalized image aesthetics assessment. In this article, the same pattern was studied from the opposite perspective: we evaluated the impact of image aesthetics on personality traits to check whether a relation exists between them in this direction. Hence, in a two-stage architecture, we have leveraged image aesthetics to predict the personality traits of users. The first stage includes a multi-task deep learning paradigm that consists of an encoder/decoder in which the core of the network is a Swin Transformer. The second stage combines image aesthetics and personality traits with an attention mechanism for personality trait prediction. The results showed that the proposed method achieved an average Spearman Rank Order Correlation Coefficient (SROCC) of 0.776 in image aesthetics on the Flickr-AES database and an average SROCC of 0.6730 on the PsychoFlickr database, which outperformed related SOTA (State of the Art) studies. The average accuracy performance of the first stage was boosted by 7.02 per cent in the second stage, considering the influence of image aesthetics on personality trait prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
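SROCC, the evaluation metric quoted above, is the Pearson correlation computed on rank values; scipy computes it directly. The score arrays below are made-up illustrations.

```python
from scipy.stats import spearmanr

predicted = [3.1, 4.5, 2.2, 4.9, 3.8]     # a model's aesthetic/trait scores
ground_truth = [3.0, 4.7, 2.5, 4.6, 3.9]  # corresponding human ratings
rho, _ = spearmanr(predicted, ground_truth)
print(rho)  # 1.0 here, since the two rankings agree exactly
```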
26. Road Surface Condition Monitoring in Extreme Weather Using a Feature-Learning Enhanced Mask–RCNN.
- Author
-
Bai, Zhiyuan, Wang, Yue, Zhang, Ancai, Wei, Hao, and Pan, Guangyuan
- Subjects
EXTREME weather, PAVEMENTS, CONVOLUTIONAL neural networks, ROAD maintenance, WEATHER - Abstract
Road surface condition (RSC) is an important indicator in road safety studies, enabling transportation departments to employ it for conducting surveys, inspections, cleaning, and maintenance, ultimately contributing to improved performance in road upkeep. However, traditional recognition methods can be easily affected when extreme weather frequently occurs such as winter seasonal changes. To achieve real-time and automatic RSC monitoring, this paper proposes an improved Mask–region-based convolutional neural network (RCNN) based on Swin Transformer-PAFPN and a dynamic head detection network. Meanwhile, transfer learning is used to reduce training time, and data enhancement and multiscale training are applied to achieve better performance. The experimental results show that the proposed model achieves an outstanding mean average precision at 0.5 (mAP@0.5) score of 89.8 under favorable weather conditions characterized by clear visibility, surpassing other popular methods. Notably, the proposed model exhibits lower parameters and GigaFLOPS (GFLOPs) (72.41 and 158.35, respectively) compared to other popular methods, thus demanding fewer computational resources. Furthermore, in challenging weather conditions characterized by poor visibility, such as foggy and nighttime scenarios, the proposed model achieves mAP@0.5 scores of 78.50 and 82.40, respectively. These scores not only outperform those of other popular methods but also underscore the robustness of the proposed model in extreme weather conditions. This exceptional performance demonstrates the proposed model's effectiveness in addressing complex road conditions under various meteorological circumstances, providing reliable technical support for practical traffic monitoring and road maintenance. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. Performance of vision transformer and swin transformer models for lemon quality classification in fruit juice factories.
- Author
-
Dümen, Sezer, Kavalcı Yılmaz, Esra, Adem, Kemal, and Avaroglu, Erdinç
- Subjects
TRANSFORMER models, MACHINE learning, DEEP learning, FRUIT juices, ARTIFICIAL intelligence - Abstract
Assessing the quality of agricultural products holds vital significance in enhancing production efficiency and market viability. The adoption of artificial intelligence (AI) has notably surged for this purpose, employing deep learning and machine learning techniques to process and classify agricultural product images, adhering to defined standards. This study focuses on the lemon dataset, encompassing 'good' and 'bad' quality classes, and begins by augmenting data through rescaling, random zoom, flip, and rotation methods. Subsequently, employing eight diverse deep learning approaches and two transformer methods for classification, the study culminated in the ViT method achieving an unprecedented 99.84% accuracy, 99.95% recall, and 99.66% precision, marking the highest accuracy documented. These findings strongly advocate for the efficacy of the ViT method in successfully classifying lemon quality, spotlighting its potential impact on agricultural quality assessment. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
28. Swin Transformer lightweight: an efficient strategy that combines weight sharing, distillation and pruning.
- Author
-
HAN Bo, ZHOU Shun, FAN Jianhua, WEI Xianglin, HU Yongyang, and ZHU Yanping
- Abstract
Swin Transformer, as a layered visual transformer with shifted windows, has attracted extensive attention in the field of computer vision due to its exceptional modeling capabilities. However, its high computational complexity limits its applicability on devices with constrained computational resources. To address this issue, a pruning compression method was proposed, integrating weight sharing and distillation. Initially, weight sharing was implemented across layers, and transformation layers were added to introduce weight transformation, thereby enhancing diversity. Subsequently, a parameter dependency mapping graph for the transformation blocks was constructed and analyzed, and a grouping matrix F was built to record the dependency relationships among all parameters and identify parameters for simultaneous pruning. Finally, distillation was employed to restore the model's performance. Experiments conducted on the ImageNet-Tiny-200 public dataset demonstrate that, with a 32% reduction in model computational complexity, the proposed method degrades performance by as little as approximately 3%. It provides a solution for deploying high-performance artificial intelligence models in environments with limited computational resources. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
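The distillation step used above to recover accuracy after pruning is typically the standard soft-target formulation: the pruned student matches the temperature-softened logits of the unpruned teacher. A sketch follows; the temperature and weighting are illustrative assumptions, not the paper's settings.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=4.0, alpha=0.7):
    """Weighted sum of soft-target KL term and hard cross-entropy term."""
    soft = F.kl_div(F.log_softmax(student_logits / T, dim=1),
                    F.softmax(teacher_logits / T, dim=1),
                    reduction="batchmean") * (T * T)  # rescale gradient magnitude
    hard = F.cross_entropy(student_logits, labels)
    return alpha * soft + (1 - alpha) * hard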
29. A lightweight convolutional swin transformer with cutmix augmentation and CBAM attention for compound emotion recognition.
- Author
-
Nidhi and Verma, Bindu
- Subjects
EMOTION recognition, TRANSFORMER models, DATA augmentation, EMOTIONS, COMPUTATIONAL complexity - Abstract
Facial emotion recognition has become a complicated task due to individual variations in facial characteristics, as well as racial and cultural variances. Different psychological studies show that there are complex expressions beyond the basic emotions which are made up of two basic emotions, like "Happily Disgusted", "Happily Surprised", "Sadly Surprised", etc. Compound emotion recognition is challenging because very few compound emotion datasets are publicly available, and those that exist are imbalanced. In this paper, we have proposed LSwin-CBAM for the classification of compound emotions. To address the problem of the imbalanced dataset, the proposed model exploits the CutMix augmentation technique for data augmentation. It also incorporates the CBAM attention mechanism to emphasize the relevant features in an image, and a Swin Transformer with fewer Swin Transformer blocks, which leads to lower computational complexity in terms of trainable parameters and improves the overall classification accuracy as well. The experimental results of LSwin-CBAM on the RAF-DB and EmotioNet datasets show that the proposed transformer-based network can recognize compound emotions well. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
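CutMix, as used above against class imbalance, pastes a random patch from one image into another and mixes the labels in proportion to the patch area; a minimal version is sketched below (alpha and the rest of the training loop are up to the user).

```python
import numpy as np
import torch

def cutmix(images, labels, alpha=1.0):
    """Return mixed images, both label sets, and the mixing coefficient lam."""
    lam = np.random.beta(alpha, alpha)
    perm = torch.randperm(images.size(0))          # partner image for each sample
    H, W = images.shape[-2:]
    rh, rw = int(H * np.sqrt(1 - lam)), int(W * np.sqrt(1 - lam))
    cy, cx = np.random.randint(H), np.random.randint(W)
    y1, y2 = np.clip([cy - rh // 2, cy + rh // 2], 0, H)
    x1, x2 = np.clip([cx - rw // 2, cx + rw // 2], 0, W)
    images[:, :, y1:y2, x1:x2] = images[perm, :, y1:y2, x1:x2]
    lam = 1 - (y2 - y1) * (x2 - x1) / (H * W)      # actual pasted-area fraction
    return images, labels, labels[perm], lam       # loss = lam*L(a) + (1-lam)*L(b)
```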
30. VDCrackGAN: A Generative Adversarial Network with Transformer for Pavement Crack Data Augmentation.
- Author
-
Yu, Gui, Zhou, Xinglin, and Chen, Xiaolan
- Subjects
TRANSFORMER models, GENERATIVE adversarial networks, DATA augmentation, CRACKING of pavements - Abstract
Addressing the challenge of limited samples arising from the difficulty and high cost of pavement crack image collection and labeling, along with the inadequate ability of traditional data augmentation methods to enhance the sample feature space, we propose VDCrackGAN, a generative adversarial network combining VAE and DCGAN, specifically tailored for pavement crack data augmentation. Furthermore, spectral normalization is incorporated to enhance the stability of network training, and the Swin Transformer self-attention mechanism is integrated into the network to further improve the quality of crack generation. Experimental outcomes reveal that in comparison to the baseline DCGAN, VDCrackGAN achieves notable improvements of 13.6% and 26.4% in the Inception Score (IS) and Fréchet Inception Distance (FID) metrics, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
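Spectral normalization, credited above with stabilizing GAN training, is a one-line wrapper in PyTorch; a discriminator-layer sketch (the layer shape is illustrative):

```python
import torch.nn as nn
from torch.nn.utils import spectral_norm

# each forward pass divides the conv weight by an estimate of its largest
# singular value, constraining the discriminator's Lipschitz constant
disc_layer = spectral_norm(nn.Conv2d(64, 128, kernel_size=4, stride=2, padding=1))
```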
31. Adaptive Attention-Enhanced Yolo for Wall Crack Detection.
- Author
-
Chen, Ying, Wu, Wangyu, and Li, Junxia
- Subjects
TRANSFORMER models, FEATURE extraction, NECK, ATTENTION, FORECASTING, DEEP learning - Abstract
With the advancement of social life, the aging of building walls has become an unavoidable phenomenon. Due to the limited efficiency of manually detecting cracks, it is especially necessary to explore intelligent detection techniques. Currently, deep learning has garnered growing attention in crack detection, leading to the development of numerous feature learning methods. Although the technology in this area has been progressing, it still faces problems such as insufficient feature extraction and instability of prediction results. To address the shortcomings in the current research, this paper proposes a new Adaptive Attention-Enhanced Yolo. The method employs a Swin Transformer-based Cross-Stage Partial Bottleneck with a three-convolution structure, introduces an adaptive sensory field module in the neck network, and processes the features through a multi-head attention structure during the prediction process. The introduction of these modules greatly improves the performance of the model, thus effectively improving the precision of crack detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
32. Tomato Leaf Disease Classification by Combining EfficientNetv2 and a Swin Transformer.
- Author
-
Sun, Yubing, Ning, Lixin, Zhao, Bin, and Yan, Jun
- Subjects
TRANSFORMER models, CONVOLUTIONAL neural networks, PLANT identification, NOSOLOGY, TOMATOES - Abstract
Recently, convolutional neural networks (CNNs) and self-attention mechanisms have been widely applied in plant disease identification tasks, yielding significant successes. However, most research models for tomato leaf disease recognition rely solely on traditional convolutional models or Transformer architectures and fail to capture local and global features simultaneously. This limitation can bias the model's focus and consequently impact the accuracy of disease recognition. Models capable of extracting local features while attending to global information have therefore emerged as a novel research direction. To address these challenges, we propose an Eff-Swin model that integrates the enhanced features of the EfficientNetV2 and Swin Transformer networks, aiming to harness the local feature extraction capability of CNNs and the global modeling ability of Transformers. Comparative experiments demonstrate that the enhanced model achieves a further increase in accuracy, reaching 99.70% on the tomato leaf disease dataset, which is 0.49~3.68% higher than individual network models and 0.8~1.15% higher than existing state-of-the-art combined approaches. The results show that integrating attention mechanisms into convolutional models can significantly enhance the accuracy of tomato leaf disease recognition and highlight the great potential of the Eff-Swin backbone with self-attention in plant disease identification. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
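The core idea of Eff-Swin, fusing a CNN's local features with a Transformer's global features, can be summarized in a few lines. This sketch assumes each backbone has been reduced to a feature extractor returning a (B, dim) embedding; it is a generic fusion pattern, not the paper's exact wiring:

```python
import torch
import torch.nn as nn

class DualBackboneClassifier(nn.Module):
    """Concatenate pooled CNN and Transformer embeddings, then classify."""
    def __init__(self, cnn, transformer, cnn_dim, trans_dim, num_classes):
        super().__init__()
        self.cnn, self.transformer = cnn, transformer
        self.head = nn.Linear(cnn_dim + trans_dim, num_classes)

    def forward(self, x):
        local_feat = self.cnn(x)           # local texture cues (CNN)
        global_feat = self.transformer(x)  # long-range context (Swin/ViT)
        return self.head(torch.cat([local_feat, global_feat], dim=1))
```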
33. CSI-Net: CNN Swin Transformer Integrated Network for Infrared Small Target Detection.
- Author
-
Choi, Lammi, Chung, Won Young, and Park, Chan Gook
- Abstract
In the realm of infrared (IR) small target detection, accurately pinpointing blurry and low-contrast targets is immensely challenging due to the intricate features of IR images. To tackle this, we introduce CSI-Net, a novel network architecture merging a CNN and a Swin Transformer. CSI-Net features a hybrid encoder design, blending the encoder-decoder layout of UNet with a Swin Transformer executed in parallel with the CNN. This amalgamation enables the network to capture local features and long-distance dependencies, enhancing its ability to accurately identify small targets. Leveraging the hierarchical features of the Swin Transformer, CSI-Net adeptly grasps contextual information crucial for small target detection. Moreover, CSI-Net employs full-scale skip connections across encoder-decoder and decoder-decoder paths, integrating multiscale CNN and Swin Transformer features to improve gradient propagation. Experimental results validate the superiority of the proposed method over traditional CNN and Transformer methods. On NUAA-SIRST, metrics such as mIoU (0.7483), detection probability (0.9734), and false alarm rate (0.101 × 10⁻⁵) demonstrate significant improvement. Similarly, on NUDT-SIRST, mIoU (0.8887), detection probability (0.9894), and false alarm rate (0.431 × 10⁻⁵) show notable enhancement. The performance of the network scales with dataset size, and its robustness is affirmed by the area under the ROC curve (AUC). Additionally, an ablation study validates the efficacy of the hybrid encoder: varying the presence of the parallel Swin Transformer module (PSM) reveals that its application enhances small target detection performance. The comprehensive evaluation shows that the Swin Transformer-enhanced UNet architecture effectively tackles the challenges of IR small target detection. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
34. A novel Swin transformer approach utilizing residual multi-layer perceptron for diagnosing brain tumors in MRI images.
- Author
-
Pacal, Ishak
- Abstract
Serious consequences due to brain tumors necessitate a timely and accurate diagnosis. However, obstacles such as suboptimal imaging quality, issues with data integrity, varying tumor types and stages, and potential errors in interpretation hinder the achievement of precise and prompt diagnoses. The rapid identification of brain tumors plays a pivotal role in ensuring patient safety. Deep learning-based systems hold promise in aiding radiologists to make diagnoses swiftly and accurately. In this study, we present an advanced deep learning approach based on the Swin Transformer. The proposed method introduces a novel Hybrid Shifted Windows Multi-Head Self-Attention module (HSW-MSA) along with a rescaled model. This enhancement aims to improve classification accuracy, reduce memory usage, and simplify training complexity. The Residual-based MLP (ResMLP) replaces the traditional MLP in the Swin Transformer, thereby improving accuracy, training speed, and parameter efficiency. We evaluate the Proposed-Swin model on a publicly available brain MRI dataset with four classes, using only test data. Model performance is enhanced through the application of transfer learning and data augmentation techniques for efficient and robust training. The Proposed-Swin model achieves a remarkable accuracy of 99.92%, surpassing previous research and deep learning models. This underscores the effectiveness of the Swin Transformer with HSW-MSA and ResMLP improvements in brain tumor diagnosis. This method introduces an innovative diagnostic approach using HSW-MSA and ResMLP in the Swin Transformer, offering potential support to radiologists in timely and accurate brain tumor diagnosis, ultimately improving patient outcomes and reducing risks. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
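The abstract's ResMLP modification, replacing the Swin Transformer's plain MLP with a residual one, can be sketched as below. The pre-norm layout and expansion factor are plausible assumptions, not the paper's exact block:

```python
import torch.nn as nn

class ResidualMLP(nn.Module):
    """MLP with a shortcut connection: the residual path eases
    optimization relative to a plain feed-forward block."""
    def __init__(self, dim, hidden_mult=4, drop=0.0):
        super().__init__()
        self.net = nn.Sequential(
            nn.LayerNorm(dim),
            nn.Linear(dim, dim * hidden_mult),
            nn.GELU(),
            nn.Dropout(drop),
            nn.Linear(dim * hidden_mult, dim),
        )

    def forward(self, x):
        return x + self.net(x)
```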
35. Deep Learning for Multilabel Classification of Coral Reef Conditions in the Indo‐Pacific Using Underwater Photo Transect Method.
- Author
-
Shao, Xinlei, Chen, Hongruixuan, Magson, Kirsty, Wang, Jiaqi, Song, Jian, Chen, Jundong, and Sasaki, Jun
- Subjects
CORAL reef conservation, CORAL reefs & islands, TRANSFORMER models, TRANSECT method, ENVIRONMENTAL monitoring - Abstract
Because coral reef ecosystems face threats from human activities and climate change, coral reef conservation programmes are implemented worldwide. Monitoring coral health provides references for guiding conservation activities. However, current labour‐intensive methods result in a backlog of unsorted images, highlighting the need for automated classification. Few studies have simultaneously utilized accurate labels along with updated algorithms and datasets. This study aimed to create a dataset representing common coral reef conditions and associated stressors in the Indo‐Pacific. Concurrently, it assessed existing classification algorithms and proposed a new multilabel method for automatically detecting coral reef conditions and extracting ecological information. A dataset containing over 20,000 high‐resolution coral images of different health conditions and stressors was constructed based on the field survey. Seven representative deep learning architectures were tested on this dataset, and their performance was quantitatively evaluated using the F1 metric and the match ratio. Based on this evaluation, a new method utilizing the ensemble learning approach was proposed. The proposed method accurately classified coral reef conditions as healthy, compromised, dead and rubble; it also identified corresponding stressors, including competition, disease, predation and physical issues. This method can help develop the coral image archive, guide conservation activities and provide references for decision‐making for reef managers and conservationists. The proposed ensemble learning approach outperforms others on the dataset, showing state‐of‐the‐art (SOTA) performance. Future research should improve its generalizability and accuracy to support global coral reef conservation efforts. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
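Multilabel ensembling of the kind described, where several networks jointly predict conditions and stressors, commonly averages per-label scores. A minimal PyTorch sketch (models returning per-label logits is an assumption):

```python
import torch

@torch.no_grad()
def ensemble_multilabel(models, images, threshold=0.5):
    """Soft-voting for multilabel output: average each model's per-label
    sigmoid scores, then threshold to get independent binary labels
    (e.g. healthy / compromised / dead / rubble plus stressors)."""
    probs = torch.stack([m(images).sigmoid() for m in models]).mean(dim=0)
    return (probs > threshold).int(), probs
```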
36. Multi-label dental disorder diagnosis based on MobileNetV2 and swin transformer using bagging ensemble classifier
- Author
-
Yasmin M. Alsakar, Naira Elazab, Nermeen Nader, Waleed Mohamed, Mohamed Ezzat, and Mohammed Elmogy
- Subjects
Dentistry, MobileNetV2, Swin transformer, Annotation, Deep learning, Feature extraction, Medicine, Science - Abstract
Abstract Dental disorders are common worldwide, causing pain or infections and limiting mouth opening, so dental conditions impact productivity, work capability, and quality of life. Manual detection and classification of oral diseases is time-consuming and requires dentists’ evaluation and examination. A dental disease detection and classification system based on machine learning and deep learning will aid in early dental disease diagnosis. Hence, this paper proposes a new diagnosis system for dental diseases using X-ray imaging. The framework includes a robust pre-processing phase that uses image normalization and adaptive histogram equalization to improve image quality and reduce variation. A dual-stream approach is used for feature extraction, utilizing the advantages of the Swin Transformer for capturing long-range dependencies and global context and MobileNetV2 for effective local feature extraction. A thorough representation of dental anomalies is produced by fusing the extracted features. Finally, a bagging ensemble classifier is utilized to obtain reliable and broadly applicable classification results. We evaluate our model on a benchmark dental radiography dataset. The experimental results and comparisons show the superiority of the proposed system, with 95.7% precision, 95.4% sensitivity, 95.7% specificity, 95.5% Dice similarity coefficient, and 95.6% accuracy. The results demonstrate the effectiveness of our hybrid model integrating the MobileNetV2 and Swin Transformer architectures, outperforming state-of-the-art techniques in classifying dental diseases using dental panoramic X-ray imaging. This framework presents a promising method for robustly and accurately diagnosing dental diseases automatically, which may help dentists plan treatments and identify dental diseases early on.
- Published
- 2024
- Full Text
- View/download PDF
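The final bagging stage can be illustrated with scikit-learn, fitting bootstrap replicas on fused deep features. The feature dimensions and labels below are random stand-ins for illustration, not the paper's data:

```python
import numpy as np
from sklearn.ensemble import BaggingClassifier

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 1280 + 768))   # stand-in fused MobileNetV2+Swin features
y = rng.integers(0, 4, size=200)         # hypothetical disorder labels

# Each base estimator (a decision tree by default) is fit on a bootstrap
# sample; predictions are aggregated by voting, which reduces variance.
model = BaggingClassifier(n_estimators=25, random_state=0).fit(X, y)
print(model.predict(X[:5]))
```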
37. DS-TransFusion: Automatic retinal vessel segmentation based on an improved Swin Transformer
- Author
-
Benchen YANG, Jianyu WANG, and Haibo JIN
- Subjects
retinal vascular segmentation, fundus images, multi scale attention, feature fusion, swin transformer, Mining engineering. Metallurgy, TN1-997, Environmental engineering, TA170-171 - Abstract
Retinal vascular segmentation holds significant value in medical research, playing an indispensable role in facilitating the screening of various diseases, such as diabetes, hypertension, and glaucoma. However, most current retinal vessel segmentation methods rely mainly on convolutional neural networks, which present limitations when dealing with long-term dependencies and global context connections. These limitations often result in poor segmentation of small blood vessels and low contrast between the ends of fundus blood vessel branches and the background. To tackle these challenges, this paper proposes a new retinal blood vessel segmentation model, namely Dual Swin Transformer Fusion (DS-TransFusion). This model uses a two-scale encoder subnetwork based on a Swin Transformer, which is able to find correspondence and align features from heterogeneous inputs. Given an input retinal image, the model first splits it into two nonoverlapping blocks of different sizes, which are then fed into the two branches of the encoder to extract coarse-grained and fine-grained features of the retinal blood vessels. At the skip connections, DS-TransFusion introduces the Transformer interactive fusion attention (TIFA) module. The core of this module is a multiscale attention (MA) mechanism that facilitates efficient interaction between multiscale features. It integrates features from the two branches at different scales, achieves effective feature fusion, enriches cross-view context modeling and semantic dependency, and captures long-term correlations between data from different image views, which in turn enhances segmentation performance. In addition, to integrate multiscale representation in the hierarchical backbone, DS-TransFusion introduces an MA module between the encoder and decoder. This module learns the feature dependencies across different scales, collects the global correspondence of multiscale feature representations, and further optimizes the segmentation effect of the model. DS-TransFusion performed impressively on the public datasets STARE, CHASEDB1, and DRIVE, with accuracies of 96.50%, 97.22%, and 97.80%, and sensitivities of 84.10%, 84.55%, and 83.17%, respectively. Experimental results show that DS-TransFusion can effectively improve the accuracy of retinal blood vessel segmentation and accurately segment small blood vessels. Overall, DS-TransFusion, as a Swin Transformer-based retinal vessel segmentation model, achieves remarkable results on the problems of unclear small-vessel segmentation and global context connection. Experimental results on several public datasets validate the effectiveness and superiority of this method, suggesting its potential to provide more accurate retinal vascular segmentation results for auxiliary screening of various diseases.
- Published
- 2024
- Full Text
- View/download PDF
38. Ensemble of vision transformer architectures for efficient Alzheimer’s Disease classification
- Author
-
Noushath Shaffi, Vimbi Viswan, and Mufti Mahmud
- Subjects
Vision transformer, Convolutional neural networks, Machine learning models, Alzheimer’s Disease, Swin transformer, Data efficient image transformers, Computer applications to medicine. Medical informatics, R858-859.7, Computer software, QA76.75-76.765 - Abstract
Abstract Transformers have dominated the landscape of Natural Language Processing (NLP) and revolutionized generative AI applications. Vision Transformers (VTs) have recently become the new state of the art for computer vision applications. Motivated by the success of VTs in capturing short- and long-range dependencies and their ability to handle class imbalance, this paper proposes an ensemble framework of VTs for the efficient classification of Alzheimer’s Disease (AD). The framework consists of four vanilla VTs and ensembles formed using hard- and soft-voting approaches. The proposed model was tested using two popular AD datasets: OASIS and ADNI. The ADNI dataset was employed to assess the models’ efficacy under imbalanced and data-scarce conditions. The ensemble of VTs saw an improvement of around 2% compared to individual models. Furthermore, the results are compared with state-of-the-art and custom-built Convolutional Neural Network (CNN) architectures and Machine Learning (ML) models under varying data conditions. The experimental results demonstrated an overall performance gain of 4.14% and 4.72% accuracy over the ML and CNN algorithms, respectively. The study has also identified specific limitations and proposes avenues for future research. The codes used in the study are made publicly available.
- Published
- 2024
- Full Text
- View/download PDF
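Hard versus soft voting, the two ensembling modes the framework uses, differ only in whether class probabilities or discrete predictions are aggregated. A compact sketch:

```python
import torch

@torch.no_grad()
def vote(models, x, mode="soft"):
    """Soft voting averages softmax scores; hard voting takes the
    majority of per-model argmax predictions."""
    logits = torch.stack([m(x) for m in models])       # (M, B, C)
    if mode == "soft":
        return logits.softmax(dim=-1).mean(dim=0).argmax(dim=-1)
    preds = logits.argmax(dim=-1)                      # (M, B)
    return preds.mode(dim=0).values                    # majority label
```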
39. SwinVNETR: Swin V-net Transformer with non-local block for volumetric MRI Brain Tumor Segmentation
- Author
-
Maria Nancy A and K. Sathyarajasekaran
- Subjects
Deep learning, Swin Transformer, brain tumour segmentation, non-local block, explainable AI, Grad-CAM, Control engineering systems. Automatic machinery (General), TJ212-225, Automation, T59.5 - Abstract
Brain Tumor Segmentation (BTS) and classification are important and growing research fields. Magnetic resonance imaging (MRI) is commonly used in the diagnosis of brain tumours owing to its low radiation exposure and high image quality. How to quickly and precisely segment MRI scans of brain tumours is a current subject in the field of medical imaging. Unfortunately, most existing brain tumour segmentation algorithms use inadequate 2D image segmentation methods and fail to capture the spatial correlation between features. In this study, we propose SwinVNETR, a Swin V-Net Transformer-based segmentation architecture with a non-local block. This model was trained using the Brain Tumor Segmentation Challenge (BraTS) 2021 dataset. The Dice similarity coefficients for the enhanced tumour (ET), whole tumour (WT), and tumour core (TC) are 0.84, 0.91, and 0.87, respectively. By leveraging this methodology, we can segment brain tumours more accurately than before. In conclusion, we present the findings of our model through the Grad-CAM methodology, an eXplainable Artificial Intelligence (XAI) technique utilized to elucidate the insights derived from the model, which aids understanding and can help doctors better diagnose and treat patients with brain tumours.
- Published
- 2024
- Full Text
- View/download PDF
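The Dice similarity coefficient reported for ET/WT/TC is computed per binary mask; a reference implementation under the usual conventions:

```python
import torch

def dice_coefficient(pred, target, eps=1e-6):
    """Dice = 2|A ∩ B| / (|A| + |B|) for binary masks; eps guards
    against empty masks."""
    pred, target = pred.float().flatten(), target.float().flatten()
    inter = (pred * target).sum()
    return (2 * inter + eps) / (pred.sum() + target.sum() + eps)
```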
40. Improved YOLOv5m model based on Swin Transformer, K-means++, and Efficient Intersection over Union (EIoU) loss function for cocoa tree (Theobroma cacao) disease detection
- Author
-
Benedicta Nana Esi Nyarko, Wu Bin Wu, Zhou Jinzhi Zhou, and Mwanaharusi Mohd Juma Mohd
- Subjects
cocoa tree disease, eiou, k-means++, plant disease detection, swin transformer, Plant culture, SB1-1110 - Abstract
The cocoa tree is prone to diverse diseases such as stem borer, stem canker, swollen shoot, and root rot disease, which impede high yield. Early disease detection is a critical component of the diverse management processes implemented throughout the life cycle of cocoa plants; consequently, several studies on the application of detection techniques to recognize diseases have been proposed. This study proposes a YOLOv5m network for cocoa tree disease detection. The development of cocoa disease detection systems will aid farmers in early identification, prompt response, and efficient management of related cocoa tree diseases, which will ultimately increase yield and sustainability. To improve the performance of the YOLOv5m network, a Swin Transformer (Swin-T) was added to the backbone network to capture global information and improve cocoa tree disease detection accuracy, the K-means++ algorithm was added to improve the choice of initial clustering locations, and Efficient Intersection over Union (EIoU) loss was used as the bounding-box regression loss function to speed up the bounding-box regression rate, resulting in higher precision of the YOLOv5m network. The experimental assessment showed that the proposed YOLOv5m (Swin-T, K-means++, EIoU) achieved 96% precision, an mAP of 92%, and a recall of 94%. Compared to the original YOLOv5m, precision improved by 5%, mAP by 6%, and recall by 5%. Compared with the conventional YOLOv5m, the proposed method showed improved performance and better accuracy, with high detection speed and compactness. This improvement offers a useful and effective method for detecting diseases related to cocoa trees.
- Published
- 2024
- Full Text
- View/download PDF
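The K-means++ anchor step amounts to clustering labeled box sizes with a spread-out initialization. A scikit-learn sketch with made-up box dimensions (YOLOv5's nine anchors assumed):

```python
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
boxes = rng.random((500, 2)) * 300        # stand-in (width, height) pairs, px

# init="k-means++" seeds well-separated initial centers, the modification
# the abstract applies to anchor selection.
km = KMeans(n_clusters=9, init="k-means++", n_init=10, random_state=0).fit(boxes)
centers = km.cluster_centers_
print(centers[np.argsort(centers.prod(axis=1))])   # anchors sorted by area
```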
41. Retinex decomposition based low‐light image enhancement by integrating Swin transformer and U‐Net‐like architecture
- Author
-
Zexin Wang, Letu Qingge, Qingyi Pan, and Pei Yang
- Subjects
low‐light image enhancement, residual connection, swin transformer, U‐Net, Photography, TR1-1050, Computer software, QA76.75-76.765 - Abstract
Abstract Low‐light images are captured in environments with minimal lighting, such as nighttime or underwater conditions. These images often suffer from issues like low brightness, poor contrast, lack of detail, and overall darkness, significantly impairing human visual perception and subsequent high‐level visual tasks. Enhancing low‐light images holds great practical significance. Among the various existing methods for Low‐Light Image Enhancement (LLIE), those based on the Retinex theory have gained significant attention. However, despite considerable efforts in prior research, the challenge of Retinex decomposition remains unresolved. In this study, an LLIE network based on the Retinex theory is proposed, which addresses these challenges by integrating attention mechanisms and a U‐Net‐like architecture. The proposed model comprises three modules: the Decomposition module (DECM), the Reflectance Recovery module (REFM), and the Illumination Enhancement module (ILEM). Its objective is to decompose low‐light images based on the Retinex theory and enhance the decomposed reflectance and illumination maps using attention mechanisms and a U‐Net‐like architecture. We conducted extensive experiments on several widely used public datasets. The qualitative results demonstrate that the approach produces enhanced images with superior visual quality compared to the existing methods on all test datasets, especially for some extremely dark images. Furthermore, the quantitative evaluation results based on metrics PSNR, SSIM, LPIPS, BRISQUE, and MUSIQ show the proposed model achieves superior performance, with PSNR and BRISQUE significantly outperforming the baseline approaches, where (PSNR, mean BRISQUE) values of the proposed method and the second best results are (17.14, 17.72) and (16.44, 19.65). Additionally, further experimental results such as ablation studies indicate the effectiveness of the proposed model.
- Published
- 2024
- Full Text
- View/download PDF
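Retinex-based enhancers share one constraint: the input should factor into reflectance and illumination, I = R ⊙ L. A minimal reconstruction loss expressing that constraint (the tensor shapes and the L1 choice are assumptions, not DECM's actual objective):

```python
import torch
import torch.nn.functional as F

def retinex_reconstruction_loss(low, reflectance, illumination):
    """Penalize deviation from I = R * L (element-wise product);
    reflectance is (B, 3, H, W), illumination (B, 1, H, W) broadcasts."""
    recon = reflectance * illumination
    return F.l1_loss(recon, low)
```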
42. MoE-NuSeg: Enhancing nuclei segmentation in histology images with a two-stage Mixture of Experts network
- Author
-
Xuening Wu, Yiqing Shen, Qing Zhao, Yanlan Kang, and Wenqiang Zhang
- Subjects
Nuclei segmentation, Histology image, Swin transformer, Mixture of Experts (MoEs), Engineering (General). Civil engineering (General), TA1-2040 - Abstract
Accurate nuclei segmentation is essential for extracting quantitative information from histology images to support disease diagnosis and treatment decisions. However, precise segmentation is challenging due to the presence of clustered nuclei, varied morphologies, and the need to capture global spatial correlations. While state-of-the-art Transformer-based models employ tri-decoder architectures to decouple the segmentation task into nuclei, edges, and cluster edges segmentation, their complexity and long inference times hinder clinical integration. To address this, we introduce MoE-NuSeg, a novel Mixture of Experts (MoE) network that consolidates the tri-decoder into a single decoder. MoE-NuSeg employs three specialized experts for nuclei segmentation, edge delineation, and cluster edge detection, thereby mirroring the functionality of tri-decoders while surpassing their performance and reducing parameters by sharing attention heads. We propose a two-stage training strategy: the first stage independently trains the three experts, and the second stage fine-tunes their interactions to dynamically allocate the contributions of each expert using a learnable attention-based gating network. Evaluations across three datasets demonstrate that MoE-NuSeg outperforms the state-of-the-art methods, achieving an average increase of 0.99% in Dice coefficient, 1.14% in IoU and 0.92% in F1 Score, while reducing parameters by 30.1% and FLOPs by 40.2%. The code is available at https://github.com/deep-geo/MoE-NuSeg.
- Published
- 2025
- Full Text
- View/download PDF
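The learnable gate that mixes the three experts can be sketched as a small network producing softmax weights; the pooling-based gate below is one plausible form, not the paper's exact module:

```python
import torch
import torch.nn as nn

class ExpertGate(nn.Module):
    """Mix three expert maps (nuclei, edges, cluster edges) with
    input-dependent softmax weights."""
    def __init__(self, in_ch):
        super().__init__()
        self.gate = nn.Sequential(nn.AdaptiveAvgPool2d(1),
                                  nn.Conv2d(in_ch, 3, kernel_size=1))

    def forward(self, feat, experts):          # experts: 3 maps of (B, 1, H, W)
        w = self.gate(feat).softmax(dim=1)     # (B, 3, 1, 1) mixture weights
        stacked = torch.cat(experts, dim=1)    # (B, 3, H, W)
        return (w * stacked).sum(dim=1, keepdim=True)
```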
43. Enhancing Melanoma Diagnosis with Advanced Deep Learning Models Focusing on Vision Transformer, Swin Transformer, and ConvNeXt
- Author
-
Serra Aksoy, Pinar Demircioglu, and Ismail Bogrekci
- Subjects
ViT, Swin Transformer, ConvNeXt, benign and malignant tumors, medical imaging, Dermatology, RL1-803 - Abstract
Skin tumors, especially melanoma, which is highly aggressive and metastasizes quickly to other sites, are a problem in many parts of the world, and detecting them at an early stage is the key to saving lives. This study explores the application of advanced deep learning models for classifying benign and malignant melanoma using dermoscopic images. The aim of the study is to enhance the accuracy and efficiency of melanoma diagnosis with the ConvNeXt, Vision Transformer (ViT) Base-16, and Swin Transformer V2 Small (Swin V2 S) deep learning models. The ConvNeXt model, which integrates principles of both convolutional neural networks and transformers, demonstrated superior performance, with balanced precision and recall metrics. The dataset, sourced from Kaggle, comprises 13,900 uniformly sized images, preprocessed to standardize the inputs for the models. Experimental results revealed that ConvNeXt achieved the highest diagnostic accuracy among the tested models, with an accuracy of 91.5%, precision and recall of 90.45% and 92.8% for benign cases, and 92.61% and 90.2% for malignant cases, respectively. The F1-scores for ConvNeXt were 91.61% for benign cases and 91.39% for malignant cases. This research points out the potential of hybrid deep learning architectures in medical image analysis, particularly for early melanoma detection.
- Published
- 2024
- Full Text
- View/download PDF
44. MSCAC: A Multi-Scale Swin–CNN Framework for Progressive Remote Sensing Scene Classification
- Author
-
A. Arun Solomon and S. Akila Agnes
- Subjects
aerial scene classification, Swin Transformer, deep learning in remote sensing, geospatial analysis, computer vision for aerial imagery, terrain mapping, Geography (General), G1-922 - Abstract
Recent advancements in deep learning have significantly improved the performance of remote sensing scene classification, a critical task in remote sensing applications. This study presents a new aerial scene classification model, the Multi-Scale Swin–CNN Aerial Classifier (MSCAC), which employs the Swin Transformer, an advanced architecture that has demonstrated exceptional performance in a range of computer vision applications. The Swin Transformer leverages shifted window mechanisms to efficiently model long-range dependencies and local features in images, making it particularly suitable for the complex and varied textures in aerial imagery. The model is designed to capture intricate spatial hierarchies and diverse scene characteristics at multiple scales. A framework is developed that integrates the Swin Transformer with a multi-scale strategy, enabling the extraction of robust features from aerial images of different resolutions and contexts. This approach allows the model to effectively learn from both global structures and fine-grained details, which is crucial for accurate scene classification. The model’s performance is evaluated on several benchmark datasets, including UC-Merced, WHU-RS19, RSSCN7, and AID, where it demonstrates a superior or comparable accuracy to state-of-the-art models. The MSCAC model’s adaptability to varying amounts of training data and its ability to improve with increased data make it a promising tool for real-world remote sensing applications. This study highlights the potential of integrating advanced deep-learning architectures like the Swin Transformer into aerial scene classification, paving the way for more sophisticated and accurate remote sensing systems. The findings suggest that the proposed model has significant potential for various remote sensing applications, including land cover mapping, urban planning, and environmental monitoring.
- Published
- 2024
- Full Text
- View/download PDF
45. Dual-scale shifted window attention network for medical image segmentation
- Author
-
De-wei Han, Xiao-lei Yin, Jian Xu, Kang Li, Jun-jie Li, Lu Wang, and Zhao-yuan Ma
- Subjects
Swin Transformer, Dual-scale shifted window attention, Medical image segmentation, Medicine, Science - Abstract
Abstract Swin Transformer is an important work among the attempts to reduce the computational complexity of Transformers while maintaining their excellent performance in computer vision. Window-based patch self-attention can use the local connectivity of the image features, and the shifted window-based patch self-attention enables the communication of information between different patches across the entire image. Through in-depth research on the effects of different shifted window sizes on patch information communication efficiency, this article proposes a Dual-Scale Transformer with a double-sized shifted window attention method. The proposed method surpasses CNN-based methods such as U-Net, AttenU-Net, ResU-Net, and CE-Net by a considerable margin (approximately a 3%–6% increase), and outperforms the Transformer-based single-scale Swin Transformer (SwinT) (approximately a 1% increase) on the Kvasir-SEG, ISIC2017, MICCAI EndoVisSub-Instrument, and CadVesSet datasets. The experimental results verify that the proposed dual-scale shifted window attention benefits the communication of patch information and can enhance the segmentation results to the state of the art. We also implement an ablation study on the effect of the shifted window size on information flow efficiency and verify that the dual-scale shifted window attention is the optimized network design. Our study highlights the significant impact of network structure design on visual performance, providing valuable insights for the design of networks based on Transformer architectures.
- Published
- 2024
- Full Text
- View/download PDF
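The shifted-window mechanism the paper varies is a cyclic roll followed by window partitioning; two different shift/window sizes give the dual-scale variant. A sketch assuming a channels-last (B, H, W, C) map with H and W divisible by the window size:

```python
import torch

def shift_and_partition(x, shift, window):
    """Cyclically shift the map, then split it into non-overlapping
    (window x window) token groups for windowed self-attention."""
    b, h, w, c = x.shape
    x = torch.roll(x, shifts=(-shift, -shift), dims=(1, 2))
    x = x.view(b, h // window, window, w // window, window, c)
    return x.permute(0, 1, 3, 2, 4, 5).reshape(-1, window * window, c)
```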
46. Brain Age Estimation from Overnight Sleep Electroencephalography with Multi-Flow Sequence Learning
- Author
-
Zhang D, She Y, Sun J, Cui Y, Yang X, Zeng X, and Qin W
- Subjects
brain age, sleep polysomnography, electroencephalography, deep learning, swin transformer, Psychiatry, RC435-571, Neurophysiology and neuropsychology, QP351-495 - Abstract
Purpose: This study aims to improve brain age estimation by developing a novel deep learning model utilizing overnight electroencephalography (EEG) data. Methods: We address limitations in current brain age prediction methods by proposing a model trained and evaluated on multiple cohort data, covering a broad age range. The model employs a one-dimensional Swin Transformer to efficiently extract complex patterns from sleep EEG signals and a convolutional neural network with attentional mechanisms to summarize sleep structural features. A multi-flow learning-based framework attentively merges these two features, employing sleep structural information to direct and augment the EEG features. A post-prediction model is designed to integrate the age-related features throughout the night. Furthermore, we propose a DecadeCE loss function to address the problem of an uneven age distribution. Results: We utilized 18,767 polysomnograms (PSGs) from 13,616 subjects to develop and evaluate the proposed model. The model achieves a mean absolute error (MAE) of 4.19 years and a correlation of 0.97 on the mixed-cohort test set, and an MAE of 6.18 years and a correlation of 0.78 on an independent test set. Our brain age estimation work reduced the error by more than 1 year compared to other studies that also used EEG, reaching the level of neuroimaging-based methods. The estimated brain age index demonstrated longitudinal sensitivity and exhibited a significant increase of 1.27 years in individuals with psychiatric or neurological disorders relative to healthy individuals. Conclusion: The multi-flow deep learning model proposed in this study, based on overnight EEG, represents a more accurate approach for estimating brain age. The utilization of overnight sleep EEG for the prediction of brain age is both cost-effective and adept at capturing dynamic changes. These findings demonstrate the potential of EEG in predicting brain age, presenting a noninvasive and accessible method for assessing brain aging.
- Published
- 2024
47. A dual-track feature fusion model utilizing Group Shuffle Residual DeformNet and swin transformer for the classification of grape leaf diseases
- Author
-
R. Karthik, Gadige Vishnu Vardhan, Shreyansh Khaitan, R. N. R. Harisankar, R. Menaka, Sindhia Lingaswamy, and Daehan Won
- Subjects
Grape leaf disease, Deep learning, Dual-track network, Swin transformer, Triplet attention, Medicine, Science - Abstract
Abstract Grape cultivation is important globally, contributing to the agricultural economy and providing diverse grape-based products. However, the susceptibility of grapes to disease poses a significant threat to yield and quality. Traditional disease identification methods demand expert knowledge, which limits scalability and efficiency. To address these limitations, our research aims to design an automated deep learning approach for grape leaf disease detection. This research introduces a novel dual-track network for classifying grape leaf diseases, employing a combination of the Swin Transformer and Group Shuffle Residual DeformNet (GSRDN) tracks. The Swin Transformer track exploits shifted window techniques to construct hierarchical feature maps, enhancing global feature extraction. Simultaneously, the GSRDN track combines a Group Shuffle Depthwise Residual block and a Deformable Convolution block to extract local features with reduced computational complexity. The features from both tracks are concatenated and processed through Triplet Attention for cross-dimensional interaction. The proposed model achieved an accuracy of 98.6%, with precision, recall, and F1-score of 98.7%, 98.59%, and 98.64%, respectively, as validated on a grape leaf disease dataset drawn from the PlantVillage dataset, demonstrating its potential for efficient grape disease classification.
- Published
- 2024
- Full Text
- View/download PDF
48. Field cabbage detection and positioning system based on improved YOLOv8n
- Author
-
Ping Jiang, Aolin Qi, Jiao Zhong, Yahui Luo, Wenwu Hu, Yixin Shi, and Tianyu Liu
- Subjects
Cabbage, Object detection, YOLOv8n, Swin transformer, Large kernel convolutions, Plant culture, SB1-1110, Biology (General), QH301-705.5 - Abstract
Abstract Background Pesticide efficacy directly affects crop yield and quality, making targeted spraying a more environmentally friendly and effective method of pesticide application. Common targeted cabbage spraying methods often involve object detection networks. However, complex natural and lighting conditions pose challenges in the accurate detection and positioning of cabbage. Results In this study, a cabbage detection algorithm based on the YOLOv8n neural network (YOLOv8-cabbage) combined with a positioning system constructed using a Realsense depth camera is proposed. Initially, four of the currently available high-performance object detection models were compared, and YOLOv8n was selected as the transfer learning model for field cabbage detection. Data augmentation and expansion methods were applied to extensively train the model, a large kernel convolution method was proposed to improve the bottleneck section, the Swin transformer module was combined with the convolutional neural network (CNN) to expand the perceptual field of feature extraction and improve edge detection effectiveness, and a nonlocal attention mechanism was added to enhance feature extraction. Ablation experiments were conducted on the same dataset under the same experimental conditions, and the improved model increased the mean average precision (mAP) from 88.8% to 93.9%. Subsequently, depth maps and colour maps were aligned pixelwise to obtain the three-dimensional coordinates of the cabbages via coordinate system conversion. The positioning error of the three-dimensional coordinate cabbage identification and positioning system was (11.2 mm, 10.225 mm, 25.3 mm), which meets the usage requirements. Conclusions We have achieved accurate cabbage positioning. The object detection system proposed here can detect cabbage in real time in complex field environments, providing technical support for targeted spraying applications and positioning.
- Published
- 2024
- Full Text
- View/download PDF
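The depth-camera positioning step is a standard pinhole back-projection once the depth and colour frames are aligned. A sketch with invented intrinsics (fx, fy, cx, cy come from the camera's calibration):

```python
def pixel_to_camera_xyz(u, v, depth_m, fx, fy, cx, cy):
    """Back-project a detected cabbage centre (u, v) and its depth into
    camera-frame (x, y, z) coordinates via the pinhole model."""
    x = (u - cx) * depth_m / fx
    y = (v - cy) * depth_m / fy
    return x, y, depth_m

# Example with made-up intrinsics for a 640x480 stream:
print(pixel_to_camera_xyz(350, 260, 0.85, fx=615.0, fy=615.0, cx=320.0, cy=240.0))
```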
49. SwinDFU-Net: Deep learning transformer network for infection identification in diabetic foot ulcer.
- Author
-
M.G, Sumithra and Venkatesan, Chandran
- Subjects
*CONVOLUTIONAL neural networks, *TRANSFORMER models, *DIABETIC foot, *DEEP learning, *IMAGE recognition (Computer vision) - Abstract
The identification of infection in diabetic foot ulcers (DFUs) is challenging due to variability within classes, visual similarity between classes, reduced contrast with healthy skin, and the presence of artifacts. Existing studies focus on visual characteristics and tissue classification rather than infection detection, which is critical for assessing DFUs and predicting amputation risk. To address these challenges, this study proposes a deep learning model using a hybrid CNN and Swin Transformer architecture for infection classification in DFU images. The aim is to leverage end-to-end mapping without prior knowledge, integrating local and global feature extraction to improve detection accuracy. The model employs the Grad-CAM technique to visualize the decision-making process of the CNN and Transformer blocks. The DFUC Challenge dataset is used for training and evaluation, emphasizing the model’s ability to accurately classify DFU images into infected and non-infected categories. The model achieves high performance metrics: sensitivity (95.98%), specificity (97.08%), accuracy (96.52%), and Matthews Correlation Coefficient (0.93). These results indicate the model’s effectiveness in quickly diagnosing DFU infections, highlighting its potential as a valuable tool for medical professionals. The hybrid CNN and Swin Transformer architecture effectively combines strengths from both models, enabling accurate classification of DFU images as infected or non-infected, even in complex scenarios. The use of Grad-CAM provides insights into the model’s decision process, aiding in identifying infected regions within DFU images. This approach shows promise for enhancing clinical assessment and management of DFU infections. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
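Grad-CAM, the visualization technique mentioned, weights a layer's activations by the spatially pooled gradient of the target class score. A minimal hook-based PyTorch sketch (assumes a single-image batch and a classifier returning logits):

```python
import torch

def grad_cam(model, layer, image, class_idx):
    """Return a normalized (1, H, W) class-activation map for `layer`."""
    acts, grads = {}, {}
    h1 = layer.register_forward_hook(lambda m, i, o: acts.update(a=o))
    h2 = layer.register_full_backward_hook(lambda m, gi, go: grads.update(g=go[0]))
    score = model(image)[0, class_idx]
    model.zero_grad()
    score.backward()
    h1.remove()
    h2.remove()
    weights = grads["g"].mean(dim=(2, 3), keepdim=True)  # pooled gradients
    cam = torch.relu((weights * acts["a"]).sum(dim=1))   # weighted activations
    return cam / (cam.max() + 1e-8)
```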
50. ACSwinNet: A Deep Learning-Based Rigid Registration Method for Head-Neck CT-CBCT Images in Image-Guided Radiotherapy.
- Author
-
Peng, Kuankuan, Zhou, Danyu, Sun, Kaiwen, Wang, Junfeng, Deng, Jianchun, and Gong, Shihua
- Subjects
*CONE beam computed tomography, *TRANSFORMER models, *IMAGE-guided radiation therapy, *COMPUTED tomography, *NECK tumors - Abstract
Accurate and precise rigid registration between head-neck computed tomography (CT) and cone-beam computed tomography (CBCT) images is crucial for correcting setup errors in image-guided radiotherapy (IGRT) for head and neck tumors. However, conventional registration methods that treat the head and neck as a single entity may not achieve the necessary accuracy for the head region, which is particularly sensitive to radiation in radiotherapy. We propose ACSwinNet, a deep learning-based method for head-neck CT-CBCT rigid registration, which aims to enhance the registration precision in the head region. Our approach integrates an anatomical constraint encoder with anatomical segmentations of tissues and organs to enhance the accuracy of rigid registration in the head region. We also employ a Swin Transformer-based network for registration in cases with large initial misalignment and a perceptual similarity metric network to address intensity discrepancies and artifacts between the CT and CBCT images. We validate the proposed method using a head-neck CT-CBCT dataset acquired from clinical patients. Compared with the conventional rigid method, our method exhibits lower target registration error (TRE) for landmarks in the head region (reduced from 2.14 ± 0.45 mm to 1.82 ± 0.39 mm), higher dice similarity coefficient (DSC) (increased from 0.743 ± 0.051 to 0.755 ± 0.053), and higher structural similarity index (increased from 0.854 ± 0.044 to 0.870 ± 0.043). Our proposed method effectively addresses the challenge of low registration accuracy in the head region, which has been a limitation of conventional methods. This demonstrates significant potential in improving the accuracy of IGRT for head and neck tumors. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
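Target registration error, the headline metric here, is simply the distance between corresponding landmarks after registration; the reported "mean ± std mm" figures follow from:

```python
import numpy as np

def target_registration_error(fixed_pts, registered_pts):
    """Per-landmark Euclidean distances (mm) between corresponding
    points; returns (mean, std) as reported in the abstract."""
    d = np.linalg.norm(fixed_pts - registered_pts, axis=1)
    return d.mean(), d.std()
```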