Publication Type: Academic Journals / Publication Year Range: This year / Topic: attention mechanism - Searchworks@Jio Institute Digital Library Search Results

1. ResGAT: an improved graph neural network based on multi-head attention mechanism and residual network for paper classification

Author: Huang, Xuejian, Wu, Zhibin, Wang, Gensheng, Li, Zhipeng, Luo, Yuansheng, and Wu, Xiaofang
Published: 2024
Full Text: View/download PDF

2. Multi-Feature-Enhanced Academic Paper Recommendation Model with Knowledge Graph

Author: Le Wang, Wenna Du, and Zehua Chen
Subjects: academic paper recommendation, knowledge graph, neural networks, attention mechanism, sequential recommendation, Technology, Engineering (General). Civil engineering (General), TA1-2040, Biology (General), QH301-705.5, Physics, QC1-999, Chemistry, QD1-999
Abstract: This paper addresses the challenges of data sparsity and personalization limitations inherent in current recommendation systems when processing extensive academic paper datasets. To overcome these issues, the present work introduces an innovative recommendation model that integrates the wealth of structured information from knowledge graphs and refines the amalgamation of temporal and relational data. By applying attention mechanisms and neural network technologies, the model thoroughly explores the text characteristics of papers and the evolving patterns of user behaviors. Additionally, the model elevates the accuracy and personalization of recommendations by meticulously examining citation patterns among papers and the networks of author collaboration. The experimental findings show that the present model surpasses baseline models on all evaluation metrics, thereby enhancing the precision and personalization of academic paper recommendations.
Published: 2024
Full Text: View/download PDF

3. Multi-Feature-Enhanced Academic Paper Recommendation Model with Knowledge Graph.

Author: Wang, Le, Du, Wenna, and Chen, Zehua
Subjects: KNOWLEDGE graphs, RECOMMENDER systems
Abstract: This paper addresses the challenges of data sparsity and personalization limitations inherent in current recommendation systems when processing extensive academic paper datasets. To overcome these issues, the present work introduces an innovative recommendation model that integrates the wealth of structured information from knowledge graphs and refines the amalgamation of temporal and relational data. By applying attention mechanisms and neural network technologies, the model thoroughly explores the text characteristics of papers and the evolving patterns of user behaviors. Additionally, the model elevates the accuracy and personalization of recommendations by meticulously examining citation patterns among papers and the networks of author collaboration. The experimental findings show that the present model surpasses baseline models on all evaluation metrics, thereby enhancing the precision and personalization of academic paper recommendations. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

4. Thermal error modeling method of truss robot based on GA-LSTM

Author: Li, Long, Chen, Binyang, and Yu, Jiangli
Published: 2024
Full Text: View/download PDF

5. Short-term train arrival delay prediction: a data-driven approach

Author: Fu, Qingyun, Ding, Shuxin, Zhang, Tao, Wang, Rongsheng, Hu, Ping, and Pu, Cunlai
Published: 2024
Full Text: View/download PDF

6. Analyzing audiovisual data for understanding user's emotion in human-computer interaction environment

Author: Yang, Juan, Li, Zhenkun, and Du, Xu
Published: 2024
Full Text: View/download PDF

7. Channel attention-based spatial-temporal graph neural networks for traffic prediction

Author: Wang, Bin, Gao, Fanghong, Tong, Le, Zhang, Qian, and Zhu, Sulei
Published: 2024
Full Text: View/download PDF

8. PEERRec: An AI-based approach to automatically generate recommendations and predict decisions in peer review.

Author: Bharti, Prabhat Kumar, Ghosal, Tirthankar, Agarwal, Mayank, and Ekbal, Asif
Subjects: ARTIFICIAL intelligence, ARTIFICIAL neural networks
Abstract: One key frontier of artificial intelligence (AI) is the ability to comprehend research articles and validate their findings, posing a magnanimous problem for AI systems to compete with human intelligence and intuition. As a benchmark of research validation, the existing peer-review system still stands strong despite being criticized at times by many. However, the paper vetting system has been severely strained due to an influx of research paper submissions and increased conferences/journals. As a result, problems, including having insufficient reviewers, finding the right experts, and maintaining review quality, are steadily and strongly surfacing. To ease the workload of the stakeholders associated with the peer-review process, we probed into what an AI-powered review system would look like. In this work, we leverage the interaction between the paper's full text and the corresponding peer-review text to predict the overall recommendation score and final decision. We do not envisage AI reviewing papers in the near future. Still, we intend to explore the possibility of a human–AI collaboration in the decision-making process to make the current system FAIR. The idea is to have an assistive decision-making tool for the chairs/editors to help them with an additional layer of confidence, especially with borderline and contrastive reviews. We use a deep attention network between the review text and paper to learn the interactions and predict the overall recommendation score and final decision. We also use sentiment information encoded within peer-review texts to guide the outcome further. Our proposed model outperforms the recent state-of-the-art competitive baselines. We release the code of our implementation here: https://github.com/PrabhatkrBharti/PEERRec.git. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

9. Research on low-light image enhancement based on MER-Retinex algorithm.

Author: Zhou, Rongfeng, Wang, Rugang, Wang, Yuanyuan, Zhou, Feng, and Guo, Naihong
Abstract: To solve blurring and poor visual effects after enhancement of low-light images by conventional low-light algorithms, this paper proposes a MER-Retinex (multi-scale expansion reconstruction retinex) algorithm that integrates attention mechanism and multi-scale expansion pyramid reconstruction. It includes two parts: decomposition and enhancement module. In the decomposition module, two U-shaped networks are used to decompose the image into reflectance and illumination, then, use multi-layer convolution to expand the field of perception and improve the ability to decompose the image to obtain reflectance and illumination. In the enhancement module, a U-shaped network is used to fuse multi-scale expansion pyramids with a multi-attention mechanism to enrich image information, increase image brightness and fuse the processed global information with local information to enhance the recovered image details. In the enhanced reconstruction section, super-resolution techniques are used to enhance and denoise image feature details. Experimental analysis of the MER-Retinex algorithm was carried out on the LOL dataset. The PSNR of the algorithm in this paper was 25.26 and the NIQE was 3.43. The algorithm in this paper can effectively solve the problems of blurred images and poor visual effects, and has improved in both subjective perception and objective evaluation indexes. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

10. Multi-Beam Sonar Target Segmentation Algorithm Based on BS-Unet.

Author: Zhang, Wennuo, Zhang, Xuewu, Zhang, Yu, Zeng, Pengyuan, Wei, Ruikai, Xu, Junsong, and Chen, Yang
Subjects: SONAR imaging, IMAGE segmentation, DAM safety, INSPECTION & review, SONAR
Abstract: Multi-beam sonar imaging detection technology is increasingly becoming the mainstream technology in fields such as hydraulic safety inspection and underwater target detection due to its ability to generate clearer images under low-visibility conditions. However, during the multi-beam sonar detection process, issues such as low image resolution and blurred imaging edges lead to decreased target segmentation accuracy. Traditional filtering methods for echo signals cannot effectively solve these problems. To address these challenges, this paper introduces, for the first time, a multi-beam sonar dataset against the background of simulated crack detection for dam safety. This dataset included simulated cracks detected by multi-beam sonar from various angles. The width of the cracks ranged from 3 cm to 9 cm, and the length ranged from 0.2 m to 1.5 m. In addition, this paper proposes a BS-UNet semantic segmentation algorithm. The Swin-UNet model incorporates a dual-layer routing attention mechanism to enhance the accuracy of sonar image detail segmentation. Furthermore, an online convolutional reparameterization structure was added to the output end of the model to improve the model's capability to represent image features. Comparisons of the BS-UNet model with commonly used semantic segmentation models on the multi-beam sonar dataset consistently demonstrated the BS-UNet model's superior performance, as it improved semantic segmentation evaluation metrics such as Precision and IoU by around 0.03 compared to the Swin-UNet model. In conclusion, BS-UNet can effectively be applied in multi-beam sonar image segmentation tasks. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

11. A Microvascular Segmentation Network Based on Pyramidal Attention Mechanism.

Author: Zhang, Hong, Fang, Wei, and Li, Jiayun
Subjects: RETINAL blood vessels, DIABETIC retinopathy, COMPLEX variables, EYE diseases, PIXELS, MEDICAL screening, BLOOD vessels, FREE flaps
Abstract: The precise segmentation of retinal vasculature is crucial for the early screening of various eye diseases, such as diabetic retinopathy and hypertensive retinopathy. Given the complex and variable overall structure of retinal vessels and their delicate, minute local features, the accurate extraction of fine vessels and edge pixels remains a technical challenge in the current research. To enhance the ability to extract thin vessels, this paper incorporates a pyramid channel attention module into a U-shaped network. This allows for more effective capture of information at different levels and increased attention to vessel-related channels, thereby improving model performance. Simultaneously, to prevent overfitting, this paper optimizes the standard convolutional block in the U-Net with the pre-activated residual discard convolution block, thus improving the model's generalization ability. The model is evaluated on three benchmark retinal datasets: DRIVE, CHASE_DB1, and STARE. Experimental results demonstrate that, compared to the baseline model, the proposed model achieves improvements in sensitivity (Sen) scores of 7.12%, 9.65%, and 5.36% on these three datasets, respectively, proving its strong ability to extract fine vessels. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

12. Target Recognition Based on Infrared and Visible Image Fusion and Improved YOLOv8 Algorithm.

Author: Guo, Wei, Li, Yongtao, Li, Hanyan, Chen, Ziyou, Xu, Enyong, Wang, Shanchao, and Gu, Chengdong
Subjects: IMAGE fusion, INFRARED imaging, CONVOLUTIONAL neural networks, FEATURE extraction, IMAGE reconstruction
Abstract: In response to the issue that the fusion process of infrared and visible images is easily affected by lighting factors, in this paper, we propose an adaptive illumination perception fusion mechanism, which was integrated into an infrared and visible image fusion network. Spatial attention mechanisms were applied to both infrared images and visible images for feature extraction. Deep convolutional neural networks were utilized for further feature information extraction. The adaptive illumination perception fusion mechanism is then integrated into the image reconstruction process to reduce the impact of lighting variations in the fused images. A Median Strengthening Channel and Spatial Attention Module (MSCS) was designed to be integrated into the backbone of YOLOv8. In this paper, we used the fusion network to create a dataset named ivifdata for training the target recognition network. The experimental results indicated that the improved YOLOv8 network saw further enhancements of 2.3%, 1.4%, and 8.2% in the Recall, mAP50, and mAP50-95 metrics, respectively. The experiments revealed that the improved YOLOv8 network has advantages in terms of recognition rate and completeness, while also reducing the rates of false negatives and false positives. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

13. Lane Attribute Classification Based on Fine-Grained Description.

Author: He, Zhonghe, Gong, Pengfei, Ye, Hongcheng, and Gan, Zizheng
Subjects: TRAFFIC monitoring, ROAD markings, PROBLEM solving, ANNOTATIONS, ALGORITHMS, INTELLIGENT transportation systems
Abstract: As an indispensable part of the vehicle environment perception task, road traffic marking detection plays a vital role in correctly understanding the current traffic situation. However, the existing traffic marking detection algorithms still have some limitations. Taking lane detection as an example, the current detection methods mainly focus on the location information detection of lane lines, and they only judge the overall attribute of each detected lane line instance, thus lacking more fine-grained dynamic detection of lane line attributes. In order to meet the needs of intelligent vehicles for the dynamic attribute detection of lane lines and more perfect road environment information in urban road environment, this paper constructs a fine-grained attribute detection method for lane lines, which uses pixel-level attribute sequence points to describe the complete attribute distribution of lane lines and then matches the detection results of the lane lines. Realizing the attribute judgment of different segment positions of lane instances is called the fine-grained attribute detection of lane lines (Lane-FGA). In addition, in view of the lack of annotation information in the current open-source lane data set, this paper constructs a lane data set with both lane instance information and fine-grained attribute information by combining manual annotation and intelligent annotation. At the same time, a cyclic iterative attribute inference algorithm is designed to solve the difficult problem of lane attribute labeling in areas without visual cues such as occlusion and damage. In the end, the average accuracy of the proposed algorithm reaches 97% on various types of lane attribute detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

14. Combining data augmentation and deep learning for improved epilepsy detection.

Author: Yandong Ru, Zheng Wei, Gaoyang An, and Hongming Chen
Subjects: SIGNAL convolution, DATA augmentation, CONVOLUTIONAL neural networks, DEEP learning, EPILEPSY, RECURRENT neural networks
Abstract: Introduction: In recent years, the use of EEG signals for seizure detection has gained widespread academic attention. Aiming at the problem of overfitting deep learning models due to the small number of EEG signal data during epilepsy detection, this paper proposes an epilepsy detection method that combines data augmentation and deep learning. Methods: First, the Adversarial and Mixup Data Augmentation (AMDA) method is used to realize the data augmentation, which effectively enriches the number of training samples. To further improve the classification accuracy and robustness of epilepsy detection, this paper proposes a one-dimensional convolutional neural network and gated recurrent unit (AM-1D CNN-GRU) network model based on attention mechanism for epilepsy detection. Results and discussion: The experimental results show that the performance of epilepsy detection achieved by using augmented data is significantly improved, and the accuracy, sensitivity, and area under the receiver operating characteristic curve (AUC) are up to 96.06%, 95.48%, and 0.9637, respectively. Compared with the nonaugmented data, all indicators are increased by more than 6.2%. Meanwhile, the detection performance was significantly improved compared with other epilepsy detection methods. The results of this research can provide a reference for the clinical application of epilepsy detection. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
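
Note on entry 14: the abstract names mixup as one half of its AMDA augmentation. The block below is a minimal sketch of generic mixup only, not the authors' AMDA method; the adversarial half is not shown. The function name, the `alpha` value, and the two short EEG-like windows are hypothetical and chosen purely for illustration.

```python
import random


def mixup(x_a, x_b, y_a, y_b, alpha=0.4, rng=random):
    """Blend two samples and their one-hot labels with one shared weight.

    The weight lam is drawn from Beta(alpha, alpha), as in standard mixup.
    alpha=0.4 is an assumed value, not one taken from the paper.
    """
    lam = rng.betavariate(alpha, alpha)
    x_mix = [lam * a + (1.0 - lam) * b for a, b in zip(x_a, x_b)]
    y_mix = [lam * a + (1.0 - lam) * b for a, b in zip(y_a, y_b)]
    return x_mix, y_mix, lam


# Hypothetical, very short signal windows (real EEG windows are far longer).
seizure_window = [0.9, 1.4, -1.1, 0.3]
normal_window = [0.1, -0.2, 0.05, 0.0]

rng = random.Random(0)  # seeded so the sketch is repeatable
x_mix, y_mix, lam = mixup(
    seizure_window, normal_window, [1.0, 0.0], [0.0, 1.0], rng=rng
)
print("mixing weight:", lam)
print("mixed window:", x_mix)
print("soft label:", y_mix)
```

Because the label is blended with the same weight as the signal, each synthetic window carries a soft label rather than a hard class. That soft label is what lets mixup act as a regularizer when training samples are scarce.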

15. ERNIE and Multi-Feature Fusion for News Topic Classification.

Author: Weisong Chen, Boting Liu, and Weili Guan
Subjects: CLASSIFICATION, SEMANTICS, ALGORITHMS, LANGUAGE & languages, VOCABULARY
Abstract: Traditional news topic classification methods suffer from inaccurate text semantics, sparse text features, and low classification accuracy. Based on this, this paper proposes a news topic classification method based on Enhanced Language Representation with Informative Entities (ERNIE) and multi-feature fusion. A semantically more accurate representation of text embedding is obtained by ERNIE. In addition, this paper extracts word, context, and key sentence based on the news text. The key sentences of the news are obtained through the TextRank algorithm, which enables the model to focus on the content points of the news. Finally, this paper uses the attention mechanism to realize the fusion of multiple features. The proposed method is experimented on BBC News. The experimental results show that we achieve classification accuracies superior to those of the compared methods, while validating the structural validity of the proposed method. The method in this paper has a positive effect on promoting the research of news topic classification. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
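
Note on entry 15: the abstract states that key sentences are obtained through the TextRank algorithm. The block below is a minimal sketch of that general technique, not the authors' implementation. Its assumptions are a simple regex tokenizer, unique-word sets in place of full word counts, a damping factor of 0.85, a fixed number of iterations, and four invented example sentences.

```python
import math
import re


def tokenize(sentence):
    """Lowercase a sentence and return its set of unique word tokens."""
    return set(re.findall(r"[a-z0-9]+", sentence.lower()))


def similarity(a, b):
    """Word overlap normalised by the log sizes of the two token sets."""
    denom = math.log(len(a)) + math.log(len(b)) if a and b else 0.0
    return len(a & b) / denom if denom > 0 else 0.0


def textrank(sentences, damping=0.85, iterations=50):
    """Score sentences by weighted PageRank over a sentence-similarity graph."""
    tokens = [tokenize(s) for s in sentences]
    n = len(sentences)
    weights = [
        [similarity(tokens[i], tokens[j]) if i != j else 0.0 for j in range(n)]
        for i in range(n)
    ]
    out_sum = [sum(row) for row in weights]
    scores = [1.0] * n
    for _ in range(iterations):
        scores = [
            (1.0 - damping)
            + damping
            * sum(
                weights[j][i] / out_sum[j] * scores[j]
                for j in range(n)
                if out_sum[j] > 0
            )
            for i in range(n)
        ]
    return scores


def key_sentences(sentences, k=1):
    """Return the k highest-scoring sentences in their original order."""
    scores = textrank(sentences)
    top = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)[:k]
    return [sentences[i] for i in sorted(top)]


# Invented example text, not taken from BBC News.
news = [
    "The central bank raised interest rates on Tuesday.",
    "Analysts said the interest rates decision by the bank was expected.",
    "Local football fans celebrated a late goal.",
    "Markets fell after the bank announced higher interest rates.",
]
scores = textrank(news)
print(key_sentences(news, k=2))
```

Sentences that share vocabulary with many others reinforce each other's scores, while an off-topic sentence with no shared words keeps only the baseline score of 1 - damping.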

16. CSINet: Channel–Spatial Fusion Networks for Asymmetric Facial Expression Recognition.

Author: Cheng, Yan and Kong, Defeng
Subjects: FACIAL expression, RECOMMENDER systems, INFORMATION filtering, HUMAN-computer interaction, PROBLEM solving, WIRELESS channels
Abstract: Occlusion or posture change of the face in natural scenes has typical asymmetry; however, an asymmetric face plays a key part in the lack of information available for facial expression recognition. To solve the problem of low accuracy of asymmetric facial expression recognition, this paper proposes a fusion of channel global features and a spatial local information expression recognition network called the "Channel–Spatial Integration Network" (CSINet). First, to extract the underlying detail information and deepen the network, the attention residual module with a redundant information filtering function is designed, and the backbone feature-extraction network is constituted by module stacking. Second, considering the loss of information in the local key area of face occlusion, the channel–spatial fusion structure is constructed, and the channel features and spatial features are combined to enhance the accuracy of occluded facial recognition. Finally, before the full connection layer, more local spatial information is embedded into the global channel information to capture the relationship between different channel–spatial targets, which improves the accuracy of feature expression. Experimental results on the natural scene facial expression data sets RAF-DB and FERPlus show that the recognition accuracies of the modeling approach proposed in this paper are 89.67% and 90.83%, which are 13.24% and 11.52% higher than that of the baseline network ResNet50, respectively. Compared with the latest facial expression recognition methods such as CVT, PACVT, etc., the method in this paper obtains better evaluation results of masked facial expression recognition, which provides certain theoretical and technical references for daily facial emotion analysis and human–computer interaction applications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

17. Classifying breast cancer using multi-view graph neural network based on multi-omics data.

Author: Yanjiao Ren, Yimeng Gao, Wei Du, Weibo Qiao, Wei Li, Qianqian Yang, Yanchun Liang, and Gaoyang Li
Subjects: GRAPH neural networks, DEEP learning, MACHINE learning, FEATURE selection, BREAST cancer, TUMOR classification
Abstract: Introduction: As the evaluation indices, cancer grading and subtyping have diverse clinical, pathological, and molecular characteristics with prognostic and therapeutic implications. Although researchers have begun to study cancer differentiation and subtype prediction, most of relevant methods are based on traditional machine learning and rely on single omics data. It is necessary to explore a deep learning algorithm that integrates multi-omics data to achieve classification prediction of cancer differentiation and subtypes. Methods: This paper proposes a multi-omics data fusion algorithm based on a multi-view graph neural network (MVGNN) for predicting cancer differentiation and subtype classification. The model framework consists of a graph convolutional network (GCN) module for learning features from different omics data and an attention module for integrating multi-omics data. Three different types of omics data are used. For each type of omics data, feature selection is performed using methods such as the chi-square test and minimum redundancy maximum relevance (mRMR). Weighted patient similarity networks are constructed based on the selected omics features, and GCN is trained using omics features and corresponding similarity networks. Finally, an attention module integrates different types of omics features and performs the final cancer classification prediction. Results: To validate the cancer classification predictive performance of the MVGNN model, we conducted experimental comparisons with traditional machine learning models and currently popular methods based on integrating multi-omics data using 5-fold cross-validation. Additionally, we performed comparative experiments on cancer differentiation and its subtypes based on single omics data, two omics data, and three omics data. Discussion: This paper proposed the MVGNN model and it performed well in cancer classification prediction based on multiple omics data. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
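
Note on entry 17: the abstract describes an attention module that integrates features learned from different omics types. The block below is a minimal sketch of one common form of attention-weighted fusion, assuming dot-product scoring against a single query vector. It is not the MVGNN architecture. In a trained model the query and any projections would be learned, whereas here the query, the three view embeddings, and all names are hypothetical fixed values.

```python
import math


def softmax(scores):
    """Numerically stable softmax over a list of floats."""
    peak = max(scores)
    exps = [math.exp(s - peak) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def attention_fuse(views, query):
    """Fuse per-view embeddings into one vector using attention weights.

    Each view is scored by its dot product with the query, the scores are
    normalised with softmax, and the fused vector is the weighted sum.
    """
    scores = [sum(q * h for q, h in zip(query, view)) for view in views]
    weights = softmax(scores)
    dim = len(views[0])
    fused = [sum(w * view[d] for w, view in zip(weights, views)) for d in range(dim)]
    return fused, weights


# Hypothetical embeddings of one patient from three omics views.
views = [
    [0.2, 0.9, 0.1],
    [0.8, 0.1, 0.3],
    [0.4, 0.4, 0.4],
]
query = [1.0, 0.0, 0.5]  # stands in for a learned attention vector

fused, weights = attention_fuse(views, query)
print("view weights:", weights)
print("fused embedding:", fused)
```

The weights sum to one, so the fused embedding is a convex combination of the views. A view that aligns better with the query contributes more to the vector passed to the classifier.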

18. A GAN-EfficientNet-Based Traceability Method for Malicious Code Variant Families.

Author: Li, Li, Zhang, Qing, and Kong, Youran
Subjects: GENERATIVE adversarial networks, ARTIFICIAL neural networks, SUPPLY & demand, GENERALIZATION, FAMILIES
Abstract: Due to the diversity and unpredictability of changes in malicious code, studying the traceability of variant families remains challenging. In this paper, we propose a GAN-EfficientNetV2-based method for tracing families of malicious code variants. This method leverages the similarity in layouts and textures between images of malicious code variants from the same source and their original family of malicious code images. The method includes a lightweight classifier and a simulator. The classifier utilizes the enhanced EfficientNetV2 to categorize malicious code images and can be easily deployed on mobile, embedded, and other devices. The simulator utilizes an enhanced generative adversarial network to simulate different variants of malicious code and generates datasets to validate the model's performance. This process helps identify model vulnerabilities and security risks, facilitating model enhancement and development. The classifier achieves 98.61% and 97.59% accuracy on the MMCC dataset and Malevis dataset, respectively. The simulator's generated image of malicious code variants has an FID value of 155.44 and an IS value of 1.72 ± 0.42. The classifier's accuracy for tracing the family of malicious code variants is as high as 90.29%, surpassing that of mainstream neural network models. This meets the current demand for high generalization and anti-obfuscation abilities in malicious code classification models due to the rapid evolution of malicious code. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

19. Multifactorial Tomato Leaf Disease Detection Based on Improved YOLOV5.

Author: Wang, Guoying, Xie, Rongchang, Mo, Lufeng, Ye, Fujun, Yi, Xiaomei, and Wu, Peng
Subjects: FEATURE extraction, TOMATO farming, DEEP learning, LEAF anatomy, TOMATOES, GLOBAL method of teaching
Abstract: Target detection algorithms can greatly improve the efficiency of tomato leaf disease detection and play an important technical role in intelligent tomato cultivation. However, there are some challenges in the detection process, such as the diversity of complex backgrounds and the loss of leaf symmetry due to leaf shadowing, and existing disease detection methods have some disadvantages in terms of deteriorating generalization ability and insufficient accuracy. Aiming at the above issues, a target detection model for tomato leaf disease based on deep learning with a global attention mechanism, TDGA, is proposed in this paper. The main idea of TDGA includes three aspects. Firstly, TDGA adds a global attention mechanism (GAM) after up-sampling and down-sampling, as well as in the SPPF module, to improve the feature extraction ability of the target object, effectively reducing the interference of invalid targets. Secondly, TDGA uses a switchable atrous convolution (SAConv) in the C3 module to improve the model's ability to detect. Thirdly, TDGA adopts the efficient IoU loss (EIoU) instead of complete IoU loss (CIoU) to solve the ambiguous definition of aspect ratio and sample imbalance. In addition, the influences of different environmental factors such as single leaf, multiple leaves, and shadows on the performance of tomato disease detection are extensively experimented with and analyzed in this paper, which also verified the robustness of TDGA. The experimental results show that the average accuracy of TDGA reaches 91.40%, which is 2.93% higher than that of the original YOLOv5 network, which is higher than YOLOv5, YOLOv7, YOLOHC, YOLOv8, SSD, Faster R-CNN, RetinaNet and other target detection networks, so that TDGA can be utilized for the detection of tomato leaf disease more efficiently and accurately, even in complex environments. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
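
Note on entry 19: the abstract says TDGA adopts the efficient IoU loss (EIoU) in place of the complete IoU loss (CIoU). The block below is a minimal sketch of the EIoU loss as it is commonly defined: 1 - IoU, plus a centre-distance penalty, plus separate width and height penalties, each normalised by the smallest enclosing box. The corner-format box layout, the function names, and the small `eps` term are assumptions for illustration, not details taken from the paper.

```python
def iou(box_a, box_b):
    """Intersection over union for boxes given as (x1, y1, x2, y2)."""
    inter_w = max(0.0, min(box_a[2], box_b[2]) - max(box_a[0], box_b[0]))
    inter_h = max(0.0, min(box_a[3], box_b[3]) - max(box_a[1], box_b[1]))
    inter = inter_w * inter_h
    area_a = (box_a[2] - box_a[0]) * (box_a[3] - box_a[1])
    area_b = (box_b[2] - box_b[0]) * (box_b[3] - box_b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def eiou_loss(pred, target, eps=1e-9):
    """EIoU loss: IoU term + centre distance term + width and height terms."""
    # Width and height of the smallest box enclosing both boxes.
    enc_w = max(pred[2], target[2]) - min(pred[0], target[0])
    enc_h = max(pred[3], target[3]) - min(pred[1], target[1])

    # Squared distance between the two box centres.
    dx = (pred[0] + pred[2]) / 2 - (target[0] + target[2]) / 2
    dy = (pred[1] + pred[3]) / 2 - (target[1] + target[3]) / 2
    centre_term = (dx * dx + dy * dy) / (enc_w**2 + enc_h**2 + eps)

    # Width and height are penalised separately, unlike CIoU's ratio term.
    dw = (pred[2] - pred[0]) - (target[2] - target[0])
    dh = (pred[3] - pred[1]) - (target[3] - target[1])
    width_term = dw * dw / (enc_w**2 + eps)
    height_term = dh * dh / (enc_h**2 + eps)

    return (1.0 - iou(pred, target)) + centre_term + width_term + height_term


truth = (0.0, 0.0, 2.0, 2.0)
shifted = (1.0, 1.0, 3.0, 3.0)   # same size, offset centre
squashed = (0.0, 0.5, 2.0, 1.5)  # same centre and width, half the height

print("perfect match:", eiou_loss(truth, truth))
print("shifted box:", eiou_loss(shifted, truth))
print("squashed box:", eiou_loss(squashed, truth))
```

CIoU penalises shape mismatch through a single aspect-ratio term, which vanishes whenever the predicted and target ratios agree even if both sides are wrong. EIoU replaces it with direct width and height differences, which is the ambiguity the abstract refers to.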

20. GCAM: lightweight image inpainting via group convolution and attention mechanism.

Author: Chen, Yuantao, Xia, Runlong, Yang, Kai, and Zou, Ke
Abstract: Recently, image inpainting techniques tend to be more concerned with how to enhance the quality of restoration than with how to function on various platforms with limited processing power. In this paper, we propose a lightweight method that combines group convolution and attention mechanism to improve or replace the traditional convolution module. Group convolution was used to achieve multi-level image inpainting, and the authors proposed the rotating attention mechanism for allocation to deal with the issue of information mobility between channels in traditional convolution processing. The parallel discriminator structure was utilized throughout the network's overall design phase to guarantee both local and global consistency of the image inpainting process. The experimental results can demonstrate that, while the quality of image inpainting has been ensured, the proposed image inpainting network's inference time and resource usage are significantly lower than those of comparable lightweight approaches. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

21. MVT-CEAM: a lightweight MobileViT with channel expansion and attention mechanism for facial expression recognition.

Author: Wang, Kunxia, Yu, Wancheng, and Yamauchi, Takashi
Abstract: Facial expression recognition is a crucial area of study in psychology that can be applied to many fields, such as intelligent healthcare, human-computer interaction, fuzzy control and other domains. However, current deep learning models usually encounter high complexity, expensive computational requirements and outsized parameters. These obstacles hinder the deployment of applications on resource-constrained mobile terminals. This paper proposes an improved lightweight MobileViT with channel expansion and attention mechanism for facial expression recognition to address these challenges. In this model, we adopt a channel expansion strategy to effectively extract more critical facial expression feature information from multi-scale feature maps. Furthermore, we introduce a channel attention module within the model to improve feature extraction performance. Compared with typical lightweight models, our proposed model significantly improves the accuracy rate while maintaining a lightweight network. Our proposed model achieves 94.35 and 87.41% accuracy on the KDEF and RAF-DB datasets, respectively, demonstrating superior recognition performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

22. Multimodal Social Media Fake News Detection Based on 1D-CCNet Attention Mechanism.

Author: Yan, Yuhan, Fu, Haiyan, and Wu, Fan
Subjects: INTERNET content, FAKE news, SOCIAL media, VIRTUAL communities
Abstract: Due to the explosive rise of multimodal content in online social communities, cross-modal learning is crucial for accurate fake news detection. However, current multimodal fake news detection techniques face challenges in extracting features from multiple modalities and fusing cross-modal information, failing to fully exploit the correlations and complementarities between different modalities. To address these issues, this paper proposes a fake news detection model based on a one-dimensional CCNet (1D-CCNet) attention mechanism, named BTCM. This method first utilizes BERT and BLIP-2 encoders to extract text and image features. Then, it employs the proposed 1D-CCNet attention mechanism module to process the input text and image sequences, enhancing the important aspects of the bimodal features. Meanwhile, this paper uses the pre-trained BLIP-2 model for object detection in images, generating image descriptions and augmenting text data to enhance the dataset. This operation aims to further strengthen the correlations between different modalities. Finally, this paper proposes a heterogeneous cross-feature fusion method (HCFFM) to integrate image and text features. Comparative experiments were conducted on three public datasets: Twitter, Weibo, and Gossipcop. The results show that the proposed model achieved excellent performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

23. GFI-YOLOv8: Sika Deer Posture Recognition Target Detection Method Based on YOLOv8.

Author: Gong, He, Liu, Jingyi, Li, Zhipeng, Zhu, Hang, Luo, Lan, Li, Haoxu, Hu, Tianli, Guo, Ying, and Mu, Ye
Subjects: CONVOLUTIONAL neural networks, SIKA deer, OBJECT recognition (Computer vision), DEER behavior, ANIMAL behavior
Abstract: Simple Summary: Through gesture recognition and detection of sika deer, farmers can observe the gestures of sika deer without physical contact, providing data and technical support for the intelligent and welfare-oriented breeding of sika deer. This study is based on the YOLOv8 network model. By optimizing the convolution module, incorporating the attention mechanism, and enhancing the detection head module, a new method for detecting sika deer poses was developed. The method was assessed using four behavioral datasets, which included standing, lying, eating, and attacking. The pose-recognition accuracy of sika deer significantly improved to an average of 91.6%, laying a foundation for the health assessment and information management of sika deer. As the sika deer breeding industry flourishes on a large scale, accurately assessing the health of these animals is of paramount importance. Implementing posture recognition through target detection serves as a vital method for monitoring the well-being of sika deer. This approach allows for a more nuanced understanding of their physical condition, ensuring the industry can maintain high standards of animal welfare and productivity. In order to achieve remote monitoring of sika deer without interfering with the natural behavior of the animals, and to enhance animal welfare, this paper proposes a sika deer individual posture recognition detection algorithm GFI-YOLOv8 based on YOLOv8. Firstly, this paper proposes to add the iAFF iterative attention feature fusion module to the C2f of the backbone network module, replace the original SPPF module with AIFI module, and use the attention mechanism to adjust the feature channel adaptively. This aims to enhance granularity, improve the model's recognition, and enhance understanding of sika deer behavior in complex scenes. Secondly, a novel convolutional neural network module is introduced to improve the efficiency and accuracy of feature extraction, while preserving the model's depth and diversity. In addition, a new attention mechanism module is proposed to expand the receptive field and simplify the model. Furthermore, a new pyramid network and an optimized detection head module are presented to improve the recognition and interpretation of sika deer postures in intricate environments. The experimental results demonstrate that the model achieves 91.6% accuracy in recognizing the posture of sika deer, with a 6% improvement in accuracy and a 4.6% increase in mAP50 compared to YOLOv8n. Compared to other models in the YOLO series, such as YOLOv5n, YOLOv7-tiny, YOLOv8n, YOLOv8s, YOLOv9, and YOLOv10, this model exhibits higher accuracy, and improved mAP50 and mAP50-95 values. The overall performance is commendable, meeting the requirements for accurate and rapid identification of the posture of sika deer. This model proves beneficial for the precise and real-time monitoring of sika deer posture in complex breeding environments and under all-weather conditions. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

24. Ghost-YOLO v8: An Attention-Guided Enhanced Small Target Detection Algorithm for Floating Litter on Water Surfaces.

Author: Huangfu, Zhongmin, Li, Shuqing, and Yan, Luoheng
Subjects: FEATURE extraction, PYRAMIDS, ALGORITHMS, NECK, LAKES
Abstract: Addressing the challenges in detecting surface floating litter in artificial lakes, including complex environments, uneven illumination, and susceptibility to noise and weather, this paper proposes an efficient and lightweight Ghost-YOLO (You Only Look Once) v8 algorithm. The algorithm integrates advanced attention mechanisms and a small-target detection head to significantly enhance detection performance and efficiency. Firstly, an SE (Squeeze-and-Excitation) mechanism is incorporated into the backbone network to fortify the extraction of resilient features and precise target localization. This mechanism models feature channel dependencies, enabling adaptive adjustment of channel importance, thereby improving recognition of floating litter targets. Secondly, a 160 × 160 small-target detection layer is designed in the feature fusion neck to mitigate semantic information loss due to varying target scales. This design enhances the fusion of deep and shallow semantic information, improving small target feature representation and enabling better capture and identification of tiny floating litter. Thirdly, to balance performance and efficiency, the GhostConv module replaces part of the conventional convolutions in the feature fusion neck. Additionally, a novel C2fGhost (CSPDarknet53 to 2-Stage Feature Pyramid Networks Ghost) module is introduced to further reduce network parameters. Lastly, to address the challenge of occlusion, a new loss function, WIoU (Wise Intersection over Union) v3 incorporating a flexible and non-monotonic concentration approach, is adopted to improve detection rates for surface floating litter. 
Experiments show that the proposed Ghost-YOLO v8 model performs well on the Marine dataset. Compared with the base model, precision and recall improve by 3.3 and 7.6 percentage points, respectively, and mAP@0.5 and mAP@0.5:0.95 improve by 5.3 and 4.4 percentage points. The model also reduces the computational volume by 1.88 MB while the FPS value hardly decreases, so efficient real-time identification of floating debris on the water's surface can be achieved cost-effectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

25. Predicting significant wave height in the South China Sea using the SAC-ConvLSTM model.

Author: Boyang Hou, Hanjiao Fu, Xin Li, Tao Song, and Zhiyuan Zhang
Subjects: EXTREME weather, STANDARD deviations, OCEANOGRAPHY, PEARSON correlation (Statistics), OCEAN engineering
Abstract: Introduction: The precise forecasting of significant wave height (SWH) is vital to ensure the safety and efficiency of aquatic activities such as ocean engineering, shipping, and fishing. Methods: This paper proposes a deep learning model named SAC-ConvLSTM to perform 24-hour prediction of the SWH in the South China Sea. The long-term prediction capability of the model is enhanced by using the attention mechanism and context vectors. The prediction ability of the model is evaluated by mean absolute error (MAE), root mean square error (RMSE), mean square error (MSE), and Pearson correlation coefficient (PCC). Results: The experimental results show that the optimal input sequence length for the model is 12. Starting from 12 hours, the SAC-ConvLSTM model consistently outperforms other models in predictive performance. For the 24-hour prediction, this model achieves RMSE, MAE, and PCC values of 0.2117 m, 0.1083 m, and 0.9630, respectively. In addition, the introduction of wind data can improve the accuracy of wave prediction. The SAC-ConvLSTM model also has good prediction performance compared to the ConvLSTM model during extreme weather, especially in coastal areas. Discussion: This paper presents a 24-hour prediction of SWH in the South China Sea. Through comparative validation, the SAC-ConvLSTM model outperforms other models. The inclusion of wind data enhances the model's predictive capability. This model also performs well under extreme weather conditions. In physical oceanography, variables related to SWH include not only wind but also other factors such as mean wave period and sea surface air pressure. In the future, additional variables can be incorporated to further improve the model's predictive performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

26. Amur Tiger Individual Identification Based on the Improved InceptionResNetV2.

Author: Wu, Ling, Jinma, Yongyi, Wang, Xinyang, Yang, Feng, Xu, Fu, Cui, Xiaohui, and Sun, Qiao
Subjects: ARTIFICIAL neural networks, CONVOLUTIONAL neural networks, OBJECT recognition (Computer vision), RECOGNITION (Psychology), TIGERS
Abstract: Simple Summary: Accurate identification of individual Amur tigers is vital for their conservation, as it helps us understand their population and distribution. Existing identification methods often fall short in accuracy, and our study focuses on creating a more accurate method for identifying individual Amur tigers using advanced deep learning techniques. We improved an existing neural network model called InceptionResNetV2 by adding features like dropout layers and dual-attention mechanisms to better capture the unique stripe patterns of each tiger and reduce errors during training. We tested our model on a large dataset of tiger images and found it to be highly effective, achieving an average recognition accuracy of over 95% for different body parts, with left stripes reaching the highest 99.37%. This method significantly outperforms previous models and provides a reliable tool for wildlife researchers and conservationists to monitor and protect Amur tigers. By improving the ability to track individual tigers, our research offers practical benefits for preserving this endangered species and enhancing wildlife management practices globally. Accurate and intelligent identification of rare and endangered individuals of flagship wildlife species, such as Amur tiger (Panthera tigris altaica), is crucial for understanding population structure and distribution, thereby facilitating targeted conservation measures. However, many mathematical modeling methods, including deep learning models, often yield unsatisfactory results. This paper proposes an individual recognition method for Amur tigers based on an improved InceptionResNetV2 model. Initially, the YOLOv5 model is employed to automatically detect and segment facial, left stripe, and right stripe areas from images of 107 individual Amur tigers, achieving a high average classification accuracy of 97.3%. 
By introducing a dropout layer and a dual-attention mechanism, we enhance the InceptionResNetV2 model to better capture the stripe features of individual tigers at various granularities and reduce overfitting during training. Experimental results demonstrate that our model outperforms other classic models, offering optimal recognition accuracy and ideal loss changes. The average recognition accuracy for different body part features is 95.36%, with left stripes achieving a peak accuracy of 99.37%. These results highlight the model's excellent recognition capabilities. Our research provides a valuable and practical approach to the individual identification of rare and endangered animals, offering significant potential for improving conservation efforts. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

27. Continuous Wavelet Transform Peak-Seeking Attention Mechanism Conventional Neural Network: A Lightweight Feature Extraction Network with Attention Mechanism Based on the Continuous Wave Transform Peak-Seeking Method for Aero-Engine Hot Jet Fourier Transform Infrared Classification

Author: Du, Shuhan, Han, Wei, Kang, Zhenping, Lu, Xiangning, Liao, Yurong, and Li, Zhaoming
Subjects: FOURIER transform spectrometers, WAVELET transforms, FEATURE extraction, DEEP learning, WAVENUMBER, RUNNING speed
Abstract: Focusing on the problem of identifying and classifying aero-engine models, this paper measures the infrared spectrum data of aero-engine hot jets using a telemetry Fourier transform infrared spectrometer. Simultaneously, infrared spectral data sets with the six different types of aero-engines were created. For the purpose of classifying and identifying infrared spectral data, a CNN architecture based on the continuous wavelet transform peak-seeking attention mechanism (CWT-AM-CNN) is suggested. This method calculates the peak value of middle wave band by continuous wavelet transform, and the peak data are extracted by the statistics of the wave number locations with high frequency. The attention mechanism was used for the peak data, and the attention mechanism was weighted to the feature map of the feature extraction block. The training set, validation set and prediction set were divided in the ratio of 8:1:1 for the infrared spectral data sets. For three different data sets, the CWT-AM-CNN proposed in this paper was compared with the classical classifier algorithm based on CO2 feature vector and the popular AE, RNN and LSTM spectral processing networks. The prediction accuracy of the proposed algorithm in the three data sets was as high as 97%, and the lightweight network structure design not only guarantees high precision, but also has a fast running speed, which can realize the rapid and high-precision classification of the infrared spectral data of the aero-engine hot jets. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

28. Short-Term Power Load Forecasting Based on Secondary Cleaning and CNN-BILSTM-Attention.

Author: Wang, Di, Li, Sha, and Fu, Xiaojin
Subjects: CONVOLUTIONAL neural networks, OPTIMIZATION algorithms, DATA scrubbing, DUNG beetles, K-means clustering
Abstract: Accurate power load forecasting can provide crucial insights for power system scheduling and energy planning. In this paper, to address the problem of low accuracy of power load prediction, we propose a method that combines secondary data cleaning, adaptive variational mode decomposition (VMD), convolutional neural networks (CNN), bi-directional long short-term memory (BILSTM), and an attention mechanism (AM). The Inner Mongolia electricity load data were first cleaned using the K-means algorithm, and then further refined with the density-based spatial clustering of applications with noise (DBSCAN) algorithm. Subsequently, the parameters of the VMD algorithm were optimized using a multi-strategy Cubic-T dung beetle optimization algorithm (CTDBO), after which the VMD algorithm was employed to decompose the twice-cleaned load sequences into a number of intrinsic mode functions (IMFs) with different frequencies. These IMFs were then used as inputs to the CNN-BILSTM-Attention model. In this model, a CNN is used for feature extraction, BILSTM for extracting information from the load sequence, and AM for assigning different weights to different features to optimize the prediction results. Experiments show that the model proposed in this paper achieves the highest prediction accuracy and robustness compared to other models and exhibits high stability across different time periods. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

29. Research on cuttings image segmentation method based on improved MultiRes-Unet++ with attention mechanism.

Author: Huo, Fengcai, Liu, Kaiming, Dong, Hongli, Ren, Weijian, and Dong, Shuai
Abstract: Cuttings logging is an important technology in petroleum exploration and production. It can be used to identify rock types, oil and gas properties, and reservoir features. However, the cuttings collected during cuttings logging are often small and few, their surface color is dark, and their boundaries are fuzzy. As a result, traditional image segmentation methods have low accuracy, which makes it difficult to identify and classify cuttings. Therefore, it is important to improve the accuracy of cuttings image segmentation. A deep learning-based cuttings image segmentation method is proposed in this paper. Firstly, the MultiRes module concept is introduced into the UNet++ segmentation model, yielding an improved end-to-end UNet++ image semantic segmentation model (called MultiRes-UNet++). Secondly, batch normalization is introduced at the input of each layer's feature convolution layer. Finally, a convolutional attention mechanism is introduced into the improved MultiRes-UNet++ segmentation model. Experimental results show that the accuracy between the segmentation results and the original image labels is 0.8791, the dice coefficient value is 0.8785, and the intersection over union is 0.7833. Compared with existing neural network segmentation algorithms, the performance is improved by about 5%. Compared with the algorithm before the fusion of the attention mechanism, the training speed is increased by about 75.2%. Our method can provide auxiliary information for cuttings logging. It is also of great significance for subsequent rock identification and classification. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

30. An Attention-Based Approach to Enhance the Detection and Classification of Android Malware.

Author: Ghourabi, Abdallah
Subjects: FEATURE selection, MACHINE learning, CLASSIFICATION algorithms, MALWARE, RESEARCH personnel
Abstract: The dominance of Android in the global mobile market and the open development characteristics of this platform have resulted in a significant increase in malware. These malicious applications have become a serious concern to the security of Android systems. To address this problem, researchers have proposed several machine-learning models to detect and classify Android malware based on analyzing features extracted from Android samples. However, most existing studies have focused on the classification task and overlooked the feature selection process, which is crucial to reduce the training time and maintain or improve the classification results. The current paper proposes a new Android malware detection and classification approach that identifies the most important features to improve classification performance and reduce training time. The proposed approach consists of two main steps. First, a feature selection method based on the Attention mechanism is used to select the most important features. Then, an optimized Light Gradient Boosting Machine (LightGBM) classifier is applied to classify the Android samples and identify the malware. The feature selection method proposed in this paper is to integrate an Attention layer into a multilayer perceptron neural network. The role of the Attention layer is to compute the weighted values of each feature based on its importance for the classification process. Experimental evaluation of the approach has shown that combining the Attention-based technique with an optimized classification algorithm for Android malware detection has improved the accuracy from 98.64% to 98.71% while reducing the training time from 80 to 28 s. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

31. Design of a Leaf-Bottom Pest Control Robot with Adaptive Chassis and Adjustable Selective Nozzle.

Author: Li, Dongshen, Gao, Fei, Li, Zemin, Zhang, Yutong, Gao, Chuang, and Li, Hongbo
Subjects: PEST control, STREAMING video & television, ADAPTIVE testing, POLLUTION, AGRICULTURAL productivity
Abstract: Pest control is an important guarantee for agricultural production. Pests are mostly light-avoiding and often gather on the bottom of crop leaves. However, spraying agricultural machinery mostly adopts top-down spraying, which suffers from low pesticide utilization and poor insect removal effect. Therefore, the upward spraying mode and intelligent nozzle have gradually become the research hotspot of precision agriculture. This paper designs a leaf-bottom pest control robot with adaptive chassis and adjustable selective nozzle. Firstly, the adaptive chassis is designed based on the MacPherson suspension, which uses shock absorption to drive the track to swing within a 30° angle. Secondly, a new type of cone angle adjustable selective nozzle was developed, which achieves adaptive selective precision spraying under visual guidance. Then, based on a convolutional block attention module (CBAM), the multi-CBAM-YOLOv5s network model was improved to achieve a 70% recognition rate of leaf-bottom spotted bad point in video streams. Finally, functional tests of the adaptive chassis and the adjustable selective spraying system were conducted. The data indicate that the adaptive chassis can adapt to diverse single-ridge requirements of soybeans and corn while protecting the ridge slopes. The selective spraying system achieves 70% precision in pesticide application, greatly reducing the use of pesticides. The scheme explores a ridge-friendly leaf-bottom pest control plan, providing a technical reference for improving spraying effect, reducing pesticide usage, and mitigating environmental pollution. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

32. Facial Expression Recognition Based on Multiscale Features and Attention Mechanism.

Author: Lisha Yao
Abstract: Facial features extracted from deep convolutional networks are susceptible to background, individual identity, and other factors; when mixed with such useless features, they interfere with facial expression recognition. Considering that features at different scales carry rich semantic and texture information, respectively, this paper takes VGG-16 as the basic network structure and combines multiscale features to obtain richer feature information. In addition, the input feature map elements are enhanced or suppressed by the attention module in order to extract salient features more accurately. The proposed method was validated on two commonly used expression data sets, CK+ and RAF-DB, and the recognition rates were 98.77% and 82.83%, respectively. Experimental results show the superiority of this method. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

33. OA-Net: outlier weakening and adaptive voxel encoding-based 3d object detection network.

Author: Wang, Chuanxu, Qin, Jianwei, and Fu, Xiaoshan
Abstract: This paper focuses on the adverse impact of outlier points and the ambiguity of candidate localizations in 3D object detection on point cloud datasets. First, because outlier points can disrupt feature extraction and mislead object detection, we propose an outlier weakening strategy. The neighborhood of each point in the point set is established via a multi-directional search algorithm, and the correlations among points in the neighborhood are computed via a self-attention mechanism; each point representation is then enhanced with key information from its neighborhood, so the negative impact of outlier points is weakened by drawing on the object's neighborhood context. Second, multiple proposed boxes for object localization usually contain the same sampling points, which makes them hard to distinguish from each other and leads to incorrect object positioning. This paper proposes a voxel coding strategy with adaptive pooling: the candidate boxes are divided into voxels, and each voxel is further divided into multiple columns, which are then weighted and aggregated according to the importance of each column, so that the most confident spatial voxel encodings emerge as reliable object localization candidates. This algorithm achieves an average accuracy of 82.98% and 93.2% on the KITTI dataset Car category and ModelNet40 dataset, respectively. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

34. Accurate Recognition of Jujube Tree Trunks Based on Contrast Limited Adaptive Histogram Equalization Image Enhancement and Improved YOLOv8.

Author: Ling, Shunkang, Wang, Nianyi, Li, Jingbin, and Ding, Longpeng
Subjects: TREE trunks, IMAGE intensifiers, JUJUBE (Plant), HISTOGRAMS, DATA integrity, DATA quality
Abstract: The accurate recognition of tree trunks is a prerequisite for precision orchard yield estimation. Facing the practical problems of complex orchard environments and large data flow, existing object detection schemes suffer from key issues such as poor data quality, low timeliness and accuracy, and weak generalization ability. In this paper, an improved YOLOv8 is designed on the basis of data flow screening and enhancement for lightweight, accurate detection of jujube tree trunks. Firstly, a key frame extraction algorithm was proposed and utilized to efficiently screen the effective data. Secondly, the CLAHE image data enhancement method was proposed and used to enhance the data quality. Finally, the backbone of the YOLOv8 model was replaced with a GhostNetv2 structure for lightweight transformation, and the improved CA_H attention mechanism was introduced. Extensive comparison and ablation results show that the average precision of the quality-enhanced dataset over that of the original dataset increases from 81.2% to 90.1%, and the YOLOv8s-GhostNetv2-CA_H model proposed in this paper reduces the model size by 19.5% compared to that of the YOLOv8s base model, with precision increasing by 2.4% to 92.3%, recall increasing by 1.4%, mAP@0.5 increasing by 1.8%, and FPS being 17.1% faster. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

35. Research on Mobile Phone Backplane Defect Segmentation Based on MDAF-UNet.

Author: Chen, Hao and Min, Byung-Won
Subjects: CELL phones, MANUFACTURING processes
Abstract: Mobile phone backplanes are an important part of mobile phones, and are often affected by a wide range of factors during the manufacturing process, resulting in defects of various scales and similar backgrounds. Therefore, accurately identifying these defects is crucial for improving mobile phone quality. To address this challenge, this paper proposes a multi-scale and dynamic attention fusion UNet (MDAF-UNet) model. The model innovatively combines normal convolution with dilated convolution. This allows the model to capture subtle features of defects and to perceive a larger range of feature variations. Moreover, an improved attention mechanism is introduced in this paper. It fuses channel attention and spatial attention, and dynamically adjusts the feature fusion strategy with learnable weights. This allows the model to increase the attention of important features and improve the effectiveness of feature representation. Experimental results on a publicly available dataset show that the MDAF-UNet model achieves 66.9% Mean Intersection over Union (MIoU), outperforming other state-of-the-art models. This result provides an effective solution to the mobile phone backplane defect segmentation problem. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

36. An Effective Image Classification Method for Plant Diseases with Improved Channel Attention Mechanism aECAnet Based on Deep Learning.

Author: Yang, Wenqiang, Yuan, Ying, Zhang, Donghua, Zheng, Liyuan, and Nie, Fuquan
Subjects: DEEP learning, IMAGE recognition (Computer vision), CONVOLUTIONAL neural networks, PLANT classification, PLANT diseases, PLANT productivity, NOSOLOGY
Abstract: Since plant diseases occurring during the growth process are a significant factor leading to the decline in both yield and quality, the classification and detection of plant leaf diseases, followed by timely prevention and control measures, are crucial for safeguarding plant productivity and quality. As the traditional convolutional neural network structure cannot effectively recognize similar plant leaf diseases, in order to more accurately identify the diseases on plant leaves, this paper proposes an effective plant disease image recognition method aECA-ResNet34. This method is based on ResNet34, and in the first and the last layers of this network, respectively, we add this paper's improved aECAnet with the symmetric structure. aECA-ResNet34 is compared with different plant disease classification models on the peanut dataset constructed in this paper and the open-source PlantVillage dataset. The experimental results show that the aECA-ResNet34 model proposed in this paper has higher accuracy, better performance, and better robustness. The results show that the aECA-ResNet34 model proposed in this paper is able to recognize diseases of multiple plant leaves very accurately. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

37. Magnetic resonance image segmentation of rectal tumors based on improved CycleGAN and U-Net models.

Author: Li, Kefan, Qi, Baozhu, and Wang, Mingjia
Abstract: Accurate segmentation of rectal tumor lesion regions can provide an essential basis for clinical treatment and prognosis monitoring of tumors. However, the rectal tumor segmentation task currently faces several problems: high-quality datasets are lacking, and mainstream segmentation networks cannot complete the high-precision segmentation of rectal tumors. In this paper, we investigate image enhancement and segmentation algorithms for convolutional neural networks and construct a rectal tumor MRI dataset by improving the CycleGAN network and loss function to achieve domain migration and reconstruction of rectal tumor CT and MRI images, given the small amount of rectal tumor image data and the existence of different modalities and regimes in CT and MRI. For the rectal tumor segmentation problem, a novel segmentation network DCMSG-UNet was designed based on the U-Net network. This network uses dilated convolution and multi-headed self-attention mechanisms to improve the base feature extraction module of the segmentation network, adds a decoder path, and uses the GAM hybrid attention mechanism to amplify the dimensional interaction features of the additional decoder path. Comparison experiments with six network models, DeepLabV3+, U-Net, UNet++, UNet3+, TransUNet, and Swin-Unet, show that the segmented region obtained from the DCMSG-UNet model proposed in this paper is closer to the real tumor region, with a DICE metric of 0.8416 and a Hausdorff distance of 11.3229, which can effectively segment the tumor. The experimental results show that our proposed method performs significantly better than the above methods, with a DICE metric improvement of about 6%. To visualize the segmentation results, this paper designed a rectal tumor MRI image segmentation system based on PyQt5 to realize human-computer interaction and assist doctors in clinical diagnosis. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

38. Image splicing manipulation location by multi-scale dual-channel supervision.

Author: Hu, Jingyun, Xue, Ru, Teng, Guofeng, Niu, Shiming, and Jin, Danyang
Abstract: The swift growth of diverse editing software has made image splicing manipulation more complex, and discovering a meticulously crafted splice forgery in digital images poses a significant challenge for both humans and machines. Existing image splicing manipulation detection algorithms have low localization accuracy and poor detection of small manipulation areas. In this paper, we propose an effective end-to-end image manipulation location method based on a multi-scale and dual-channel model, MD_Unet. First, a dual-channel encoding network model is constructed: a high-pass filtering branch containing SRM filters and Gabor filters is added at the input of the model to help it learn the manipulation traces of the image. Secondly, the dual-channel features are fused using an improved multi-scale pyramid pooling module. Then, Squeeze-Excitation is introduced to recalibrate the fused features so that the network pays more attention to splicing manipulation-related features. Finally, the fused feature map is input to the decoder, and the predicted image is decoded layer by layer to segment the manipulation region. Extensive experimental validation demonstrates the efficacy of the proposed approach. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

39. IRBEVF-Q: Optimization of Image–Radar Fusion Algorithm Based on Bird's Eye View Features.

Author: Cai, Ganlin, Chen, Feng, and Guo, Ente
Subjects: OBJECT recognition (Computer vision), ALGORITHMS, VIDEO coding, AUTONOMOUS vehicles, CAMERAS, PROBLEM solving
Abstract: In autonomous driving, the fusion of multiple sensors is considered essential to improve the accuracy and safety of 3D object detection. Currently, a fusion scheme combining low-cost cameras with highly robust radars can counteract the performance degradation caused by harsh environments. In this paper, we propose the IRBEVF-Q model, which mainly consists of a BEV (Bird's Eye View) fusion coding module and an object decoder module. The BEV fusion coding module solves the problem of unified representation of different modal information by fusing the image and radar features through 3D spatial reference points as a medium. The query in the object decoder, as a core component, plays an important role in detection. In this paper, Heat Map-Guided Query Initialization (HGQI) and Dynamic Position Encoding (DPE) are proposed in query construction to increase the a priori information of the query. The Auxiliary Noise Query (ANQ) then helps to stabilize the matching. The experimental results demonstrate that the proposed fusion model IRBEVF-Q achieves an NDS of 0.575 and a mAP of 0.476 on the nuScenes test set. Compared to recent state-of-the-art methods, our model shows significant advantages, thus indicating that our approach contributes to improving detection accuracy. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

40. FMCW Radar Human Action Recognition Based on Asymmetric Convolutional Residual Blocks.

Author: Zhang, Yuan, Tang, Haotian, Wu, Ye, Wang, Bolun, and Yang, Dalin
Subjects: HUMAN activity recognition, DEEP learning, FEATURE extraction, RADAR, MACHINE learning, MULTISPECTRAL imaging
Abstract: Human action recognition based on optical and infrared video data is greatly affected by the environment, and feature extraction in traditional machine learning classification methods is complex; therefore, this paper proposes a method for human action recognition using Frequency Modulated Continuous Wave (FMCW) radar based on an asymmetric convolutional residual network. First, the radar echo data are analyzed and processed to extract the micro-Doppler time domain spectrograms of different actions. Second, a strategy combining asymmetric convolution and the Mish activation function is adopted in the residual block of the ResNet18 network to address the limitations of linear and nonlinear transformations in the residual block for micro-Doppler spectrum recognition. This approach aims to enhance the network's ability to learn features effectively. Finally, the Improved Convolutional Block Attention Module (ICBAM) is integrated into the residual block to enhance the model's attention and comprehension of input data. The experimental results demonstrate that the proposed method achieves a high accuracy of 98.28% in action recognition and classification within complex scenes, surpassing classic deep learning approaches. Moreover, this method significantly improves the recognition accuracy for actions with similar micro-Doppler features and demonstrates excellent anti-noise recognition performance. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

41. An Algorithm for Ship Detection in Complex Observation Scenarios Based on Mooring Buoys.

Author: Li, Wenbo, Ning, Chunlin, Fang, Yue, Yuan, Guozheng, Zhou, Peng, and Li, Chao
Subjects: CONVOLUTIONAL neural networks, OBJECT recognition (Computer vision), COLLISIONS at sea, SPATIAL resolution, BUOYS
Abstract: Marine anchor buoys, as fixed-point profile observation platforms, are highly susceptible to the threat of ship collisions. Installing cameras on buoys can effectively monitor and collect evidence from ships. However, when using a camera to capture images, it is often affected by the continuous shaking of buoys and rainy and foggy weather, resulting in problems such as blurred images and rain and fog occlusion. To address these problems, this paper proposes an improved YOLOv8 algorithm. Firstly, the polarized self-attention (PSA) mechanism is introduced to preserve the high-resolution features of the original deep convolutional neural network and solve the problem of image spatial resolution degradation caused by shaking. Secondly, by introducing the multi-head self-attention (MHSA) mechanism in the neck network, the interference of rain and fog background is weakened, and the feature fusion ability of the network is improved. Finally, in the head network, this model combines additional small object detection heads to improve the accuracy of small object detection. Additionally, to enhance the algorithm's adaptability to camera detection scenarios, this paper simulates scenarios, including shaking blur, rain, and foggy conditions. In the end, numerous comparative experiments on a self-made dataset show that the algorithm proposed in this study achieved 94.2% mAP50 and 73.2% mAP50:95 in various complex environments, which is superior to other advanced object detection algorithms. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

42. A Multi-Scale Natural Scene Text Detection Method Based on Attention Feature Extraction and Cascade Feature Fusion.

Author: Li, Nianfeng, Wang, Zhenyan, Huang, Yongyuan, Tian, Jia, Li, Xinyuan, and Xiao, Zhiguo
Subjects: TEXT recognition, FEATURE extraction, COMPUTER vision, VISUAL fields, DEEP learning
Abstract: Scene text detection is an important research field in computer vision, playing a crucial role in various application scenarios. However, existing scene text detection methods often fail to achieve satisfactory results when faced with text instances of different sizes, shapes, and complex backgrounds. To address the challenge of detecting diverse texts in natural scenes, this paper proposes a multi-scale natural scene text detection method based on attention feature extraction and cascaded feature fusion. This method combines global and local attention through an improved attention feature fusion module (DSAF) to capture text features of different scales, enhancing the network's perception of text regions and improving its feature extraction capabilities. Simultaneously, an improved cascaded feature fusion module (PFFM) is used to fully integrate the extracted feature maps, expanding the receptive field of features and enriching the expressive ability of the feature maps. Finally, to address the cascaded feature maps, a lightweight subspace attention module (SAM) is introduced to partition the concatenated feature maps into several sub-space feature maps, facilitating spatial information interaction among features of different scales. In this paper, comparative experiments are conducted on the ICDAR2015, Total-Text, and MSRA-TD500 datasets, and comparisons are made with some existing scene text detection methods. The results show that the proposed method achieves good performance in terms of accuracy, recall, and F-score, thus verifying its effectiveness and practicality. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

43. CAW-YOLO: Cross-Layer Fusion and Weighted Receptive Field-Based YOLO for Small Object Detection in Remote Sensing.

Author: Weiya Shi, Shaowen Zhang, and Shiqiang Zhang
Subjects: OBJECT recognition (Computer vision), REMOTE sensing, OPTICAL remote sensing, CONVOLUTIONAL neural networks, DISCRETE cosine transforms
Abstract: In recent years, there has been extensive research on object detection methods applied to optical remote sensing images utilizing convolutional neural networks. Despite these efforts, the detection of small objects in remote sensing remains a formidable challenge. Deep network structures cause the loss of object features, nearly eliminating some subtle features associated with small objects in deep layers. Additionally, the features of small objects are susceptible to interference from background features contained within the image, leading to a decline in detection accuracy. Moreover, the sensitivity of small objects to bounding box perturbation further increases the detection difficulty. In this paper, we introduce a novel approach, Cross-Layer Fusion and Weighted Receptive Field-based YOLO (CAW-YOLO), specifically designed for small object detection in remote sensing. To address feature loss in deep layers, we have devised a cross-layer attention fusion module. Background noise is effectively filtered through the incorporation of Bi-Level Routing Attention (BRA). To enhance the model's capacity to perceive multi-scale objects, particularly small-scale objects, we introduce a weighted multi-receptive field atrous spatial pyramid pooling module. Furthermore, we mitigate the sensitivity arising from bounding box perturbation by incorporating the joint Normalized Wasserstein Distance (NWD) and Efficient Intersection over Union (EIoU) losses. The efficacy of the proposed model in detecting small objects in remote sensing has been validated through experiments conducted on three publicly available datasets. The experimental results unequivocally demonstrate the model's pronounced advantages in small object detection for remote sensing, surpassing the performance of current mainstream models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
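
Note on entry 43: the abstract names the Normalized Wasserstein Distance (NWD) but does not give its formula. The sketch below follows the commonly used definition, in which each box (cx, cy, w, h) is modeled as a 2D Gaussian and the distance is mapped into (0, 1] by an exponential. The function names and the constant `c` are illustrative assumptions, not taken from the paper, and how CAW-YOLO weights NWD against EIoU is not stated in the abstract, so that combination is left out.

```python
import math

def nwd(box_a, box_b, c=12.8):
    """Normalized Wasserstein Distance between two (cx, cy, w, h) boxes.

    Each box is treated as a 2D Gaussian with mean (cx, cy) and
    covariance diag(w^2 / 4, h^2 / 4). `c` is a dataset-dependent
    normalizing constant; 12.8 is a placeholder assumption.
    """
    cxa, cya, wa, ha = box_a
    cxb, cyb, wb, hb = box_b
    # Squared 2-Wasserstein distance between the two Gaussians.
    w2_sq = ((cxa - cxb) ** 2 + (cya - cyb) ** 2
             + ((wa - wb) / 2) ** 2 + ((ha - hb) / 2) ** 2)
    return math.exp(-math.sqrt(w2_sq) / c)

def nwd_loss(box_a, box_b, c=12.8):
    """Loss form: 0 for identical boxes, approaching 1 as they diverge."""
    return 1.0 - nwd(box_a, box_b, c)
```

Unlike IoU, this similarity stays non-zero and smooth for boxes that do not overlap, which is the usual argument for applying it to very small objects.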

44. Unsupervised social network embedding via adaptive specific mappings.

Author: Ge, Youming, Huang, Cong, Liu, Yubao, Zhang, Sen, and Kong, Weiyang
Abstract: In this paper, we address the problem of unsupervised social network embedding, which aims to embed network nodes, including node attributes, into a latent low-dimensional space. In recent methods, the fusion mechanism of node attributes and network structure has been proposed for the problem and achieved impressive prediction performance. However, the non-linear property of node attributes and network structure is not efficiently fused in existing methods, which is potentially helpful in learning a better network embedding. To this end, in this paper, we propose a novel model called ASM (Adaptive Specific Mapping) based on an encoder-decoder framework. In the encoder, we use the kernel mapping to capture the non-linear property of both node attributes and network structure. In particular, we adopt two feature mapping functions, namely an untrainable function for node attributes and a trainable function for network structure. By the mapping functions, we obtain the low-dimensional feature vectors for node attributes and network structure, respectively. Then, we design an attention layer to combine the learning of both feature vectors and adaptively learn the node embedding. In the encoder, we adopt the component of reconstruction for the training process of learning node attributes and network structure. We conducted a set of experiments on seven real-world social network datasets. The experimental results verify the effectiveness and efficiency of our method in comparison with state-of-the-art baselines. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

45. Early fire detection technology based on improved transformers in aircraft cargo compartments.

Author: Hong-zhou Ai, Dong Han, Xin-zhi Wang, Quan-yi Liu, Yue Wang, Meng-yue Li, and Pei Zhu
Subjects: FIRE detectors, ELECTRIC transformers, DEEP learning, RECURRENT neural networks, ARTIFICIAL neural networks
Abstract: The implementation of early and accurate detection of aircraft cargo compartment fire is of great significance to ensure flight safety. The current airborne fire detection technology mostly relies on single-parameter smoke detection using infrared light. This often results in a high false alarm rate in complex air transportation environments. The traditional deep learning model struggles to effectively address the issue of long-term dependency in multivariate fire information. This paper proposes a multi-technology collaborative fire detection method based on an improved transformers model. Dual-wavelength optical sensors, flue gas analyzers, and other equipment are used to carry out multi-technology collaborative detection methods and characterize various feature dimensions of fire to improve detection accuracy. The improved Transformer model which integrates the self-attention mechanism and position encoding mechanism is applied to the problem of long-time series modeling of fire information from a global perspective, which effectively solves the problem of gradient disappearance and gradient explosion in traditional RNN (recurrent neural network) and CNN (convolutional neural network). Two different multi-head self-attention mechanisms are used to classify and model multivariate fire information, respectively, which solves the problem of confusing time series modeling and classification modeling in dealing with multivariate classification tasks by a single attention mechanism. Finally, the output results of the two models are fused through the gate mechanism. The research results show that, compared with the traditional single-feature detection technology, the multi-technology collaborative fire detection method can better capture fire information. Compared with the traditional deep learning model, the multivariate fire prediction model constructed by the improved Transformer can better detect fires, and the accuracy rate is 0.995. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
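
Note on entry 45: the abstract combines self-attention with position encoding for time-series input but gives no implementation details. As a rough illustration only, the sketch below pairs the standard sinusoidal position encoding with single-head scaled dot-product self-attention. The identity query/key/value projections, the function names, and the input shape are simplifying assumptions; the paper's improved multi-head design and gate fusion are not reproduced here.

```python
import math

def positional_encoding(seq_len, d_model):
    """Standard sinusoidal encoding: sin on even indices, cos on odd."""
    pe = []
    for pos in range(seq_len):
        row = []
        for i in range(d_model):
            angle = pos / (10000 ** (2 * (i // 2) / d_model))
            row.append(math.sin(angle) if i % 2 == 0 else math.cos(angle))
        pe.append(row)
    return pe

def softmax(xs):
    """Numerically stable softmax over a list of floats."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    total = sum(exps)
    return [e / total for e in exps]

def self_attention(x):
    """Single-head self-attention with identity Q/K/V projections
    (an assumption made to keep the sketch short).

    x: list of time steps, each a list of d features.
    Returns (outputs, attention_weights).
    """
    d = len(x[0])
    outputs, weights = [], []
    for q in x:
        # Scaled dot product of this step against every step.
        scores = [sum(qi * ki for qi, ki in zip(q, k)) / math.sqrt(d) for k in x]
        w = softmax(scores)
        weights.append(w)
        # Weighted sum of all time steps.
        outputs.append([sum(wj * v[c] for wj, v in zip(w, x)) for c in range(d)])
    return outputs, weights

# Usage: add position information to sensor readings, then attend.
readings = [[0.1, 0.2, 0.0, 0.4], [0.3, 0.1, 0.2, 0.0], [0.9, 0.8, 0.7, 0.6]]
pe = positional_encoding(len(readings), 4)
encoded = [[r + p for r, p in zip(row, prow)] for row, prow in zip(readings, pe)]
attended, attn = self_attention(encoded)
```

Because every time step attends to every other step directly, information does not have to pass through a long recurrent chain, which is the property the abstract relies on for long time-series modeling.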

46. Cascaded adaptive global localisation network for steel defect detection.

Author: Yu, Jianbo, Wang, Yanshu, Li, Qingfeng, Li, Hao, Ma, Mingyan, and Liu, Peilun
Subjects: ARTIFICIAL neural networks, CONVOLUTIONAL neural networks, GAUSSIAN mixture models, STEEL, MANUFACTURING processes
Abstract: Defect detection is crucial in ensuring the quality of steel products. This paper proposes a novel deep neural network, cascaded adaptive global location network (CAGLNet), for detecting steel surface defects. The main objective of this study is to address the challenges associated with the irregular shape and dense spatial distribution of defects on steel. To achieve this goal, CAGLNet integrates a feature extraction network that combines residual and feature pyramid networks, a cascade adaptive tree-structure region proposal network (CAT-RPN) that eliminates the need for prior knowledge, and a global localisation regression for steel defect detection. This paper evaluates the effectiveness of CAGLNet on the NEU-DET dataset and demonstrates that the proposed model achieves an average accuracy of 85.40% at a speed of 10.06 frames per second, outperforming state-of-the-art methods. These results suggest that CAGLNet has the potential to significantly improve the effectiveness of defect detection in industrial production processes, leading to increased production yield and cost savings. Abbreviations: AT-RPN, adaptive tree-structure region proposal network; CAGLNet, cascaded adaptive global location network; CAT-RPN, cascade adaptive tree-structure region proposal network; CNN, convolutional neural network; DNN, deep neural network; EPNet, edge proposal network; FPN, feature pyramid network; FCOS, fully convolutional one-stage detector; FPS, frames per second; GMM, Gaussian mixture model; IoU, intersection-over-union; ROIAlign, region of interest align; RPN, region proposal network; ResNet, residual network; ResNet50_FPN, residual network and feature pyramid network; SABL, side aware boundary localisation; SSD, single-shot multiBox detector; TPE, Tree-structured Parzen estimator [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

47. An ultra-short-term forecasting method for multivariate loads of user-level integrated energy systems in a microscopic perspective: based on multi-energy spatio-temporal coupling and dual-attention mechanism.

Author: Yin, Xiucheng, Gao, Zhengzhong, Cheng, Yumeng, Hao, Yican, You, Zhenhuan, Zeng, Xiangjun, Zhang, Xuran, and Jin, Zongshuai
Subjects: CONVOLUTIONAL neural networks, FEATURE extraction, FORECASTING, ENERGY consumption, DEEP learning
Abstract: An ultra-short-term multivariate load forecasting method under a microscopic perspective is proposed to address the characteristics of user-level integrated energy systems (UIES), which are small in scale and have large load fluctuations. Firstly, the spatio-temporal correlation of users' energy use behavior within the UIES is analyzed, and a multivariate load input feature set in the form of a class image is constructed based on the various types of load units. Secondly, in order to maintain the feature independence and temporal integrity of each load during the feature extraction process, a deep neural network architecture with spatiotemporal coupling characteristics is designed. Among them, the multi-channel parallel convolutional neural network (MCNN) performs independent spatial feature extraction of the 2D load component pixel images at each moment in time, and feature fusion of various types of load features in high dimensional space. A bidirectional long short-term memory network (BiLSTM) is used as a feature sharing layer to perform temporal feature extraction on the fused load sequences. In addition, a spatial attention layer and a temporal attention layer are designed in this paper for the original input load pixel images and the fused load sequences, respectively, so that the model can better capture the important information. Finally, a multi-task learning approach based on the hard sharing mechanism achieves joint prediction of each load. The measured load data of a UIES is analyzed as an example to verify the superiority of the method proposed in this paper. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

48. CNA-DeepSORT algorithm for multi-target tracking.

Author: Feng, Kaili, Huo, Wenxiao, Xu, Wenhao, Li, Meng, and Li, Tianping
Abstract: In recent years, multi-target tracking algorithms have developed rapidly. However, in multi-target tracking, mutual occlusion and crossing between targets, as well as the sudden disappearance and reappearance of targets in videos, occur easily and can result in missed detections, false detections, and wrong ID switches. To address these problems, the CenterNet attention DeepSORT algorithm (CNA-DeepSORT) proposed in this paper replaces Faster R-CNN in the original detection part of the DeepSORT algorithm with a CenterNet network with a channel attention mechanism, and designs a multi-scale feature extraction module and a pedestrian recognition network combined with the DeepSORT algorithm. Compared to the original DeepSORT algorithm, these improvements lead to a 3.7% improvement in the MOTA metric, a 1.6% improvement in the MOTP metric, 238 fewer false ID switches, 2627 fewer FP, and 3943 fewer FN, at the cost of a decrease in run speed and a 4 Hz reduction in frame rate. There is also some improvement in handling the occlusion problem of multi-target tracking, and the false and missed detection of targets during ID switching is reduced. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

49. Research on prediction method of expressway section traffic flow considering anomaly detection probability.

Author: Wang, Zhiyu, Li, Linheng, Qu, Xu, Mao, Peipei, and Ran, Bin
Subjects: TRAFFIC flow, ANOMALY detection (Computer security), TRAFFIC congestion, EXPRESS highways, RESEARCH methodology, RECURRENT neural networks
Abstract: Real‐time and accurate short‐term traffic flow prediction can provide a scientific basis for decision making by travellers and traffic management, and alleviate traffic congestion to a certain extent. The existing traffic flow prediction methods often encounter limitations in real‐time performance and accuracy due to the post‐processing required to rectify anomaly detection results during data pre‐processing. This paper presents a novel traffic flow prediction model, termed the With‐Anomaly Detection Probability (ADP) Attention‐Bidirectional Long‐Short Term Memory (BiLSTM) model. This model takes the probability of anomaly detection into consideration, integrating the anomaly detection outcomes as an inherent parameter into the traffic flow prediction framework. Additionally, the model incorporates an attention mechanism within the long‐short term memory network. Through a comprehensive simulation study utilizing actual measured traffic flow data from the Shanghai‐Chongqing Expressway, the effectiveness of the proposed model is rigorously evaluated. The prediction results show that the model proposed in this paper is a real‐time and accurate short‐time traffic flow prediction model compared with the basic models. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF

50. Research on heterogeneous multi-UAV collaborative decision-making method based on improved PPO.

Author: Xu, Lin, Zhang, Xinmiao, Xiao, Dong, Liu, Beihong, and Liu, Aixue
Subjects: DEEP reinforcement learning, REINFORCEMENT learning, PROBLEM solving, ENTROPY, DECISION making
Abstract: In order to solve the problem that the Proximal Policy Optimization (PPO) algorithm is difficult to converge in the air-sea battle scenarios with high dynamics, strong interference, and complex state space, the Ray-LAPPO algorithm based on Long Short-Term Memory (LSTM) and Attention mechanism is proposed in this paper under the distributed training framework Ray. Firstly, the idea of Centralized Training Distributed Execution (CTDE) is adopted to extend the PPO algorithm to the field of multi-agent and the policy entropy is added to the loss function to encourage the exploration of agents; Secondly, the LSTM network is added to the actor and critic networks to explore the timing relationship between non-independent and identically distributed samples and improve the learning performance of the UAV; In addition, the Attention mechanism is introduced to obtain the states at different time steps and establish a weighted differentiation model of the final value function; Finally, the simulation experiments on the self-developed heterogeneous UAV collaborative decision-making environment show that Ray-LAPPO can get the most advanced performance in different scenarios, and also possesses potential value for large-scale real-world applications. [ABSTRACT FROM AUTHOR]
Published: 2024
Full Text: View/download PDF
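
Note on entry 50: the abstract says policy entropy is added to the PPO loss to encourage exploration, without giving the loss itself. The sketch below shows the standard PPO clipped surrogate objective with an entropy bonus for a batch of samples. The function name, the coefficient values, and the omission of the value-function term are assumptions made for brevity; this is not the Ray-LAPPO loss, whose LSTM and attention components are not covered.

```python
import math

def ppo_policy_loss(new_logp, old_logp, advantages, entropies,
                    clip_eps=0.2, ent_coef=0.01):
    """Clipped PPO policy loss with an entropy bonus (to be minimized).

    new_logp / old_logp: log-probabilities of the taken actions under
    the current and the data-collecting policy.
    entropies: per-sample policy entropy under the current policy.
    clip_eps and ent_coef are illustrative defaults, not from the paper.
    """
    n = len(advantages)
    surrogate = 0.0
    for nl, ol, adv in zip(new_logp, old_logp, advantages):
        ratio = math.exp(nl - ol)  # probability ratio pi_new / pi_old
        clipped = max(1.0 - clip_eps, min(1.0 + clip_eps, ratio))
        # Pessimistic bound: take the smaller of the two objectives.
        surrogate += min(ratio * adv, clipped * adv)
    policy_loss = -surrogate / n
    entropy_bonus = sum(entropies) / n
    # Subtracting the bonus rewards higher-entropy (more exploratory) policies.
    return policy_loss - ent_coef * entropy_bonus
```

With identical old and new policies the ratio is 1, so only the advantage mean and the entropy term contribute; a larger `ent_coef` pushes the optimizer toward more exploratory action distributions.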


2,750 results
