3,718 results for "Self-supervised learning"
Search Results
2. Dynamic graph attention-guided graph clustering with entropy minimization self-supervision.
- Author
-
Zhu, Ran, Peng, Jian, Huang, Wen, He, Yujun, and Tang, Chengyi
- Subjects
GRAPH neural networks, ENTROPY, FUZZY algorithms, DYNAMIC models - Abstract
Graph clustering is one of the most fundamental tasks in graph learning. Recently, numerous graph clustering models based on dual-network (auto-encoder + graph neural network (GNN)) architectures have emerged and achieved promising results. However, we observe several limitations in the literature: 1) simple graph neural networks that fail to capture the intricate relationships between nodes are used for graph clustering tasks; 2) heterogeneous information is inadequately interacted and merged; and 3) the clustering boundaries are fuzzy in the feature space. To address these issues, we propose a novel graph clustering model named Dynamic Graph Attention-guided Graph Clustering with Entropy Minimization self-supervision (DGAGC-EM). Specifically, we introduce DGATE, a graph auto-encoder based on dynamic graph attention, to capture the intricate relationships among graph nodes. Additionally, we perform feature enhancement from both global and local perspectives via the proposed Global-Local Feature Enhancement (GLFE) module. Finally, we propose a self-supervised strategy based on entropy minimization theory to guide the network training process, achieving better performance and producing sharper clustering boundaries. Extensive experimental results on four datasets demonstrate that our method is highly competitive with state-of-the-art methods. The figure presents the overall framework of the proposed DGAGC-EM. The Dynamic Graph Attention Auto-Encoder Module is the proposed graph auto-encoder based on dynamic graph attention, capturing the intricate relationships among graph nodes. The Auto-Encoder Module is a basic auto-encoder with simple MLPs that extracts embeddings from node attributes. Additionally, the proposed Global-Local Feature Enhancement (GLFE) module performs feature enhancement from both global and local perspectives.
Finally, the proposed Self-supervised Module guides the network training process to achieve better performance and produce sharper clustering boundaries. [ABSTRACT FROM AUTHOR]
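The entropy-minimization self-supervision described in the abstract can be illustrated with a minimal NumPy sketch (an illustration of the general principle, not the authors' DGAGC-EM implementation; the loss simply penalizes high-entropy soft cluster assignments, pushing each sample toward one cluster):

```python
import numpy as np

def entropy_minimization_loss(logits):
    """Mean per-sample entropy of soft cluster assignments.

    Minimizing this term pushes each row of `logits` toward a
    one-hot assignment, i.e. sharper clustering boundaries.
    """
    # Softmax over the cluster dimension (numerically stable).
    z = logits - logits.max(axis=1, keepdims=True)
    p = np.exp(z)
    p /= p.sum(axis=1, keepdims=True)
    # H(p) = -sum_k p_k log p_k, averaged over samples.
    return float(-(p * np.log(p + 1e-12)).sum(axis=1).mean())

# A confident (near one-hot) assignment has lower entropy than a fuzzy one.
sharp = entropy_minimization_loss(np.array([[10.0, 0.0, 0.0]]))
fuzzy = entropy_minimization_loss(np.array([[1.0, 1.0, 1.0]]))
```

With uniform logits the loss equals log(3) ≈ 1.099, the maximum for three clusters, while a confident assignment drives it toward zero.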
- Published
- 2024
- Full Text
- View/download PDF
3. Self‐supervised learning improves robustness of deep learning lung tumor segmentation models to CT imaging differences.
- Author
-
Jiang, Jue, Rangnekar, Aneesh, and Veeraraghavan, Harini
- Subjects
- *
CONVOLUTIONAL neural networks, *TRANSFORMER models, *COMPUTED tomography, *DEEP learning, *LUNG cancer, *LUNGS - Abstract
Background: Self-supervised learning (SSL) is an approach to extract useful feature representations from unlabeled data and enable fine-tuning on downstream tasks with limited labeled examples. Self-pretraining is an SSL approach that uses the curated downstream task dataset for both pretraining and fine-tuning. The availability of large, diverse, and uncurated public medical image sets presents the opportunity to create foundation models by applying SSL "in the wild" that are robust to imaging variations. However, the benefit of wild- versus self-pretraining has not been studied for medical image analysis. Purpose: Compare the robustness of wild- versus self-pretrained models created using convolutional neural network (CNN) and transformer (vision transformer [ViT] and hierarchical shifted window [Swin]) models for non-small cell lung cancer (NSCLC) segmentation from 3D computed tomography (CT) scans. Methods: CNN, ViT, and Swin models were wild-pretrained using 10,412 unlabeled 3D CTs sourced from the cancer imaging archive and internal datasets. Self-pretraining was applied to the same networks using a curated public downstream task dataset (n = 377) of patients with NSCLC. Pretext tasks introduced in the self-distilled masked image transformer were used for both pretraining approaches. All models were fine-tuned to segment NSCLC (n = 377 training dataset) and tested on two separate datasets containing early-stage (public, n = 156) and advanced-stage (internal, n = 196) NSCLC. Models were evaluated in terms of: (a) accuracy, (b) robustness to image differences from contrast, slice thickness, and reconstruction kernels, and (c) impact of pretext tasks for pretraining. Feature reuse was evaluated using centered kernel alignment. Results: Wild-pretrained Swin models showed higher feature reuse at earlier-level layers and increased feature differentiation close to the output. Wild-pretrained Swin outperformed self-pretrained models for the analyzed imaging acquisitions. Neither ViT nor CNN showed a clear benefit of wild-pretraining compared to self-pretraining. The masked image prediction pretext task, which forces networks to learn local structure, resulted in higher accuracy than the contrastive task that models global image information. Conclusion: Wild-pretrained Swin networks were more robust to the analyzed CT imaging differences for lung tumor segmentation than self-pretrained methods. ViT and CNN models did not show a clear benefit of wild-pretraining over self-pretraining. [ABSTRACT FROM AUTHOR]- Published
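The masked-image pretext task used for both pretraining approaches can be sketched as follows (a simplified NumPy illustration of random 3D patch masking, not the self-distilled masked image transformer code; patch size and mask ratio are arbitrary stand-ins, and volume dimensions are assumed divisible by the patch size):

```python
import numpy as np

def mask_patches(volume, patch=4, mask_ratio=0.6, rng=None):
    """Zero out a random subset of non-overlapping 3D patches.

    Returns the masked volume and the boolean patch-grid mask
    (True = masked); the pretext task is to reconstruct the
    masked patches from the visible ones.
    """
    rng = np.random.default_rng(rng)
    d, h, w = (s // patch for s in volume.shape)
    n = d * h * w
    # Exactly int(mask_ratio * n) patches are selected for masking.
    masked = rng.permutation(n) < int(mask_ratio * n)
    out = volume.copy()
    for idx in np.flatnonzero(masked):
        k, rest = divmod(idx, h * w)
        i, j = divmod(rest, w)
        out[k*patch:(k+1)*patch, i*patch:(i+1)*patch, j*patch:(j+1)*patch] = 0.0
    return out, masked.reshape(d, h, w)

vol = np.ones((8, 8, 8))
masked_vol, mask = mask_patches(vol, patch=4, mask_ratio=0.5, rng=0)
```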
- 2024
- Full Text
- View/download PDF
4. Contrastive learning for real SAR image despeckling.
- Author
-
Fang, Yangtian, Liu, Rui, Peng, Yini, Guan, Jianjun, Li, Duidui, and Tian, Xin
- Subjects
- *
SPECKLE interference, *SYNTHETIC aperture radar, *SPECKLE interferometry, *WEATHER, *GENERALIZATION - Abstract
The use of synthetic aperture radar (SAR) has greatly improved our ability to capture high-resolution terrestrial images under various weather conditions. However, SAR imagery is affected by speckle noise, which distorts image details and hampers subsequent applications. Recent forays into supervised deep learning-based denoising methods, like MRDDANet and SAR-CAM, offer a promising avenue for SAR despeckling. However, they are impeded by the domain gaps between synthetic data and realistic SAR images. To tackle this problem, we introduce a self-supervised speckle-aware network to utilize the limited near-real datasets and unlimited synthetic datasets simultaneously, which boosts the performance of the downstream despeckling module by teaching the module to discriminate the domain gap of different datasets in the embedding space. Specifically, based on contrastive learning, the speckle-aware network first characterizes the discriminative representations of spatially correlated speckle noise in different images across diverse datasets, which provides priors of versatile speckles and image characteristics. Then, the representations are effectively modulated into a subsequent multi-scale despeckling network to generate authentic despeckled images. In this way, the despeckling module can reconstruct reliable SAR image characteristics by learning from near-real datasets, while generalization performance is guaranteed by simultaneously learning abundant patterns from synthetic datasets. Additionally, a novel excitation aggregation pooling module is inserted into the despeckling network to enhance it further; this module utilizes features from different levels of scale and better preserves spatial details around strong scatterers in real SAR images. Extensive experiments across real SAR datasets from Sentinel-1, Capella-X, and TerraSAR-X satellites are carried out to verify the effectiveness of the proposed method over other state-of-the-art methods.
Specifically, the proposed method achieves the best PSNR and SSIM values evaluated on the near-real Sentinel-1 dataset, with gains of 0.22 dB in PSNR compared to MRDDANet, and improvements of 1.3% in SSIM over SAR-CAM. The code is available at https://github.com/YangtianFang2002/CL-SAR-Despeckling. [ABSTRACT FROM AUTHOR]
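The PSNR metric quoted above is standard and can be computed as follows (a generic NumPy implementation for reference, unrelated to the authors' code):

```python
import numpy as np

def psnr(reference, estimate, peak=255.0):
    """Peak signal-to-noise ratio in dB between two images."""
    mse = np.mean((reference.astype(np.float64) - estimate.astype(np.float64)) ** 2)
    if mse == 0:
        return float("inf")  # identical images
    return 10.0 * np.log10(peak ** 2 / mse)

# A uniform error of 10 gray levels against an 8-bit peak gives ~28.13 dB.
ref = np.full((4, 4), 100.0)
val = psnr(ref, ref + 10.0)
```

A gain of 0.22 dB, as reported against MRDDANet, corresponds to a roughly 5% reduction in mean squared error at a fixed peak value.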
- Published
- 2024
- Full Text
- View/download PDF
5. Apple leaf disease identification by integrating Transformer and prototype self-supervision [融合 Transformer 与原型自监督的苹果叶部病害识别].
- Author
-
李大湘, 张雯凯, and 刘颖
- Abstract
Apple leaf disease (ALD) identification is characterized by significant intra-class variation and subtle inter-class differences. In this study, an innovative model is presented that integrates a Transformer with prototype self-supervised (FTPSS) learning, aiming to significantly improve the precision of ALD recognition and thereby enhance disease management strategies in orchards. ResNet50 is used as the backbone network of the FTPSS model; this robust architecture extracts multi-level feature maps from ALD images, capturing the intricate details needed for accurate disease identification. The encoder design integrates a simplified self-attention (SSA) mechanism with spatial-attention-guided deformable convolution (SAG-DC). The resulting simplified self-attention and deformable convolution transformer (SSADC-TF) facilitates the effective interaction and fusion of the multi-level feature maps. Processing the extracted features in this way enhances the model's sensitivity to irregular lesion areas within ALD images and allows SSADC-TF to distinguish clearly among different disease manifestations. A prototype self-supervised (PSS) learning module is introduced to further improve the model's performance. Two self-supervised loss functions, "orthogonality" and "clustering", are used in this module. The orthogonality loss encourages the feature representations of different ALD classes to be orthogonal to one another, promoting clear separation among classes and strengthening the model's discriminative power. Meanwhile, the clustering loss tightens intra-class compactness, keeping variations within the same class small enough to preserve the robustness of the model. Extensive experiments conducted on both standard and real-world image datasets indicate the remarkable effectiveness of the FTPSS model.
The FTPSS model achieved a recognition accuracy of 98.61% on the standard image set, a significant improvement of 5.15 percentage points over the baseline model. Similarly, it obtained an accuracy of 98.73% on the real-world image set, an improvement of 4.49 percentage points over the baseline. These results underscore the robust performance of the FTPSS model in identifying ALD, even in the presence of significant intra-class variation and subtle inter-class differences. The gains are attributed to the innovative integration of the Transformer with prototype self-supervised learning: the powerful feature extraction of ResNet50, together with SSADC-TF's enhanced feature interaction and fusion, captures the complex details in ALD images and contributes a 2.40-percentage-point improvement. Furthermore, the PSS learning module mitigates the semantic gap, allowing the model to generalize well to new, unseen ALD cases and increasing ALD image recognition accuracy by a further 2.69 percentage points. In conclusion, the FTPSS model represents a significant advancement in ALD recognition, with the potential to transform disease management strategies in orchards. The precise, timely information it provides can be applied to automatic ALD detection, thereby preserving the health and productivity of orchards. This finding contributes to the field of precision agriculture using advanced deep learning techniques. [ABSTRACT FROM AUTHOR]
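The two PSS losses described in the abstract can be illustrated with a minimal NumPy sketch (the paper's exact formulations may differ; this only captures the stated goals of inter-class orthogonality and intra-class compactness):

```python
import numpy as np

def orthogonality_loss(prototypes):
    """Penalize non-orthogonal class prototypes.

    `prototypes` is (C, D); rows are L2-normalized and the loss is the
    mean squared off-diagonal cosine similarity, so it is zero only
    when class prototypes are mutually orthogonal.
    """
    p = prototypes / np.linalg.norm(prototypes, axis=1, keepdims=True)
    gram = p @ p.T
    off = gram - np.diag(np.diag(gram))  # keep only cross-class similarities
    return float((off ** 2).mean())

def clustering_loss(features, labels, prototypes):
    """Mean squared distance of each feature to its class prototype,
    tightening intra-class compactness."""
    return float(((features - prototypes[labels]) ** 2).sum(axis=1).mean())
```

Orthogonal prototypes (e.g. the identity matrix) give zero orthogonality loss, and features sitting exactly on their prototypes give zero clustering loss.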
- Published
- 2024
- Full Text
- View/download PDF
6. Contrastive graph clustering via enhanced hard sample mining and cluster-guiding.
- Author
-
Li, Meng, Yang, Bo, Xue, Tao, Zhang, Yiguo, and Zhou, Liangliang
- Abstract
Contrastive graph clustering (CGC) has emerged as a research hotspot, aiming to leverage the robust representational capability of contrastive learning to improve graph clustering performance. Recent works have shown that CGC can benefit from hard sample mining. However, we observe two primary shortcomings of existing CGC methods that limit further improvements in clustering performance. First, the widely used contrastive loss mistakenly treats all elements outside the cross-view diagonal as negatives, yielding numerous false negatives. Second, without explicit cluster guidance, the learned node embeddings become unsuitable for clustering tasks. To address these issues, we propose a novel CGC method via enhanced hard sample mining and cluster-guiding (CGCEC). The method generates high-confidence pseudo-labels by clustering node embeddings during network training. Furthermore, we design a hard sample debiased mining loss that uses pseudo-labels to remove false negative samples, repelling hard negatives while attracting hard positives and thus enhancing the discriminability of the learned embeddings. Additionally, we employ the encoder to transform node embeddings into semantic labels, encouraging the network to learn node embeddings more suitable for clustering by matching semantic labels with pseudo-labels. To validate CGCEC's effectiveness, we compare it with state-of-the-art graph clustering methods across six benchmark datasets. The experimental results substantiate the efficacy of our method and its superiority over competing approaches. [ABSTRACT FROM AUTHOR]
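The pseudo-label-based removal of false negatives can be sketched as a mask over cross-view pairs (an illustrative NumPy sketch of the idea, not the CGCEC implementation):

```python
import numpy as np

def debiased_negative_mask(pseudo_labels):
    """Cross-view negative mask for contrastive graph clustering.

    Standard contrastive losses treat every off-diagonal cross-view
    pair as a negative; here, pairs sharing a high-confidence
    pseudo-label are removed from the negative set, since they are
    likely false negatives. Entry (i, j) is True when sample i in
    view 1 and sample j in view 2 should be contrasted as negatives.
    """
    y = np.asarray(pseudo_labels)
    same_cluster = y[:, None] == y[None, :]
    # Same-cluster pairs (including the diagonal) are excluded.
    return ~same_cluster

mask = debiased_negative_mask([0, 0, 1])
```

Here samples 0 and 1 share a pseudo-label, so their cross-view pairs are no longer treated as negatives.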
- Published
- 2024
- Full Text
- View/download PDF
7. ConKeD: multiview contrastive descriptor learning for keypoint-based retinal image registration.
- Author
-
Rivas-Villar, David, Hervella, Álvaro S., Rouco, José, and Novo, Jorge
- Subjects
- *
ARTIFICIAL neural networks, *IMAGE registration, *RETINAL imaging, *LEARNING strategies, *MEDICAL practice, *DEEP learning - Abstract
Retinal image registration is of utmost importance due to its wide applications in medical practice. In this context, we propose ConKeD, a novel deep learning approach to learn descriptors for retinal image registration. In contrast to current registration methods, our approach employs a novel multi-positive multi-negative contrastive learning strategy that enables the utilization of additional information from the available training samples. This makes it possible to learn high-quality descriptors from limited training data. To train and evaluate ConKeD, we combine these descriptors with domain-specific keypoints, particularly blood vessel bifurcations and crossovers, that are detected using a deep neural network. Our experimental results demonstrate the benefits of the novel multi-positive multi-negative strategy, as it outperforms the widely used triplet loss technique (single-positive and single-negative) as well as the single-positive multi-negative alternative. Additionally, the combination of ConKeD with the domain-specific keypoints produces comparable results to the state-of-the-art methods for retinal image registration, while offering important advantages such as avoiding pre-processing, utilizing fewer training samples, and requiring fewer detected keypoints, among others. Therefore, ConKeD shows a promising potential towards facilitating the development and application of deep learning-based methods for retinal image registration. [ABSTRACT FROM AUTHOR]
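A multi-positive multi-negative contrastive loss of the kind described can be sketched in NumPy as an InfoNCE-style objective averaged over positives (an illustration of the general idea, not ConKeD's exact loss; the temperature value is arbitrary):

```python
import numpy as np

def multi_pos_neg_loss(anchor, positives, negatives, temperature=0.1):
    """Multi-positive multi-negative contrastive loss (sketch).

    Each positive descriptor is contrasted against all positives and
    negatives jointly; averaging over positives uses every available
    training pair, unlike a triplet loss which uses a single
    positive and a single negative per anchor.
    """
    a = anchor / np.linalg.norm(anchor)
    cands = np.vstack([positives, negatives])
    cands = cands / np.linalg.norm(cands, axis=1, keepdims=True)
    sims = cands @ a / temperature
    # Numerically stable log-sum-exp over all candidates.
    log_denom = np.log(np.exp(sims - sims.max()).sum()) + sims.max()
    n_pos = len(positives)
    return float(-(sims[:n_pos] - log_denom).mean())

anchor = np.array([1.0, 0.0])
good = multi_pos_neg_loss(anchor, np.array([[1.0, 0.0]]), np.array([[0.0, 1.0]]))
bad = multi_pos_neg_loss(anchor, np.array([[0.0, 1.0]]), np.array([[1.0, 0.0]]))
```

Descriptors whose positives align with the anchor incur a much lower loss than the reversed configuration, which is the gradient signal that shapes the embedding.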
- Published
- 2024
- Full Text
- View/download PDF
8. Vicsgaze: a gaze estimation method using self-supervised contrastive learning.
- Author
-
Gu, De, Lv, Minghao, and Liu, Jianchu
- Abstract
Existing deep learning-based gaze estimation methods achieve high accuracy, but their performance depends on large-scale datasets with gaze labels, and collecting such datasets is time-consuming and expensive. To this end, we propose VicsGaze, a self-supervised network that learns generalized gaze-aware representations without labeled data. We feed two gaze-specific augmentation views of the same face image into a multi-branch convolutional re-parameterization encoder to obtain feature representations. Although the two augmented views give the original face image different appearances, the gaze direction they represent is consistent. We then map these two representations into an embedding space and employ a novel loss function to optimize model training. Experiments demonstrate that VicsGaze achieves outstanding cross-dataset gaze estimation on several datasets. Meanwhile, VicsGaze outperforms supervised learning baselines when fine-tuned with few calibration samples. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
9. Improved contrastive learning model via identification of false‐negatives in self‐supervised learning.
- Author
-
Auh, Joonsun, Cho, Changsik, and Kim, Seon‐tae
- Abstract
Self-supervised learning is a method that learns data representations from unlabeled data. It is efficient because it learns from large-scale unlabeled data, and through continued research it has reached performance comparable to supervised learning. Contrastive learning, a type of self-supervised learning algorithm, utilizes data similarity to perform instance-level learning within an embedding space. However, it suffers from false negatives: samples of the same class as the anchor that are mistakenly treated as negatives while training the data representation. These result in a loss of information and deteriorate the performance of the model. This study employs cosine similarity and temperature simultaneously to identify false negatives and mitigate their impact, improving the performance of the contrastive learning model. The proposed method exhibited a performance improvement of up to 2.7% over the existing algorithm on the CIFAR-100 dataset. Improved performance was also observed on other datasets such as CIFAR-10 and ImageNet. [ABSTRACT FROM AUTHOR]
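Identifying false negatives jointly from cosine similarity and temperature can be sketched as a simple thresholding rule (an illustrative NumPy sketch; the paper's actual criterion and threshold values may differ):

```python
import numpy as np

def flag_false_negatives(anchor, negatives, temperature=0.5, threshold=0.9):
    """Flag candidate false negatives among contrastive negatives.

    Negatives whose temperature-scaled cosine similarity to the anchor
    exceeds `threshold` are likely the same class as the anchor; the
    returned boolean mask can be used to drop them from the loss.
    The temperature and threshold here are illustrative values.
    """
    a = anchor / np.linalg.norm(anchor)
    n = negatives / np.linalg.norm(negatives, axis=1, keepdims=True)
    scaled = (n @ a) / temperature
    return scaled > threshold

# A negative nearly parallel to the anchor is flagged; an orthogonal one is not.
flags = flag_false_negatives(np.array([1.0, 0.0]),
                             np.array([[1.0, 0.01], [0.0, 1.0]]))
```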
- Published
- 2024
- Full Text
- View/download PDF
10. A Novel Multi-Task Self-Supervised Transfer Learning Framework for Cross-Machine Rolling Bearing Fault Diagnosis.
- Author
-
Zhao, Lujia, He, Yuling, Dai, Derui, Wang, Xiaolong, Bai, Honghua, and Huang, Weiling
- Abstract
In recent years, intelligent methods based on transfer learning have achieved significant research results in the field of rolling bearing fault diagnosis. However, most studies focus on the transfer diagnosis scenario under different working conditions of the same machine. The transfer fault diagnosis methods used for different machines have problems such as low recognition accuracy and unstable performance. Therefore, a novel multi-task self-supervised transfer learning framework (MTSTLF) is proposed for cross-machine rolling bearing fault diagnosis. The proposed method is trained using a multi-task learning paradigm, which includes three self-supervised learning tasks and one fault diagnosis task. First, three different scales of masking methods are designed to generate masked vibration data based on the periodicity and intrinsic information of the rolling bearing vibration signals. Through self-supervised learning, the attention to the intrinsic features of data in different health conditions is enhanced, thereby improving the model's feature expression capability. Secondly, a multi-perspective feature transfer method is proposed for completing cross-machine fault diagnosis tasks. By integrating two types of metrics, probability distribution and geometric similarity, the method focuses on transferable fault diagnosis knowledge from different perspectives, thereby enhancing the transfer learning ability and accomplishing cross-machine fault diagnosis of rolling bearings. Two experimental cases are carried out to evaluate the effectiveness of the proposed method. Results suggest that the proposed method is effective for cross-machine rolling bearing fault diagnosis. [ABSTRACT FROM AUTHOR]
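The multi-scale masking of vibration signals can be sketched as follows (a minimal NumPy illustration; the segment scale and mask ratio are arbitrary stand-ins for the three masking scales designed in the paper):

```python
import numpy as np

def mask_signal(signal, scale, mask_ratio=0.3, rng=None):
    """Mask a 1-D vibration signal in contiguous segments of length `scale`.

    Applying this with three different `scale` values yields three
    masked views used as self-supervised reconstruction targets; larger
    scales hide longer stretches of the periodic waveform.
    """
    rng = np.random.default_rng(rng)
    out = signal.copy()
    n_seg = len(signal) // scale
    # Select exactly int(mask_ratio * n_seg) segments to zero out.
    masked = rng.permutation(n_seg) < int(mask_ratio * n_seg)
    for s in np.flatnonzero(masked):
        out[s * scale:(s + 1) * scale] = 0.0
    return out

x = np.ones(100)
view = mask_signal(x, scale=10, mask_ratio=0.3, rng=0)
```

Reconstructing the hidden segments forces the encoder to exploit the periodic structure of the bearing vibration signal.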
- Published
- 2024
- Full Text
- View/download PDF
11. Abnormal Sound Detection of Wind Turbine Gearboxes Based on Improved MobileFaceNet and Feature Fusion.
- Author
-
Liang, Yuelong, Liu, Haorui, and Chen, Yayu
- Abstract
Wind turbine gearbox sound anomaly detection is unstable when only normal data are available for training, and detection performance degrades when highly similar samples are poorly separated. To address these problems, this paper proposes a self-supervised wind turbine gearbox sound anomaly detection algorithm that fuses time-domain features with Mel spectrograms, improves the MobileFaceNet (MFN) model, and combines it with a Gaussian Mixture Model (GMM). The method compensates for the anomaly information lost in Mel spectrogram features through feature fusion and introduces a style attention mechanism (SRM) into MFN to strengthen feature expression, improving the accuracy and stability of the abnormal sound detection model. On the wind turbine gearbox sound dataset of a wind farm in Guangyuan, the proposed method, STgram-MFN-SRM, reached an average AUC of 96.16% across the sound data from five measuring points on the gearbox. Compared with the traditional anomaly detection methods LogMel-MFN, STgram-MFN, STgram-Resnet50, and STgram-MFN-SRM(CE), the average AUC of sound detection at the five measuring points increased by 5.19%, 4.73%, 11.06%, and 2.88%, respectively. The method therefore effectively improves the performance of sound anomaly detection for wind turbine gearboxes and has important engineering value for the healthy operation and maintenance of wind turbines. [ABSTRACT FROM AUTHOR]
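The final scoring stage, fitting a density model to embeddings of normal sounds and flagging low-likelihood samples, can be sketched in NumPy (a single diagonal Gaussian is used here instead of the paper's Gaussian Mixture Model, purely to keep the sketch dependency-free):

```python
import numpy as np

def fit_gaussian(normal_feats):
    """Fit a diagonal Gaussian to embeddings of normal machine sounds."""
    mu = normal_feats.mean(axis=0)
    var = normal_feats.var(axis=0) + 1e-6  # small floor for stability
    return mu, var

def anomaly_score(feat, mu, var):
    """Negative log-likelihood under the normal-sound model.

    Higher scores mean more anomalous; the GMM used in the paper
    generalizes this to a mixture of such densities.
    """
    return float(0.5 * (((feat - mu) ** 2) / var + np.log(2 * np.pi * var)).sum())

rng = np.random.default_rng(0)
normal_feats = rng.normal(0.0, 1.0, (200, 4))   # stand-in embeddings
mu, var = fit_gaussian(normal_feats)
near = anomaly_score(np.zeros(4), mu, var)       # typical sample
far = anomaly_score(np.full(4, 5.0), mu, var)    # outlier sample
```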
- Published
- 2024
- Full Text
- View/download PDF
12. Deep clustering network based on graph relation selection [基于图关系选择的深度聚类网络].
- Author
-
孙艳丰 and 杜鹏飞
- Published
- 2024
- Full Text
- View/download PDF
13. Self-supervised change detection of heterogeneous images based on difference algorithms.
- Author
-
Wu, Jinsha, Yang, Shuwen, Li, Yikun, Fu, Yukai, Shi, Zhuang, and Zheng, Yao
- Abstract
The presence of heterogeneous image disparities often leads to inferior quality in the generated difference images during change detection. This paper proposes a self-supervised change detection of heterogeneous images based on a difference algorithm. Firstly, a combination of phase consistency and a simplified pulse-coupled neural network (PC-SPCNN) is used to fuse the heterogeneous images, and the result is used to compute the difference image (DI). The new DI generation method can generate the standard and exponential difference images. Secondly, the hierarchical FCM clustering algorithm is improved to extract stable and correct self-supervised samples by difference images so that the clustering process is not overly dependent on thresholds. Then, the support vector machine classifier is trained based on the heterogeneous images, the fused images, and self-supervised sample sets, and the information from the fused images is utilized to increase the feature dimension for better detection of changes. Finally, the support vector machine classifier automatically detects whether the intermediate pixels are changed and produces the change detection results. The experimental results confirm the improvements made by the proposed method in difference image extraction, training sample selection, and clustering algorithm, and the stability of the method exceeds that of the state-of-the-art change detection methods. [ABSTRACT FROM AUTHOR]
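Two widely used difference-image operators for change detection can be sketched in NumPy (generic operators shown for illustration; the paper's standard and exponential difference images are computed from the PC-SPCNN-fused result and are more elaborate):

```python
import numpy as np

def difference_images(img1, img2, eps=1e-6):
    """Standard (absolute) and log-ratio difference images.

    The absolute difference highlights additive changes, while the
    log-ratio is more robust to the multiplicative noise typical of
    radar imagery; `eps` guards against division by zero.
    """
    standard = np.abs(img1.astype(np.float64) - img2.astype(np.float64))
    log_ratio = np.abs(np.log((img1 + eps) / (img2 + eps)))
    return standard, log_ratio

same_s, same_lr = difference_images(np.ones((2, 2)), np.ones((2, 2)))
diff_s, diff_lr = difference_images(np.ones((2, 2)), np.full((2, 2), np.e))
```

Identical inputs yield all-zero difference images; a pixel whose intensity changes by a factor of e yields a log-ratio response of exactly 1.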
- Published
- 2024
- Full Text
- View/download PDF
14. A performance-interpretable intelligent fusion of sound and vibration signals for bearing fault diagnosis via dynamic CAME.
- Author
-
Keshun, You, Zengwei, Lian, and Yingkui, Gu
- Abstract
This study proposed a performance-interpretable deep learning model for rolling bearing fault diagnosis that integrates an intelligent fusion of sound and vibration signals and self-supervised learning via an interpretable attention mechanism. A deep learning decoder framework with a compressed attention mechanism encoder is developed to automatically learn the correlation between sound and vibration signals and the fusion method, eliminating the need for manual feature extraction and multi-model construction. By introducing the dynamic attention mechanism, the strength of the correlation between sound and vibration signals can be sensed in real time to adapt flexibly to different scenarios. When the correlation is strong, owing to the high similarity between the signals, a complex feature-weight fusion strategy is employed to extract and fuse the essential features of the different modalities more efficiently, enabling the modalities to mutually enhance the expressive power of the fused features. When the correlation is weak, forcing a complex fusion may introduce noise and redundant information, so a hybrid input strategy is used instead. The CAME-Transformer Decoder (CAME-TD) model dynamically updates the correlation thresholds and fusion strategies using regularized loss constraints to ensure adaptation to multimodal signal differences. During model training, visual analysis of the attention mechanism's role weights and feature learning helps with parameter optimization and performance evaluation. The experimental results demonstrate the effectiveness of the proposed methodology, with improved fault diagnosis performance under various operating and noise conditions compared to a single signal input. Moreover, the CAME-TD model not only achieves considerable diagnostic performance but also enhances interpretability, providing a new approach for rolling bearing fault diagnosis.
[ABSTRACT FROM AUTHOR]
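The correlation-dependent switch between weighted fusion and hybrid input can be caricatured in a few lines of NumPy (an illustrative simplification; the CAME-TD model learns its thresholds and fusion weights through attention and regularized loss constraints rather than using fixed rules like these):

```python
import numpy as np

def gated_fusion(sound_feat, vib_feat, threshold=0.5):
    """Correlation-gated fusion of sound and vibration features.

    When the Pearson correlation between the two feature vectors is
    strong, a weighted fusion is used; when it is weak, the features
    are concatenated (hybrid input) to avoid mixing in noise.
    """
    r = np.corrcoef(sound_feat, vib_feat)[0, 1]
    if abs(r) >= threshold:
        w = abs(r)  # correlation-derived fusion weight (illustrative)
        return w * sound_feat + (1 - w) * vib_feat, "fused"
    return np.concatenate([sound_feat, vib_feat]), "hybrid"

sound = np.array([1.0, 2.0, 3.0, 4.0])
fused, mode = gated_fusion(sound, 2.0 * sound)          # perfectly correlated
hybrid, mode2 = gated_fusion(sound, np.array([3.0, 1.0, 4.0, 2.0]))  # uncorrelated
```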
- Published
- 2024
- Full Text
- View/download PDF
15. Self‐supervised learning for improved calibrationless radial MRI with NLINV‐Net.
- Author
-
Blumenthal, Moritz, Fantinato, Chiara, Unterberg‐Buchwald, Christina, Haltmeier, Markus, Wang, Xiaoqing, and Uecker, Martin
- Subjects
CARDIAC imaging, IMAGE reconstruction, MAGNETIC resonance imaging, REGULARIZATION parameter, INVERSE problems - Abstract
Purpose: To develop a neural network architecture for improved calibrationless reconstruction of radial data when no ground truth is available for training. Methods: NLINV-Net is a model-based neural network architecture that directly estimates images and coil sensitivities from (radial) k-space data via nonlinear inversion (NLINV). Combined with a training strategy using self-supervision via data undersampling (SSDU), it can be used for imaging problems where no ground truth reconstructions are available. We validated the method for (1) real-time cardiac imaging and (2) single-shot subspace-based quantitative T1 mapping. Furthermore, region-optimized virtual (ROVir) coils were used to suppress artifacts stemming from outside the field of view and to focus the k-space-based SSDU loss on the region of interest. NLINV-Net-based reconstructions were compared with conventional NLINV and PI-CS (parallel imaging + compressed sensing) reconstructions, and the effect of the region-optimized virtual coils and the type of training loss was evaluated qualitatively. Results: NLINV-Net-based reconstructions contain significantly less noise than the NLINV-based counterpart. ROVir coils effectively suppress streaking artifacts that are not suppressed by the neural networks, while the ROVir-based focused loss leads to visually sharper time series for the movement of the myocardial wall in cardiac real-time imaging. For quantitative imaging, T1 maps reconstructed using NLINV-Net show similar quality to PI-CS reconstructions, but NLINV-Net does not require slice-specific tuning of the regularization parameter. Conclusion: NLINV-Net is a versatile tool for calibrationless imaging which can be used in challenging imaging scenarios where a ground truth is not available. [ABSTRACT FROM AUTHOR]
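The SSDU idea of self-supervision via data undersampling can be sketched as a split of the acquired k-space samples into disjoint training and loss subsets (a generic NumPy sketch, not NLINV-Net code; the split fraction and toy sampling pattern are arbitrary):

```python
import numpy as np

def ssdu_split(kspace_mask, loss_fraction=0.4, rng=None):
    """Split acquired k-space samples into disjoint training/loss sets.

    Self-supervision via data undersampling (SSDU): the network
    reconstructs from the training subset, and the loss is computed on
    the held-out subset, so no fully sampled ground truth is needed.
    """
    rng = np.random.default_rng(rng)
    acquired = np.flatnonzero(kspace_mask.ravel())
    hold = rng.permutation(len(acquired)) < int(loss_fraction * len(acquired))
    train_mask = np.zeros(kspace_mask.size, dtype=bool)
    loss_mask = np.zeros(kspace_mask.size, dtype=bool)
    train_mask[acquired[~hold]] = True
    loss_mask[acquired[hold]] = True
    return train_mask.reshape(kspace_mask.shape), loss_mask.reshape(kspace_mask.shape)

mask = np.zeros((8, 8), dtype=bool)
mask[::2] = True  # toy undersampling pattern standing in for radial spokes
train, loss = ssdu_split(mask, loss_fraction=0.25, rng=0)
```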
- Published
- 2024
- Full Text
- View/download PDF
16. The 3-billion fossil question: How to automate classification of microfossils.
- Author
-
Martinsen, Iver, Wade, David, Ricaud, Benjamin, and Godtliebsen, Fred
- Subjects
FOSSILS, CARBON sequestration, DEEP learning, MACHINE learning, ARTIFICIAL intelligence, CONVOLUTIONAL neural networks - Abstract
Microfossil classification is an important discipline in subsurface exploration, for both oil & gas and Carbon Capture and Storage (CCS). The abundance and distribution of species found in sedimentary rocks provide valuable information about the age and depositional environment. However, the analysis is difficult and time-consuming, as it is based on manual work by human experts. Attempts to automate this process face two key challenges: (1) the input data are very large (our dataset is projected to grow to 3 billion microfossils), and (2) there are not enough labeled data to use the standard procedure of training a deep learning classifier. We propose an efficient pipeline for processing and grouping fossils by genus, or even species, from microscope slides using self-supervised learning. First, we show how to efficiently extract crops from whole slide images by adapting previously trained object detection algorithms. Second, we provide a comparison of a range of self-supervised learning methods to classify and identify microfossils from very few labels. We obtain excellent results with both convolutional neural networks and vision transformers fine-tuned by self-supervision. Our approach is fast and computationally light, providing a handy tool for geologists working with microfossils. [ABSTRACT FROM AUTHOR]
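The first pipeline step, extracting crops around detected fossils from a whole slide image, can be sketched as follows (a minimal NumPy sketch with hypothetical box coordinates; the paper adapts previously trained object detectors to produce the detections):

```python
import numpy as np

def extract_crops(slide, boxes, size=64):
    """Cut fixed-size crops around detected fossil centers.

    `boxes` are (x, y, w, h) detections from an object detector; each
    crop is centered on its box and clipped to the slide bounds, giving
    uniform inputs for the self-supervised encoder.
    """
    h_img, w_img = slide.shape[:2]
    crops = []
    for x, y, w, h in boxes:
        cx, cy = int(x + w / 2), int(y + h / 2)
        # Clamp the crop window so it stays fully inside the slide.
        x0 = min(max(cx - size // 2, 0), w_img - size)
        y0 = min(max(cy - size // 2, 0), h_img - size)
        crops.append(slide[y0:y0 + size, x0:x0 + size])
    return np.stack(crops)

slide = np.zeros((256, 256))  # stand-in for a whole slide image
crops = extract_crops(slide, [(10, 10, 20, 20), (200, 200, 30, 30)], size=64)
```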
- Published
- 2024
- Full Text
- View/download PDF
17. Attention-guided mask learning for self-supervised 3D action recognition.
- Author
-
Zhang, Haoyuan
- Subjects
LEARNING strategies, RECOGNITION (Psychology), ENCODING - Abstract
Most existing 3D action recognition works rely on the supervised learning paradigm, yet the scarcity of annotated data limits the full potential of encoding networks. As a result, effective self-supervised pre-training strategies have been actively researched. In this paper, we explore a self-supervised learning approach for 3D action recognition and propose the Attention-guided Mask Learning (AML) scheme. Specifically, a dropping mechanism is introduced into contrastive learning to develop the Attention-guided Mask (AM) module and a mask learning strategy. The AM module leverages spatial and temporal attention to guide the masking of the corresponding features, producing the masked contrastive object. The mask learning strategy enables the model to discriminate different actions even with important features masked, which makes action representation learning more discriminative. Moreover, to alleviate the strict positive constraint that would hinder representation learning, a positive-enhanced learning strategy is leveraged in the second-stage training. Extensive experiments on the NTU-60, NTU-120, and PKU-MMD datasets show that the proposed AML scheme improves performance in self-supervised 3D action recognition, achieving state-of-the-art results. [ABSTRACT FROM AUTHOR]
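The attention-guided masking idea can be sketched as follows (an illustrative NumPy sketch of the AM module's principle: hide the most-attended positions so the model must rely on less salient cues; the mask ratio is arbitrary):

```python
import numpy as np

def attention_guided_mask(features, attention, mask_ratio=0.25):
    """Mask the most-attended feature positions.

    Zeroes the top `mask_ratio` fraction of positions ranked by the
    attention scores, forcing the model to discriminate actions even
    when their most informative joints/frames are hidden.
    """
    k = max(1, int(mask_ratio * len(attention)))
    top = np.argsort(attention)[-k:]  # indices with the highest attention
    out = features.copy()
    out[top] = 0.0
    return out, top

feats = np.arange(8, dtype=float)
attn = np.array([0.1, 0.9, 0.2, 0.8, 0.1, 0.1, 0.1, 0.1])
masked, idx = attention_guided_mask(feats, attn, mask_ratio=0.25)
```

Here the two most-attended positions (indices 1 and 3) are zeroed, producing the masked view used as the contrastive object.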
- Published
- 2024
- Full Text
- View/download PDF
18. Cross‐shaped windows transformer with self‐supervised pretraining for clinically significant prostate cancer detection in bi‐parametric MRI.
- Author
-
Li, Yuheng, Wynne, Jacob, Wang, Jing, Qiu, Richard L. J., Roper, Justin, Pan, Shaoyan, Jani, Ashesh B., Liu, Tian, Patel, Pretesh R., Mao, Hui, and Yang, Xiaofeng
- Subjects
- *
CONVOLUTIONAL neural networks , *TRANSFORMER models , *MAGNETIC resonance imaging , *DEEP learning , *PROSTATE cancer - Abstract
Background: Bi‐parametric magnetic resonance imaging (bpMRI) has demonstrated promising results in prostate cancer (PCa) detection. Vision transformers have achieved competitive performance compared to convolutional neural networks (CNNs) in deep learning, but they need abundant annotated data for training. Self‐supervised learning can effectively leverage unlabeled data to extract useful semantic representations without annotation and its associated costs. Purpose: This study proposes a novel self‐supervised learning framework and a transformer model to enhance PCa detection using prostate bpMRI. Methods and materials: We introduce a novel end‐to‐end Cross‐Shaped windows (CSwin) transformer UNet model, CSwin UNet, to detect clinically significant prostate cancer (csPCa) in prostate bpMRI. We also propose a multitask self‐supervised learning framework to leverage unlabeled data and improve network generalizability. Using a large prostate bpMRI dataset (PI‐CAI) with 1476 patients, we first pretrain the CSwin transformer using multitask self‐supervised learning to improve data‐efficiency and network generalizability. We then finetune using lesion annotations to perform csPCa detection. We also test network generalization using a separate bpMRI dataset with 158 patients (Prostate158). Results: Five‐fold cross validation shows that self‐supervised CSwin UNet achieves 0.888 ± 0.010 area under the receiver operating characteristic curve (AUC) and 0.545 ± 0.060 Average Precision (AP) on the PI‐CAI dataset, significantly outperforming five comparable models (nnFormer, Swin UNETR, DynUNet, Attention UNet, UNet). On model generalizability, self‐supervised CSwin UNet achieves 0.79 AUC and 0.45 AP, still outperforming all other comparable methods and demonstrating good generalization to external data. Conclusions: This study proposes CSwin UNet, a new transformer‐based model for end‐to‐end detection of csPCa, enhanced by self‐supervised pretraining to improve network generalizability. 
We employ an automatic weighted loss (AWL) to unify pretext tasks, improving representation learning. Evaluated on two multi‐institutional public datasets, our method surpasses existing methods in detection metrics and demonstrates good generalization to external data. [ABSTRACT FROM AUTHOR]
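An automatic weighted loss that unifies pretext tasks is commonly formulated as uncertainty-based task weighting with learnable per-task scales. The sketch below shows one such standard formulation as an assumption; the paper's exact AWL may differ.

```python
import math

def automatic_weighted_loss(task_losses, log_sigmas):
    """Combine pretext-task losses with learnable per-task weights:
    total = sum_i( L_i / (2*sigma_i^2) + log(sigma_i) ).
    Uncertainty-style weighting; a sketch of one common AWL form,
    not necessarily the paper's exact formulation."""
    total = 0.0
    for loss, log_s in zip(task_losses, log_sigmas):
        sigma_sq = math.exp(2 * log_s)
        total += loss / (2 * sigma_sq) + log_s
    return total

# with unit sigmas (log_sigma = 0) each task is simply halved
combined = automatic_weighted_loss([2.0, 4.0], [0.0, 0.0])
```

In training, the `log_sigmas` would be optimized jointly with the network, letting the model down-weight noisier pretext tasks automatically.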
- Published
- 2024
- Full Text
- View/download PDF
19. Self-Supervised Learning for Near-Wild Cognitive Workload Estimation.
- Author
-
Rafiei, Mohammad H., Gauthier, Lynne V., Adeli, Hojjat, and Takabi, Daniel
- Subjects
- *
PREVENTION of medical errors , *INDUSTRIAL psychology , *ELECTROENCEPHALOGRAPHY , *MENTAL fatigue , *DECISION making in clinical medicine , *DESCRIPTIVE statistics , *ELECTROCARDIOGRAPHY , *MACHINE learning , *PATIENT monitoring , *MEDICAL artifacts , *COGNITION - Abstract
Feedback on cognitive workload may reduce decision-making mistakes. Machine learning-based models can produce such feedback from physiological data such as electroencephalography (EEG) and electrocardiography (ECG). Supervised machine learning requires large training data sets that are (1) relevant and decontaminated and (2) carefully labeled for accurate approximation, a costly and tedious procedure. Commercial over-the-counter devices are low-cost solutions for the real-time collection of physiological modalities. However, they produce significant artifacts when employed outside of laboratory settings, compromising machine learning accuracies. Additionally, the physiological modalities that most successfully machine-approximate cognitive workload in everyday settings are unknown. To address these challenges, a first-ever hybrid implementation of feature selection and self-supervised machine learning techniques is introduced. This model is employed on data collected outside controlled laboratory settings to (1) identify relevant physiological modalities to machine-approximate six levels of cognitive-physical workloads from a seven-modality repository and (2) conduct limited labeling experiments and machine-approximate mental-physical workloads using self-supervised learning techniques. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
20. A Multi-Scale CNN for Transfer Learning in sEMG-Based Hand Gesture Recognition for Prosthetic Devices.
- Author
-
Fratti, Riccardo, Marini, Niccolò, Atzori, Manfredo, Müller, Henning, Tiengo, Cesare, and Bassetto, Franco
- Subjects
- *
CONVOLUTIONAL neural networks , *ARTIFICIAL hands , *INSTRUCTIONAL systems , *DATABASES , *SIGNAL processing - Abstract
Advancements in neural network approaches have enhanced the effectiveness of surface Electromyography (sEMG)-based hand gesture recognition when measuring muscle activity. However, current deep learning architectures struggle to achieve good generalization and robustness, often demanding significant computational resources. The goal of this paper was to develop a robust model that can quickly adapt to new users using Transfer Learning. We propose a Multi-Scale Convolutional Neural Network (MSCNN), pre-trained with various strategies to improve inter-subject generalization. These strategies include domain adaptation with a gradient-reversal layer and self-supervision using triplet margin loss. We evaluated these approaches on several benchmark datasets, specifically the NinaPro databases. This study also compared two different Transfer Learning frameworks designed for user-dependent fine-tuning. The second Transfer Learning framework achieved a 97% F1 Score across 14 classes with an average of 1.40 epochs, suggesting potential for on-site model retraining in cases of performance degradation over time. The findings highlight the effectiveness of Transfer Learning in creating adaptive, user-specific models for sEMG-based prosthetic hands. Moreover, the study examined the impacts of rectification and window length, with a focus on real-time accessible normalizing techniques, suggesting significant improvements in usability and performance. [ABSTRACT FROM AUTHOR]
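The self-supervision via triplet margin loss mentioned above follows the standard formulation: pull an anchor toward a positive sample and push it away from a negative by at least a margin. This is the textbook loss, sketched with hypothetical inputs, not the MSCNN training code.

```python
import math

def triplet_margin_loss(anchor, positive, negative, margin=1.0):
    """max(0, d(a,p) - d(a,n) + margin) with Euclidean distance
    (standard triplet margin loss; sketch, not the paper's code)."""
    def d(u, v):
        return math.sqrt(sum((x - y) ** 2 for x, y in zip(u, v)))
    return max(0.0, d(anchor, positive) - d(anchor, negative) + margin)

# negative already farther than positive by more than the margin: loss 0
easy = triplet_margin_loss((0.0, 0.0), (0.0, 1.0), (3.0, 0.0))
# negative closer than positive: loss is positive
hard = triplet_margin_loss((0.0, 0.0), (0.0, 2.0), (1.0, 0.0))
```

For sEMG pretraining, anchors and positives would be windows from the same gesture and negatives from a different gesture, encouraging subject-invariant embeddings.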
- Published
- 2024
- Full Text
- View/download PDF
21. Large-Kernel Central Block Masked Convolution and Channel Attention-Based Reconstruction Network for Anomaly Detection of High-Resolution Hyperspectral Imagery.
- Author
-
Ran, Qiong, Zhong, Hong, Sun, Xu, Wang, Degang, and Sun, He
- Subjects
- *
ANOMALY detection (Computer security) , *SPATIAL resolution , *IMAGE sensors , *NEIGHBORHOODS , *SCARCITY , *PIXELS , *MULTISPECTRAL imaging - Abstract
In recent years, the rapid advancement of drone technology has led to an increasing use of drones equipped with hyperspectral sensors for ground imaging. Hyperspectral data captured via drones offer significantly higher spatial resolution, but this also introduces more complex background details and larger target scales in high-resolution hyperspectral imagery (HRHSI), posing substantial challenges for hyperspectral anomaly detection (HAD). Mainstream reconstruction-based deep learning methods predominantly emphasize spatial local information in hyperspectral images (HSIs), relying on small spatial neighborhoods for reconstruction. As a result, large anomalous targets and background details are often well reconstructed, leading to poor anomaly detection performance, as these targets are not sufficiently distinguished from the background. To address these limitations, we propose a novel HAD network for HRHSI based on large-kernel central block masked convolution and channel attention, termed LKCMCA. Specifically, we first employ the pixel-shuffle technique to reduce the size of anomalous targets without losing image information. Next, we design a large-kernel central block masked convolution to make the network pay more attention to the surrounding background information, enabling better fusion of the information between adjacent bands. This, coupled with an efficient channel attention mechanism, allows the network to capture deeper spectral features, enhancing the reconstruction of the background while suppressing anomalous targets. Furthermore, we introduce an adaptive loss function by down-weighting anomalous pixels based on the mean absolute error. This loss function is specifically designed to suppress the reconstruction of potentially anomalous pixels during network training, allowing our model to be considered an excellent background reconstruction network. By leveraging reconstruction error, the model effectively highlights anomalous targets. 
Meanwhile, we produced four benchmark datasets specifically for HAD tasks using existing HRHSI data, addressing the current shortage of HRHSI datasets in the HAD field. Extensive experiments demonstrate that our LKCMCA method achieves superior detection performance, outperforming ten state-of-the-art HAD methods on all datasets. [ABSTRACT FROM AUTHOR]
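The central-block masking described above can be illustrated as a binary mask over a large convolution kernel whose center is zeroed, so reconstruction of a pixel must come from surrounding background context. A minimal numpy sketch of the masking idea only (sizes are illustrative, not the paper's configuration):

```python
import numpy as np

def central_block_masked_kernel(kernel_size=7, block_size=3):
    """Binary mask for a large convolution kernel whose central block
    is zeroed; multiplying the kernel weights by this mask forces the
    network to reconstruct the center from surrounding context
    (sketch of the masking idea, not the LKCMCA implementation)."""
    mask = np.ones((kernel_size, kernel_size))
    start = (kernel_size - block_size) // 2
    mask[start:start + block_size, start:start + block_size] = 0.0
    return mask

m = central_block_masked_kernel()
# the 3x3 center of the 7x7 kernel is zero; the surround is one
```

Because anomalous targets are small and spatially compact, a background-only receptive field reconstructs background well but anomalies poorly, which is what makes the reconstruction error a detection signal.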
- Published
- 2024
- Full Text
- View/download PDF
22. Proto-DS: A Self-Supervised Learning-Based Nondestructive Testing Approach for Food Adulteration with Imbalanced Hyperspectral Data.
- Author
-
Pang, Kunkun, Liu, Yisen, Zhou, Songbin, Liao, Yixiao, Yin, Zexuan, Zhao, Lulu, and Chen, Hong
- Subjects
CONVOLUTIONAL neural networks ,FOOD adulteration ,COFFEE beans ,NONDESTRUCTIVE testing ,DEEP learning - Abstract
Conventional food fraud detection using hyperspectral imaging (HSI) relies on the discriminative power of machine learning. However, these approaches often assume a balanced class distribution in an ideal laboratory environment, which is impractical in real-world scenarios with diverse label distributions. This results in suboptimal performance when less frequent classes are overshadowed by the majority class during training. Thus, a critical research challenge emerges: how to develop an effective classifier on a small-scale imbalanced dataset without significant bias toward the dominant class. In this paper, we propose a novel nondestructive detection approach, which we call the Dice Loss Improved Self-Supervised Learning-Based Prototypical Network (Proto-DS), designed to address this imbalanced learning challenge. The proposed combination of self-supervised pretraining, a prototypical network, and the Dice loss mitigates the label bias toward the most frequent class, further improving robustness. We validate our proposed method on three collected hyperspectral food image datasets with varying degrees of data imbalance: Citri Reticulatae Pericarpium (Chenpi), Chinese herbs, and coffee beans. Comparisons with state-of-the-art imbalanced learning techniques, including the Synthetic Minority Oversampling Technique (SMOTE) and class-importance reweighting, reveal our method's superiority. Notably, our experiments demonstrate that Proto-DS consistently outperforms conventional approaches, achieving the best average balanced accuracy of 88.18% across various training sample sizes, whereas the Logistic Model Tree (LMT), Multi-Layer Perceptron (MLP), and Convolutional Neural Network (CNN) approaches attain only 59.42%, 60.38%, and 66.34%, respectively. Overall, self-supervised learning is key to improving imbalanced learning performance and outperforms related approaches, while both prototypical networks and the Dice loss can further enhance classification performance. 
Intriguingly, self-supervised learning can provide complementary information to existing imbalanced learning approaches. Combining these approaches may serve as a potential solution for building effective models with limited training data. [ABSTRACT FROM AUTHOR]
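The Dice loss at the heart of Proto-DS is the standard soft Dice formulation, which is less sensitive to class imbalance than cross-entropy because it normalizes overlap by the total mass of prediction and target. A generic sketch (not the Proto-DS code):

```python
def dice_loss(pred, target, eps=1e-6):
    """Soft Dice loss: 1 - 2*|p . t| / (|p| + |t|), with a small
    epsilon for numerical stability. Generic formulation, sketched
    for flat probability/label vectors."""
    inter = sum(p * t for p, t in zip(pred, target))
    total = sum(pred) + sum(target)
    return 1.0 - (2.0 * inter + eps) / (total + eps)

perfect = dice_loss([1.0, 0.0, 1.0], [1.0, 0.0, 1.0])  # near 0
disjoint = dice_loss([1.0, 0.0], [0.0, 1.0])           # near 1
```

Because the denominator scales with class frequency, rare classes contribute proportionally to the loss rather than being drowned out by the majority class.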
- Published
- 2024
- Full Text
- View/download PDF
23. Improved transferability of self-supervised learning models through batch normalization finetuning.
- Author
-
Sirotkin, Kirill, Escudero-Viñolo, Marcos, Carballeira, Pablo, and García-Martín, Álvaro
- Subjects
TRANSFER of training ,TASK performance ,CLASSIFICATION ,PATHOLOGY ,COST - Abstract
Abundance of unlabelled data and advances in Self-Supervised Learning (SSL) have made it the preferred choice in many transfer learning scenarios. Due to the rapid and ongoing development of SSL approaches, practitioners are now faced with an overwhelming amount of models trained for a specific task/domain, calling for a method to estimate transfer performance on novel tasks/domains. Typically, the role of such an estimator is played by linear probing, which trains a linear classifier on top of the frozen feature extractor. In this work we address a shortcoming of linear probing: it is only weakly correlated with the performance of models finetuned end-to-end (often the final objective in transfer learning) and, in some cases, catastrophically misestimates a model's potential. We propose a way to obtain a significantly better proxy task by unfreezing and jointly finetuning batch normalization layers together with the classification head. At a cost of extra training of only 0.16% of model parameters, in the case of ResNet-50, we acquire a proxy task that (i) has a stronger correlation with end-to-end finetuned performance, (ii) improves the linear probing performance in the many- and few-shot learning regimes and (iii) in some cases, outperforms both linear probing and end-to-end finetuning, reaching the state-of-the-art performance on a pathology dataset. Finally, we analyze and discuss the changes batch normalization training introduces in the feature distributions that may be the reason for the improved performance. The code is available at https://github.com/vpulab/bn_finetuning. [ABSTRACT FROM AUTHOR]
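The proxy task above amounts to unfreezing only the batch-normalization layers and the classification head and leaving everything else frozen. The sketch below illustrates the parameter selection and the resulting trainable fraction with a toy, hypothetical parameter inventory (names and counts are illustrative, not ResNet-50's):

```python
def trainable_fraction(named_param_counts, unfreeze_keys=("bn", "head")):
    """Given {parameter_name: n_elements}, unfreeze only parameters
    whose name matches a batch-norm layer or the classification head,
    and report the fraction of parameters that get trained
    (illustrative selection logic, not the paper's code)."""
    total = sum(named_param_counts.values())
    trainable = sum(n for name, n in named_param_counts.items()
                    if any(key in name for key in unfreeze_keys))
    return trainable / total

params = {"conv1.weight": 9408, "layer1.bn1.weight": 64,
          "layer1.bn1.bias": 64, "head.weight": 2048}
frac = trainable_fraction(params)  # only bn* and head are trained
```

In a framework like PyTorch the same selection would set `requires_grad = True` only on the matched parameters; the paper reports this touches roughly 0.16% of ResNet-50's parameters.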
- Published
- 2024
- Full Text
- View/download PDF
24. Learning A-Share Stock Recommendation from Stock Graph and Historical Price Simultaneously.
- Author
-
Chen, Hanyang, Wang, Tian, Konpang, Jessada, and Sirikham, Adisorn
- Subjects
GRAPH neural networks ,RECURRENT neural networks ,INVESTORS ,ECONOMIC change ,FINANCIAL statements - Abstract
The Chinese stock market, marked by rapid growth and significant volatility, presents unique challenges for investors and analysts. A-share stocks, traded on the Shanghai and Shenzhen exchanges, are crucial to China's financial system and offer opportunities for both domestic and international investors. Accurate stock recommendation tools are vital for informed decision making, especially given the ongoing regulatory changes and economic reforms in China. Current stock recommendation methods often fall short, as they typically fail to capture the complex inter-company relationships and rely heavily on financial reports, neglecting the potential of unlabeled data and historical price trends. In response, we propose a novel approach that combines graph-based structures with historical price data to develop self-learned stock embeddings for A-share recommendations. Our method leverages self-supervised learning, bypassing the need for human-generated labels and autonomously uncovering latent relationships and patterns within the data. This dual-input strategy enhances the understanding of market dynamics, leading to more accurate stock predictions. Our contributions include a novel framework for label-free stock recommendations with modeling stock connections and pricing information, and empirical evidence demonstrating the robustness and adaptability of our approach in the volatile Chinese stock market. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
25. Enhancing mosquito classification through self-supervised learning.
- Author
-
Charoenpanyakul, Ratana, Kittichai, Veerayuth, Eiamsamang, Songpol, Sriwichai, Patchara, Pinetsuksai, Natchapon, Naing, Kaung Myat, Tongloy, Teerawat, Boonsang, Siridech, and Chuwongin, Santhad
- Subjects
- *
IMAGE recognition (Computer vision) , *RESOURCE-limited settings , *MOSQUITO vectors , *IMAGE analysis , *VECTOR analysis - Abstract
Traditional mosquito identification methods, which rely on microscopic observation and morphological characteristics, often require significant expertise and experience, which can limit their effectiveness. This study introduces a self-supervised learning-based image classification model using the Bootstrap Your Own Latent (BYOL) algorithm, designed to enhance mosquito species identification efficiently. The BYOL algorithm offers a key advantage by eliminating the need for labeled data during pretraining, as it autonomously learns important features. During fine-tuning, the model requires only a small fraction of labeled data to achieve accurate results. Our approach demonstrates impressive performance, achieving over 96.77% accuracy in mosquito image analysis, with both false positives and false negatives minimized. Additionally, the model's overall accuracy, measured by the area under the ROC curve, surpasses 99.55%, highlighting its robustness and reliability. A notable finding is that fine-tuning with just 10% of labeled data produces results comparable to using the full dataset. This is particularly valuable for resource-limited settings with limited access to advanced equipment and expertise. Our model provides a practical solution for mosquito identification, overcoming the challenges of traditional microscopic methods, such as the time-consuming process and reliance on specialized knowledge in healthcare services. Overall, this model supports personnel in resource-constrained environments by facilitating mosquito vector density analysis and paving the way for future mosquito species identification methodologies. [ABSTRACT FROM AUTHOR]
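A core mechanism of the BYOL algorithm referenced above is that the target network is not trained by gradients but updated as an exponential moving average (EMA) of the online network. The update rule can be sketched on flat weight lists (a sketch of the published BYOL update rule only, not a full BYOL implementation):

```python
def ema_update(online, target, tau=0.99):
    """BYOL-style target-network update: each target weight moves a
    fraction (1 - tau) of the way toward the online weight.
    Operates on flat lists of floats for illustration."""
    return [tau * t + (1.0 - tau) * o for o, t in zip(online, target)]

# with tau = 0.9 the target drifts 10% of the way toward the online weights
target = ema_update(online=[1.0, 2.0], target=[0.0, 0.0], tau=0.9)
```

The slowly moving target provides stable regression targets for the online network, which is why BYOL needs no negative pairs and no labels during pretraining.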
- Published
- 2024
- Full Text
- View/download PDF
26. Identification of veterinary and medically important blood parasites using contrastive loss-based self-supervised learning.
- Author
-
Busayakanon, Supasuta, Kaewthamasorn, Morakot, Pinetsuksai, Natchapon, Tongloy, Teerawat, Chuwongin, Santhad, Boonsang, Siridech, and Kittichai, Veerayuth
- Subjects
- *
BLOOD parasites , *RECEIVER operating characteristic curves , *RESOURCE-limited settings , *ERYTHROCYTES , *ZOONOSES , *APICOMPLEXA , *BABESIA - Abstract
Background and Aim: Zoonotic diseases caused by various blood parasites are important public health concerns that impact animals and humans worldwide. The traditional method of microscopic examination for parasite diagnosis is labor-intensive, time-consuming, and prone to variability among observers, necessitating highly skilled and experienced personnel. Therefore, an innovative approach is required to enhance the conventional method. This study aimed to develop a self-supervised learning (SSL) approach to identify zoonotic blood parasites from microscopic images, with an initial focus on parasite species classification. Materials and Methods: We acquired a public dataset featuring microscopic images of Giemsa-stained thin blood films of trypanosomes and other blood parasites, including Babesia, Leishmania, Plasmodium, Toxoplasma, and Trichomonad, as well as images of both white and red blood cells. The input data were subjected to SSL model training using the Bootstrap Your Own Latent (BYOL) algorithm with Residual Network 50 (ResNet50), ResNet101, and ResNet152 as the backbones. The performance of the proposed SSL model was then compared to that of baseline models. Results: The proposed BYOL SSL model outperformed supervised learning models across all classes. Among the SSL models, ResNet50 consistently achieved high accuracy, reaching 0.992 in most classes, which aligns well with the patterns observed in the pre-trained uniform manifold approximation and projection (UMAP) representations. Fine-tuned SSL models exhibit high performance, achieving 95% accuracy and a 0.960 area under the receiver operating characteristic (ROC) curve even when fine-tuned with 1% of the data in the downstream process. Furthermore, training SSL models with 20% of the data yielded ≥95% in all other statistical metrics, including accuracy, recall, precision, specificity, F1 score, and area under the ROC curve. 
As a result, multi-class classification prediction demonstrated that model performance exceeded 91% for the F1 score, except for the early stage of Trypanosoma evansi, which showed an F1 score of 87%. This may be due to the model being exposed to high levels of variation during the developmental stage. Conclusion: This approach can significantly enhance active surveillance efforts to improve disease control and prevent outbreaks, particularly in resource-limited settings. In addition, SSL addresses significant challenges, such as data variability and the requirement for extensive class labeling, which are common in biology and medical fields. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
27. An Empirical Study of Self-Supervised Learning with Wasserstein Distance.
- Author
-
Yamada, Makoto, Takezawa, Yuki, Houry, Guillaume, Düsterwald, Kira Michaela, Sulem, Deborah, Zhao, Han, and Tsai, Yao-Hung
- Subjects
- *
DISTANCE education , *EMPIRICAL research , *PROBABILITY theory , *TREES , *COSINE function - Abstract
In this study, we consider the problem of self-supervised learning (SSL) utilizing the 1-Wasserstein distance on a tree structure (a.k.a. the Tree-Wasserstein distance (TWD)), where TWD is defined as the L1 distance between two tree-embedded vectors. In SSL methods, cosine similarity is often utilized as an objective function; however, the Wasserstein distance has not been well studied in this role. Training with the Wasserstein distance is numerically challenging. Thus, this study empirically investigates a strategy for optimizing SSL with the Wasserstein distance and finds a stable training procedure. More specifically, we evaluate the combination of two types of TWD (total variation and ClusterTree) and several probability models, including the softmax function, the ArcFace probability model, and simplicial embedding. We propose a simple yet effective Jeffrey divergence-based regularization method to stabilize optimization. Through empirical experiments on STL10, CIFAR10, CIFAR100, and SVHN, we find that a simple combination of the softmax function and TWD can yield significantly worse results than standard SimCLR. Moreover, a simple combination of TWD and SimSiam fails to train the model. We find that model performance depends on the combination of TWD and probability model, and that the Jeffrey divergence regularization helps in model training. Finally, we show that an appropriate combination of TWD and probability model outperforms cosine similarity-based representation learning. [ABSTRACT FROM AUTHOR]
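The Tree-Wasserstein distance used above has a closed form: for two distributions supported on the nodes of a tree, it is the sum over edges of the edge length times the absolute difference of the mass in the subtree below that edge. A minimal sketch of this textbook formula (not the paper's implementation), assuming children are indexed after their parents:

```python
def tree_wasserstein(parent, weight, mu, nu):
    """1-Wasserstein distance between distributions mu and nu on a
    tree: sum over edges of edge_weight * |subtree mass difference|.
    parent[i] is the parent of node i (root has parent -1);
    weight[i] is the length of the edge from node i to its parent.
    Assumes parent[i] < i so a single reverse pass accumulates
    subtree masses bottom-up."""
    n = len(parent)
    diff = [mu[i] - nu[i] for i in range(n)]
    for i in range(n - 1, 0, -1):      # push mass differences upward
        diff[parent[i]] += diff[i]
    return sum(weight[i] * abs(diff[i]) for i in range(1, n))

# path 0-1-2 with unit edges: moving one unit of mass from node 0
# to node 2 costs distance 2
d = tree_wasserstein([-1, 0, 1], [0.0, 1.0, 1.0], [1, 0, 0], [0, 0, 1])
```

This linear-time closed form is what makes TWD attractive as an SSL objective compared to general Wasserstein distances, which require solving a transport problem.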
- Published
- 2024
- Full Text
- View/download PDF
28. Robust self-supervised learning strategy to tackle the inherent sparsity in single-cell RNA-seq data.
- Author
-
Park, Sejin and Lee, Hyunju
- Subjects
- *
LEARNING strategies , *DRUG tolerance , *GENE expression , *RNA sequencing , *TRANSFORMER models , *SUPERVISED learning - Abstract
Single-cell RNA sequencing (scRNA-seq) is a powerful tool for elucidating cellular heterogeneity and tissue function in various biological contexts. However, the sparsity in scRNA-seq data limits the accuracy of cell type annotation and transcriptomic analysis due to information loss. To address this limitation, we present scRobust, a robust self-supervised learning strategy to tackle the inherent sparsity of scRNA-seq data. Built upon the Transformer architecture, scRobust employs a novel self-supervised learning strategy comprising contrastive learning and gene expression prediction tasks. We demonstrated the effectiveness of scRobust using nine benchmarks, additional dropout scenarios, and combined datasets. scRobust outperformed recent methods in cell-type annotation tasks and generated cell embeddings that capture multi-faceted clustering information (e.g. cell types and HbA1c levels). In addition, cell embeddings of scRobust were useful for detecting specific marker genes related to drug tolerance stages. Furthermore, when we applied scRobust to scATAC-seq data, high-quality cell embedding vectors were generated. These results demonstrate the representational power of scRobust. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
29. Prototype-based contrastive substructure identification for molecular property prediction.
- Author
-
He, Gaoqi, Liu, Shun, Liu, Zhuoran, Wang, Changbo, Zhang, Kai, and Li, Honglin
- Subjects
- *
GRAPH neural networks , *MOLECULAR graphs , *SOURCE code , *PROTOTYPES - Abstract
Substructure-based representation learning has emerged as a powerful approach to featurize complex attributed graphs, with promising results in molecular property prediction (MPP). However, existing MPP methods mainly rely on manually defined rules to extract substructures. It remains an open challenge to adaptively identify meaningful substructures from numerous molecular graphs to accommodate MPP tasks. To this end, this paper proposes Prototype-based cOntrastive Substructure IdentificaTion (POSIT), a self-supervised framework to autonomously discover substructural prototypes across graphs so as to guide end-to-end molecular fragmentation. During pre-training, POSIT emphasizes two key aspects of substructure identification: firstly, it imposes a soft connectivity constraint to encourage the generation of topologically meaningful substructures; secondly, it aligns resultant substructures with derived prototypes through a prototype-substructure contrastive clustering objective, ensuring attribute-based similarity within clusters. In the fine-tuning stage, a cross-scale attention mechanism is designed to integrate substructure-level information to enhance molecular representations. The effectiveness of the POSIT framework is demonstrated by experimental results from diverse real-world datasets, covering both classification and regression tasks. Moreover, visualization analysis validates the consistency of chemical priors with identified substructures. The source code is publicly available at https://github.com/VRPharmer/POSIT. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
30. Graph contrastive learning as a versatile foundation for advanced scRNA-seq data analysis.
- Author
-
Zhang, Zhenhao, Liu, Yuxi, Xiao, Meichen, Wang, Kun, Huang, Yu, Bian, Jiang, Yang, Ruolin, and Li, Fuyi
- Subjects
- *
GRAPH neural networks , *MACHINE learning , *GENE expression , *RNA sequencing , *SOURCE code , *DEEP learning - Abstract
Single-cell RNA sequencing (scRNA-seq) offers unprecedented insights into transcriptome-wide gene expression at the single-cell level. Cell clustering has been long established in the analysis of scRNA-seq data to identify the groups of cells with similar expression profiles. However, cell clustering is technically challenging, as raw scRNA-seq data have various analytical issues, including high dimensionality and dropout values. Existing research has developed deep learning models, such as graph machine learning models and contrastive learning-based models, for cell clustering using scRNA-seq data and has summarized the unsupervised learning of cell clustering into a human-interpretable format. While advances in cell clustering have been profound, we are no closer to finding a simple yet effective framework for learning high-quality representations necessary for robust clustering. In this study, we propose scSimGCL, a novel framework based on the graph contrastive learning paradigm for self-supervised pretraining of graph neural networks. This framework facilitates the generation of high-quality representations crucial for cell clustering. Our scSimGCL incorporates cell-cell graph structure and contrastive learning to enhance the performance of cell clustering. Extensive experimental results on simulated and real scRNA-seq datasets suggest the superiority of the proposed scSimGCL. Moreover, clustering assignment analysis confirms the general applicability of scSimGCL, including state-of-the-art clustering algorithms. Further, ablation study and hyperparameter analysis suggest the efficacy of our network architecture with the robustness of decisions in the self-supervised learning setting. The proposed scSimGCL can serve as a robust framework for practitioners developing tools for cell clustering. The source code of scSimGCL is publicly available at https://github.com/zhangzh1328/scSimGCL. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
31. HAPiCLR: heuristic attention pixel-level contrastive loss representation learning for self-supervised pretraining.
- Author
-
Tran, Van Nhiem, Liu, Shen-Hsuan, Huang, Chi-En, Aslam, Muhammad Saqlain, Yang, Kai-Lin, Li, Yung-Hui, and Wang, Jia-Ching
- Subjects
- *
IMAGE recognition (Computer vision) , *VISUAL learning , *VECTOR spaces , *LEARNING , *PIXELS - Abstract
Recent self-supervised contrastive learning methods are powerful and efficient for robust representation learning, pulling semantic features from different cropping views of the same image while pushing other features away from other images in the embedding vector space. However, model training for contrastive learning is quite inefficient. In the high-dimensional vector space of the images, images can differ from each other in many ways. We address this problem with heuristic attention pixel-level contrastive loss for representation learning (HAPiCLR), a self-supervised joint embedding contrastive framework that operates at the pixel level and makes use of heuristic mask information. HAPiCLR leverages pixel-level information from the object's contextual representation instead of identifying pair-wise differences in instance-level representations. Thus, HAPiCLR enhances contrastive learning objectives without requiring large batch sizes, memory banks, or queues, thereby reducing the memory footprint and the processing needed for large datasets. Furthermore, HAPiCLR loss combined with other contrastive objectives such as SimCLR or MoCo loss produces considerable performance boosts on all downstream tasks, including image classification, object detection, and instance segmentation. [ABSTRACT FROM AUTHOR]
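Contrastive objectives like those combined with HAPiCLR (SimCLR, MoCo) share the InfoNCE form: matched feature pairs from two views are positives, all other rows are negatives. A generic numpy sketch of InfoNCE over matched rows, as an assumption about the general family of losses involved, not HAPiCLR's exact pixel-level loss:

```python
import numpy as np

def info_nce(z1, z2, temperature=0.1):
    """InfoNCE loss between matched rows of z1 and z2 (e.g. features
    of the same object region under two augmented views). Generic
    contrastive sketch, not the HAPiCLR objective."""
    z1 = z1 / np.linalg.norm(z1, axis=1, keepdims=True)
    z2 = z2 / np.linalg.norm(z2, axis=1, keepdims=True)
    logits = z1 @ z2.T / temperature              # cosine similarities
    logits -= logits.max(axis=1, keepdims=True)   # numerical stability
    log_prob = logits - np.log(np.exp(logits).sum(axis=1, keepdims=True))
    return float(-np.mean(np.diag(log_prob)))     # diagonal = positives

aligned = info_nce(np.eye(2), np.eye(2), temperature=1.0)
shuffled = info_nce(np.eye(2), np.eye(2)[::-1], temperature=1.0)
```

With perfectly aligned, mutually orthogonal features the loss is small; shuffling the pairing increases it, which is the gradient signal that pulls matched representations together.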
- Published
- 2024
- Full Text
- View/download PDF
32. Weakly‐supervised learning‐based pathology detection and localization in 3D chest CT scans.
- Author
-
Djahnine, Aissam, Jupin‐Delevaux, Emilien, Nempont, Olivier, Si‐Mohamed, Salim Aymeric, Craighero, Fabien, Cottin, Vincent, Douek, Philippe, Popoff, Alexandre, and Boussel, Loic
- Subjects
- *
MACHINE learning , *COMPUTED tomography , *RECEIVER operating characteristic curves , *CONFIDENCE intervals , *CLINICAL medicine , *LUNGS - Abstract
Background: Recent advancements in anomaly detection have paved the way for novel radiological reading assistance tools that support the identification of findings, aimed at saving time. The clinical adoption of such applications requires a low rate of false positives while maintaining high sensitivity. Purpose: In light of recent interest and development in multi‐pathology identification, we present a novel method, based on a recent contrastive self‐supervised approach, for multiple chest‐related abnormality identification including low lung density area ("LLDA"), consolidation ("CONS"), nodules ("NOD") and interstitial pattern ("IP"). Our approach alerts radiologists about abnormal regions within a computed tomography (CT) scan by providing 3D localization. Methods: We introduce a new method for the classification and localization of multiple chest pathologies in 3D chest CT scans. Our goal is to distinguish four common chest‐related abnormalities, "LLDA", "CONS", "NOD", "IP", from "NORMAL". This method is based on a 3D patch‐based classifier with a ResNet backbone encoder, pretrained by leveraging a recent contrastive self‐supervised approach, and a fine‐tuned classification head. We leverage the SimCLR contrastive framework for pretraining on an unannotated dataset of randomly selected patches, and we then fine‐tune it on a labeled dataset. During inference, this classifier generates probability maps for each abnormality across the CT volume, which are aggregated to produce a multi‐label patient‐level prediction. We compare different training strategies, including random initialization, ImageNet weight initialization, frozen SimCLR pretrained weights and fine‐tuned SimCLR pretrained weights. Each training strategy is evaluated on a validation set for hyperparameter selection and tested on a test set. Additionally, we explore the fine‐tuned SimCLR pretrained classifier for 3D pathology localization and conduct qualitative evaluation. 
Results: Validated on 111 chest scans for hyperparameter selection and subsequently tested on 251 chest scans with multiple abnormalities, our method achieves an area under the receiver operating characteristic curve (AUROC) of 0.931 (95% confidence interval [CI]: [0.9034, 0.9557], p‐value < 0.001) and 0.963 (95% CI: [0.952, 0.976], p‐value < 0.001) in the multi‐label and binary (i.e., normal versus abnormal) settings, respectively. Notably, our method surpasses the AUROC threshold of 0.9 for two abnormalities, IP (0.974) and LLDA (0.952), while achieving values of 0.853 and 0.791 for NOD and CONS, respectively. Furthermore, our results highlight the superiority of incorporating contrastive pretraining within the patch classifier, which outperforms ImageNet pretraining and random initialization (F1 score = 0.943, 0.792, and 0.677, respectively). Qualitatively, the method achieved a satisfactory 88.8% completeness rate in localization and maintained an 88.3% accuracy rate against false positives. Conclusions: The proposed method integrates self‐supervised learning algorithms for pretraining, utilizes a patch‐based approach for 3D pathology localization, and develops an aggregation method for multi‐label prediction at the patient level. It shows promise in efficiently detecting and localizing multiple anomalies within a single scan. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
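As a concrete illustration of the contrastive pretraining stage this entry relies on, here is a minimal NumPy sketch of the NT-Xent objective that SimCLR optimizes over two augmented views of the same batch of patches; the batch size, temperature, and embeddings below are illustrative, not the paper's settings.

```python
import numpy as np

def nt_xent_loss(z1, z2, temperature=0.5):
    """NT-Xent (normalized temperature-scaled cross-entropy), the SimCLR loss.

    z1, z2: (N, D) embeddings of two augmented views of the same N samples.
    Positive pairs are (z1[i], z2[i]); all other batch pairs act as negatives.
    """
    z = np.concatenate([z1, z2], axis=0)               # (2N, D)
    z = z / np.linalg.norm(z, axis=1, keepdims=True)   # unit norm -> dot = cosine
    sim = z @ z.T / temperature                        # (2N, 2N) similarities
    np.fill_diagonal(sim, -np.inf)                     # exclude self-similarity
    n = z1.shape[0]
    # index of each sample's positive partner in the concatenated batch
    pos_idx = np.concatenate([np.arange(n, 2 * n), np.arange(n)])
    log_prob = sim - np.log(np.exp(sim).sum(axis=1, keepdims=True))
    return -log_prob[np.arange(2 * n), pos_idx].mean()
```

Pulling the two views of a patch together while pushing apart all other patches is what lets the encoder learn from unannotated CT patches before the classification head is fine-tuned.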
33. Explicitly modeling relationships between domain-specific and domain-invariant interests for cross-domain recommendation.
- Author
-
Zang, Tianzi, Zhu, Yanmin, Zhang, Ruohan, Zhu, Jing, and Tang, Feilong
- Abstract
This paper focuses on cross-domain recommendation (CDR) without auxiliary information. Existing works on CDR ignore the bi-directional transformation relationships between users' domain-invariant interests and domain-specific interests. Moreover, they rely only on sparse interactions as supervised signals for model training, which cannot guarantee that the generated representations are effective. In response to these limitations, we propose MRCDR, a model that explicitly models relationships between domain-specific and domain-invariant interests for cross-domain recommendation. We project the domain-specific representations of users into a common space to generate their domain-invariant representations. To remedy the problem of insufficient supervised signals, we propose two strategies that generate extra self-supervision signals to enhance model training. The aligned strategy encourages the two domain-invariant representations of an overlapped user to be consistent. The cycle strategy encourages the reversely projected domain-invariant representation of a non-overlapped user to be consistent with its original domain-specific representation. We conduct extensive experiments on real-world datasets, and the results show the effectiveness of our proposed model against state-of-the-art methods. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
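The cycle strategy described in this entry can be sketched in a few lines: project a domain-specific representation into the shared space, project it back, and penalize the round-trip error. The function names and linear projections below are illustrative assumptions, not the paper's architecture.

```python
import numpy as np

def cycle_consistency_loss(h_specific, project, back_project):
    """Cycle self-supervision sketch: a non-overlapped user's domain-specific
    representation is mapped to the domain-invariant space and back, and the
    round-trip reconstruction error is penalized so the forward and reverse
    projections stay mutually consistent."""
    h_invariant = project(h_specific)     # domain-specific -> domain-invariant
    h_back = back_project(h_invariant)    # reverse projection
    return float(np.mean((h_back - h_specific) ** 2))
```

When the two projections are exact inverses of each other the loss vanishes, which is the fixed point this extra self-supervision signal drives training toward.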
34. Self-supervised modal optimization transformer for image captioning.
- Author
-
Wang, Ye, Li, Daitianxia, Liu, Qun, Liu, Li, and Wang, Guoyin
- Subjects
- *
ELECTRONIC data processing , *GENERALIZATION - Abstract
In the multimodal data processing of image captioning, data from different modalities usually exhibit distinct feature distributions. This gap between unimodal representations makes capturing cross-modal mappings in multimodal learning challenging. Current image captioning models transform images into captions directly. However, this approach entails large data requirements and limited performance on small quantities of multimodal data. In this paper, we introduce a novel self-supervised modal optimization transformer (SMOT) for image captioning. Specifically, we leverage self-supervised learning to propose a cross-modal feature optimizer. This optimizer refines the distribution of semantic information in images by leveraging raw images and their corresponding paired captions, bringing the image features closer to the semantic content of the caption. The optimized image features inherit information from both modalities, reducing the disparity in feature distributions between modalities and decreasing reliance on extensive training data. Furthermore, we fuse these features with image grid features and text features, using their complementary information to bridge the differences between features and provide more comprehensive semantic guidance for image captioning. Experimental results demonstrate that our proposed SMOT outperforms state-of-the-art models when trained on limited data, showing efficient learning and good generalization on small training datasets. It also exhibits competitive performance on the MSCOCO dataset, further highlighting its efficacy and potential in the field of image captioning. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
35. Self-supervised learning for CT image denoising and reconstruction: a review.
- Author
-
Choi, Kihwan
- Abstract
This article reviews the self-supervised learning methods for CT image denoising and reconstruction. Currently, deep learning has become a dominant tool in medical imaging as well as computer vision. In particular, self-supervised learning approaches have attracted great attention as a technique for learning CT images without clean/noisy references. After briefly reviewing the fundamentals of CT image denoising and reconstruction, we examine the progress of deep learning in CT image denoising and reconstruction. Finally, we focus on the theoretical and methodological evolution of self-supervised learning for image denoising and reconstruction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
36. PointSmile: point self-supervised learning via curriculum mutual information.
- Author
-
Li, Xin, Wei, Mingqiang, and Chen, Songcan
- Abstract
Self-supervised learning is attracting significant attention from researchers in the point cloud processing field. However, due to the natural sparsity and irregularity of point clouds, effectively extracting discriminative and transferable features for efficient training on downstream tasks remains an unsolved challenge. Consequently, we propose PointSmile, a reconstruction-free self-supervised learning paradigm by maximizing curriculum mutual information (CMI) across the replicas of point cloud objects. From the perspective of how-and-what-to-learn, PointSmile is designed to imitate human curriculum learning, i.e., starting with easier topics in a curriculum and gradually progressing to learning more complex topics in the curriculum. To solve “how-to-learn”, we introduce curriculum data augmentation (CDA) of point clouds. CDA encourages PointSmile to follow a learning path that starts from learning easy data samples and progresses to learning hard data samples, such that the latent space can be dynamically affected to create better embeddings. To solve “what-to-learn”, we propose maximizing both feature- and class-wise CMI to better extract discriminative features of point clouds. Unlike most existing methods, PointSmile does not require a pretext task or cross-modal data to yield rich latent representations; additionally, it can be easily transferred to various backbones. We demonstrate the effectiveness and robustness of PointSmile in downstream tasks such as object classification and segmentation. The study results show that PointSmile outperforms existing self-supervised methods and compares favorably with popular fully supervised methods on various standard architectures. The code is available at . [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
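The curriculum data augmentation (CDA) idea in this entry, easy samples first, hard samples later, can be sketched as an augmentation whose strength ramps up with training progress. The linear schedule, Gaussian jitter, and `max_sigma` value below are assumptions for illustration; the paper's CDA operates on richer point-cloud augmentations.

```python
import numpy as np

def curriculum_jitter(points, epoch, max_epoch, max_sigma=0.05, rng=None):
    """Curriculum-style point-cloud augmentation sketch: Gaussian jitter whose
    standard deviation grows linearly over training, so early epochs see easy
    (mildly perturbed) replicas and later epochs see harder ones."""
    rng = rng or np.random.default_rng(0)
    sigma = max_sigma * min(1.0, epoch / max_epoch)  # difficulty schedule
    return points + rng.normal(scale=sigma, size=points.shape)
```

Feeding progressively harder replicas into the contrastive objective is what lets the latent space be shaped gradually rather than confronting the hardest augmentations from the start.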
37. Implementation of a Whisper Architecture-Based Turkish Automatic Speech Recognition (ASR) System and Evaluation of the Effect of Fine-Tuning with a Low-Rank Adaptation (LoRA) Adapter on Its Performance.
- Author
-
Polat, Hüseyin, Turan, Alp Kaan, Koçak, Cemal, and Ulaş, Hasan Basri
- Subjects
AUTOMATIC speech recognition ,ARTIFICIAL intelligence ,DEEP learning ,ERROR rates ,TURKISH language ,SPEECH perception - Abstract
This paper focuses on the implementation of the Whisper architecture to create an automatic speech recognition (ASR) system optimized for the Turkish language, which is considered a low-resource language in terms of speech recognition technologies. Whisper is a transformer-based model known for its high performance across numerous languages. However, its performance in Turkish, a language with unique linguistic features and limited labeled data, has yet to be fully explored. To address this, we conducted a series of experiments using five different Turkish speech datasets to assess the model's baseline performance. Initial evaluations revealed a range of word error rates (WERs) between 4.3% and 14.2%, reflecting the challenges posed by Turkish. To improve these results, we applied the low-rank adaptation (LoRA) technique, which is designed to fine-tune large-scale models efficiently by introducing a reduced set of trainable parameters. After fine-tuning, significant performance improvements were observed, with WER reductions of up to 52.38%. This study demonstrates that fine-tuned Whisper models can be successfully adapted for Turkish, resulting in a robust and accurate end-to-end ASR system. This research highlights the applicability of Whisper in low-resource languages and provides insights into the challenges of and strategies for improving speech recognition performance in Turkish. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
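The LoRA technique this entry applies to Whisper can be shown in miniature: the pretrained weight stays frozen and only a low-rank update is trained. The sketch below follows the standard LoRA recipe (small random A, zero-initialized B, so the adapter starts as a no-op); the Whisper-specific integration details are outside its scope.

```python
import numpy as np

class LoRALinear:
    """Minimal LoRA-adapted linear layer: y = x W^T + (alpha/r) * x (B A)^T,
    with the pretrained weight W frozen and only the low-rank factors
    A (r x d_in) and B (d_out x r) trainable."""
    def __init__(self, W, r=4, alpha=8, rng=None):
        rng = rng or np.random.default_rng(0)
        d_out, d_in = W.shape
        self.W = W                                       # frozen pretrained weight
        self.A = rng.normal(scale=0.01, size=(r, d_in))  # trainable
        self.B = np.zeros((d_out, r))                    # zero init: adapter starts inert
        self.scale = alpha / r

    def __call__(self, x):
        return x @ self.W.T + self.scale * (x @ self.A.T) @ self.B.T
```

Because only A and B are updated, the number of trainable parameters drops from `d_out * d_in` to `r * (d_in + d_out)`, which is what makes fine-tuning a large ASR model on limited Turkish data tractable.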
38. Automatic high-precision crack detection of post-earthquake structure based on self-supervised transfer learning method and SegCrackFormer.
- Author
-
Meng, Shiqiao, Zhou, Ying, and Jafari, Abouzar
- Subjects
TRANSFORMER models ,DEEP learning ,TRANSFER of training ,EARTHQUAKES ,SUPERVISED learning ,ANNOTATIONS - Abstract
Accurate crack detection is essential for structural damage assessment after earthquake disasters. However, due to the gap between the target domain of the detected structure and the source domain, it is challenging to achieve high-precision crack segmentation when performing crack detection based on deep learning (DL) in actual engineering. This article proposes a crack segmentation transfer learning method based on a self-supervised learning mechanism and a high-quality pseudo-label generation method, which can significantly improve detection accuracy in the target domain without pre-made annotations. In addition, to improve the crack segmentation model's ability to extract local and global features, this article proposes SegCrackFormer, a model that embeds convolutional layers and multi-head self-attention modules. Experiments on the crack segmentation transfer learning method are performed on two open-source crack datasets, METU and Crack500, and a newly proposed LD dataset. The experimental results show that the proposed transfer learning method improves the mean intersection over union (mIoU) by 38.41% and 15.66% on the Crack500 and LD datasets, respectively. The proposed SegCrackFormer is evaluated through comparative experiments, which demonstrate its superiority over existing crack segmentation models on the METU dataset. Additionally, the proposed method requires significantly fewer computational resources than other existing models, which highlights the potential of SegCrackFormer as a powerful and efficient model for crack segmentation in practical applications. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
39. Self‐supervised learning for denoising of multidimensional MRI data.
- Author
-
Kang, Beomgu, Lee, Wonil, Seo, Hyunseok, Heo, Hye‐Young, and Park, HyunWook
- Subjects
MAGNETIZATION transfer ,DATA scrubbing ,MAGNETIC resonance imaging ,QUANTITATIVE research ,NOISE - Abstract
Purpose: To develop a fast denoising framework for high‐dimensional MRI data based on a self‐supervised learning scheme that does not require ground‐truth clean images. Theory and Methods: Quantitative MRI faces limitations in SNR because the variation of signal amplitude across a large set of images is the key mechanism for quantification. In addition, the complex non‐linear signal models make the fitting process vulnerable to noise. To address these issues, we propose a fast deep‐learning framework for denoising that efficiently exploits the redundancy in multidimensional MRI data. A self‐supervised model was designed to use only noisy images for training, bypassing the challenge of clean‐data paucity in clinical practice. For validation, we used two different datasets, a simulated magnetization transfer contrast MR fingerprinting (MTC‐MRF) dataset and an in vivo DWI dataset, to show generalizability. Results: The proposed method drastically improved denoising performance in the presence of mild‐to‐severe noise, regardless of noise distribution, compared to the previous methods BM3D, tMPPCA, and Patch2Self. The improvements were even more pronounced in the subsequent quantification results from the denoised images. Conclusion: The proposed MD‐S2S (Multidimensional‐Self2Self) denoising technique could be further applied to various multi‐dimensional MRI data and improve the quantification accuracy of tissue parameter maps. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
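The family of self-supervised denoisers this entry belongs to (Self2Self, Noise2Self) shares one core trick: mask some pixels, predict them from the rest, and score the error only at the masked positions, so the noisy data supervises its own denoising. A minimal sketch, with illustrative masking fraction and a generic `denoise` callable:

```python
import numpy as np

def blind_spot_loss(noisy, denoise, mask_frac=0.1, rng=None):
    """Blind-spot self-supervised denoising objective: hide a random subset of
    pixels, have the network predict them from the visible ones, and compute
    the loss only on the hidden pixels. No clean reference image is needed."""
    rng = rng or np.random.default_rng(0)
    mask = rng.random(noisy.shape) < mask_frac    # True = hidden from the network
    pred = denoise(np.where(mask, 0.0, noisy))    # network never sees masked values
    return float(np.mean((pred[mask] - noisy[mask]) ** 2))
```

Because independent noise at a masked pixel cannot be predicted from its neighbors, minimizing this loss pushes the network toward the underlying clean signal.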
40. ssVERDICT: Self‐supervised VERDICT‐MRI for enhanced prostate tumor characterization.
- Author
-
Sen, Snigdha, Singh, Saurabh, Pye, Hayley, Moore, Caroline M., Whitaker, Hayley C., Punwani, Shonit, Atkinson, David, Panagiotaki, Eleftheria, and Slator, Paddy J.
- Subjects
MACHINE learning ,DIFFUSION magnetic resonance imaging ,PROSTATE cancer patients ,PEARSON correlation (Statistics) ,DEEP learning ,PROSTATE cancer - Abstract
Purpose: Demonstrating and assessing self‐supervised machine‐learning fitting of the VERDICT (vascular, extracellular and restricted diffusion for cytometry in tumors) model for prostate cancer. Methods: We derive a self‐supervised neural network for fitting VERDICT (ssVERDICT) that estimates parameter maps without training data. We compare the performance of ssVERDICT to two established baseline methods for fitting diffusion MRI models: conventional nonlinear least squares and supervised deep learning. We do this quantitatively on simulated data by comparing the Pearson's correlation coefficient, mean‐squared error, bias, and variance with respect to the simulated ground truth. We also calculate in vivo parameter maps on a cohort of 20 prostate cancer patients and compare the methods' performance in discriminating benign from cancerous tissue via Wilcoxon's signed‐rank test. Results: In simulations, ssVERDICT outperforms the baseline methods (nonlinear least squares and supervised deep learning) in estimating all the parameters from the VERDICT prostate model in terms of Pearson's correlation coefficient, bias, and mean‐squared error. In vivo, ssVERDICT shows stronger lesion conspicuity across all parameter maps, and improves discrimination between benign and cancerous tissue over the baseline methods. Conclusion: ssVERDICT significantly outperforms state‐of‐the‐art methods for VERDICT model fitting and shows, for the first time, fitting of a detailed multicompartment biophysical diffusion MRI model with machine learning without the requirement of explicit training labels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
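The self-supervised fitting idea in this entry can be reduced to one line: the network predicts model parameters, a known biophysical forward model maps them back to a signal, and the loss is the discrepancy with the measured signal, so no ground-truth parameter labels are needed. The mono-exponential forward model below is a toy assumption for illustration, not the actual multicompartment VERDICT model.

```python
import numpy as np

def self_supervised_fit_loss(signal, params, forward_model):
    """ssVERDICT-style objective sketch: compare the measured signal to the
    signal synthesized from predicted parameters via the known forward model.
    Gradients through this loss train the parameter-estimating network
    without any labeled parameter maps."""
    return float(np.mean((forward_model(params) - signal) ** 2))
```

The loss is zero exactly when the predicted parameters reproduce the measurement, which is what replaces supervised training labels.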
41. Multi-label remote sensing classification with self-supervised gated multi-modal transformers.
- Author
-
Na Liu, Ye Yuan, Guodong Wu, Sai Zhang, Jie Leng, and Lihong Wan
- Subjects
TRANSFORMER models ,SYNTHETIC aperture radar ,REMOTE sensing ,MULTISENSOR data fusion ,RESEARCH personnel - Abstract
Introduction: With the great success of Transformers in machine learning, they are gradually attracting widespread interest in remote sensing (RS). However, research in remote sensing has been hampered by the lack of large labeled datasets and by the inconsistency of data modalities caused by the diversity of RS platforms. With the rise of self-supervised learning (SSL) algorithms in recent years, RS researchers have begun to pay attention to the application of the "pre-training and fine-tuning" paradigm in RS. However, there is little research on multi-modal data fusion in the remote sensing field; most works use only one modality or simply concatenate multiple modalities. Method: To study a more efficient multi-modal data fusion scheme, we propose a multi-modal fusion mechanism based on gated unit control (MGSViT). We pretrain the ViT model on the BigEarthNet dataset by combining two commonly used SSL algorithms, and propose intra-modal and inter-modal gated fusion units for feature learning by combining multispectral (MS) and synthetic aperture radar (SAR) data. Our method can effectively combine different modalities to extract key feature information. Results and discussion: After fine-tuning and comparison experiments, our method outperforms the most advanced algorithms on all downstream classification tasks, verifying its validity. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
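A gated fusion unit of the kind this entry describes can be sketched as a sigmoid gate, computed from both modality embeddings, that decides per feature how much of the multispectral versus SAR representation to keep. The exact gate parameterization below is an assumption for illustration, not the MGSViT architecture.

```python
import numpy as np

def gated_fusion(h_ms, h_sar, Wg):
    """Gated multi-modal fusion sketch: g = sigmoid([h_ms; h_sar] Wg) selects,
    element-wise, between the multispectral and SAR embeddings:
    fused = g * h_ms + (1 - g) * h_sar."""
    g = 1.0 / (1.0 + np.exp(-np.concatenate([h_ms, h_sar], axis=-1) @ Wg))
    return g * h_ms + (1.0 - g) * h_sar
```

Because the gate is learned, the network can lean on SAR features where clouds corrupt the optical bands and on multispectral features elsewhere, rather than naively concatenating the two modalities.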
42. Bridging chemical structure and conceptual knowledge enables accurate prediction of compound-protein interaction.
- Author
-
Tao, Wen, Lin, Xuan, Liu, Yuansheng, Zeng, Li, Ma, Tengfei, Cheng, Ning, Jiang, Jing, Zeng, Xiangxiang, and Yuan, Sisi
- Subjects
- *
KNOWLEDGE graphs , *KNOWLEDGE representation (Information theory) , *DRUG discovery , *CHEMICAL structure , *PROTEIN structure - Abstract
Background: Accurate prediction of compound-protein interaction (CPI) plays a crucial role in drug discovery. Existing data-driven methods aim to learn from the chemical structures of compounds and proteins yet ignore conceptual knowledge, that is, the interrelationships among the fundamental elements of the biomedical knowledge graph (KG). Knowledge graphs provide a comprehensive view of entities and relationships beyond individual compounds and proteins. They encompass a wealth of information, such as pathways, diseases, and biological processes, offering a richer context for CPI prediction. This contextual information can be used to identify indirect interactions, infer potential relationships, and improve prediction accuracy. In real-world applications, the prevalence of knowledge-missing compounds and proteins is a critical barrier to injecting knowledge into data-driven models. Results: Here, we propose BEACON, a data and knowledge dual-driven framework that bridges chemical structure and conceptual knowledge for CPI prediction. BEACON learns consistent representations by maximizing the mutual information between chemical structure and conceptual knowledge and predicts the missing representations by minimizing their conditional entropy. BEACON achieves state-of-the-art performance on multiple datasets compared to competing methods, notably with 5.1% and 6.6% performance gains on the BIOSNAP and DrugBank datasets, respectively. Moreover, BEACON is the only approach capable of effectively predicting knowledge representations for knowledge-lacking compounds and proteins. Conclusions: Overall, our work provides a general approach for directly injecting conceptual knowledge to enhance the performance of CPI prediction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
43. Self-Supervised Image Aesthetic Assessment Based on Transformer.
- Author
-
Jia, Minrui, Wang, Guangao, Wang, Zibei, Yang, Shuai, Ke, Yongzhen, and Wang, Kai
- Subjects
- *
TRANSFORMER models , *TASK analysis , *RESEARCH personnel , *INPAINTING , *AESTHETICS - Abstract
Visual aesthetics has always been an important area of computational vision, and researchers continue to explore it. To further improve performance on the image aesthetic evaluation task, we introduce a Transformer into it. This paper pioneers a novel self-supervised image aesthetic evaluation model founded upon Transformers. Meanwhile, we expand the pretext task to capture rich visual representations, adding a branch for inpainting masked images in parallel with the tasks related to aesthetic quality degradation operations. We refine the model with an uncertainty weighting method that merges three distinct losses into a unified objective. On the AVA dataset, our approach surpasses prevailing self-supervised image aesthetic assessment methods. Remarkably, we attain results approaching those of supervised methods, even while operating with a limited dataset. On the AADB dataset, our approach improves aesthetic binary classification accuracy by roughly 16% compared to other self-supervised image aesthetic assessment methods and improves the prediction of aesthetic attributes. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
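The uncertainty weighting this entry uses to merge its three losses is, in its standard (Kendall-style) form, a sum of losses scaled by learned log-variances. A minimal sketch, assuming that standard formulation rather than the paper's exact variant:

```python
import numpy as np

def uncertainty_weighted_loss(losses, log_vars):
    """Homoscedastic uncertainty weighting for multi-task training:
    total = sum_i exp(-s_i) * L_i + s_i, where s_i is a learned log-variance
    for task i. Tasks the model is uncertain about (large s_i) are
    automatically down-weighted, while the +s_i term stops s_i from growing
    without bound."""
    losses = np.asarray(losses, dtype=float)
    log_vars = np.asarray(log_vars, dtype=float)
    return float(np.sum(np.exp(-log_vars) * losses + log_vars))
```

In training, the `log_vars` would be learnable parameters optimized jointly with the network, removing the need to hand-tune per-task loss weights.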
44. Pre‐training strategy for antiviral drug screening with low‐data graph neural network: A case study in HIV‐1 K103N reverse transcriptase.
- Author
-
Boonpalit, Kajjana, Chuntakaruk, Hathaichanok, Kinchagawat, Jiramet, Wolschann, Peter, Hannongbua, Supot, Rungrotmongkol, Thanyada, and Nutanong, Sarana
- Subjects
- *
GRAPH neural networks , *DRUG discovery , *REVERSE transcriptase inhibitors , *REVERSE transcriptase , *MOLECULAR dynamics , *PLASMODIUM - Abstract
Graph neural networks (GNNs) offer an alternative approach to boost screening effectiveness in drug discovery. However, their efficacy is often hindered by limited datasets. To address this limitation, we introduced a robust GNN training framework, applied to various chemical databases, to identify potent non‐nucleoside reverse transcriptase inhibitors (NNRTIs) against the challenging K103N‐mutated HIV‐1 RT. Leveraging self‐supervised learning (SSL) pre‐training to tackle data scarcity, we screened 1,824,367 compounds using a multi‐step approach that incorporated machine learning (ML)‐based screening; analysis of absorption, distribution, metabolism, and excretion (ADME) predictions; drug‐likeness properties; and molecular docking. Ultimately, 45 compounds remained as potential candidates, 17 of which were previously identified as NNRTIs, exemplifying the model's efficacy. The remaining 28 compounds are anticipated to be repurposed for new uses. Molecular dynamics (MD) simulations on the repurposed candidates unveiled two promising preclinical drugs: one designed against Plasmodium falciparum and the other serving as an antibacterial agent. Both have superior binding affinity compared to anti‐HIV drugs. This conceptual framework could be adapted for other disease‐specific therapeutics, facilitating the identification of potent compounds effective against both WT and mutant targets while revealing novel scaffolds for drug design and discovery. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
45. Self-supervised few-shot medical image segmentation with spatial transformations.
- Author
-
Titoriya, Ankit Kumar, Singh, Maheshwari Prasad, and Singh, Amit Kumar
- Subjects
- *
COMPUTER-assisted image analysis (Medicine) , *MAGNETIC resonance imaging , *CARDIAC magnetic resonance imaging , *DIAGNOSTIC imaging , *CARDIAC imaging , *DEEP learning , *IMAGE segmentation - Abstract
Deep learning-based segmentation models often struggle to achieve optimal performance when encountering new, unseen semantic classes. Their effectiveness hinges on vast amounts of annotated data and high computational resources for training. However, a promising solution to mitigate these challenges is the adoption of few-shot segmentation (FSS) networks, which can train models with reduced annotated data. The inherent complexity of medical images limits the applicability of FSS in medical imaging, despite its potential. Recent advancements in self-supervised label-efficient FSS models have demonstrated remarkable efficacy in medical image segmentation tasks. This paper presents a novel FSS architecture that enhances segmentation accuracy by utilising fewer features than existing methodologies. Additionally, this paper proposes a novel self-supervised learning approach that utilises supervoxel and augmented superpixel images to further enhance segmentation accuracy. This paper assesses the efficacy of the proposed model on two different datasets: abdominal magnetic resonance imaging (MRI) and cardiac MRI. The proposed model achieves a mean dice score and mean intersection over union of 81.62% and 70.38% for abdominal images, and 79.38% and 65.23% for cardiac images. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
46. SVQ-MAE: an efficient speech pre-training framework with constrained computational resources.
- Author
-
Zhuang, Xuyi, Qian, Yukun, and Wang, Mingjiang
- Subjects
VECTOR quantization ,CONTEXTUAL learning ,EMOTION recognition ,SPEECH ,ERROR rates - Abstract
Self-supervised learning for speech pre-training models has achieved remarkable success in acquiring superior speech contextual representations by learning from unlabeled audio, excelling in numerous downstream speech tasks. However, the pre-training of these models necessitates significant computational resources and training duration, presenting a high barrier to entry into the realm of pre-training learning. By combining the resource-efficient benefits of the generative learning model Masked Auto Encoder with the efficacy of the vector quantization method from discriminative learning, we introduce a novel pre-training framework: Speech Vector Quantization Masked Auto Encoder (SVQ-MAE). Distinct from the majority of SSL frameworks, which require simultaneous construction of speech contextual representations and mask reconstruction within an encoder-only module, we exclusively design a decoupled decoder for pre-training SVQ-MAE. This additional decoupled decoder undertakes the mask reconstruction task alone, reducing the learning complexity of the pretext task and enhancing the encoder's efficiency in extracting speech contextual representations. Owing to this innovation, using only 4 GPUs, SVQ-MAE can achieve performance comparable to wav2vec 2.0, which requires 64 GPUs for training. On the Speech Processing Universal Performance Benchmark, SVQ-MAE surpasses wav2vec 2.0 in both keyword spotting and emotion recognition tasks. Furthermore, in cross-lingual ASR for Mandarin, upon fine-tuning on AISHELL-1, SVQ-MAE achieves a character error rate of 4.09%, outperforming all supervised ASR models. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
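The vector-quantization step that gives SVQ-MAE its discrete targets can be sketched as a nearest-codebook assignment: each continuous frame embedding is snapped to its closest codebook entry. Training would add a straight-through gradient and codebook/commitment losses, which are omitted from this illustrative sketch.

```python
import numpy as np

def vector_quantize(z, codebook):
    """VQ assignment sketch: map each of T continuous embeddings (T, D) to its
    nearest entry in a (K, D) codebook by squared Euclidean distance,
    returning the quantized vectors and their discrete code indices."""
    d = ((z[:, None, :] - codebook[None, :, :]) ** 2).sum(axis=-1)  # (T, K)
    idx = np.argmin(d, axis=1)          # nearest-code index per frame
    return codebook[idx], idx
```

The discrete indices make the masked-reconstruction pretext a classification-like problem over codebook entries rather than a regression onto raw audio features.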
47. GeomorPM: a geomorphic pretrained model integrating convolution and Transformer architectures based on DEM data.
- Author
-
Yang, Jiaqi, Xu, Jun, Zhu, Yunqiang, Liu, Ze, and Zhou, Chenghu
- Subjects
- *
ARTIFICIAL intelligence , *ARCHITECTURAL design , *LANDFORMS , *DIGITAL elevation models , *LEARNING ability , *DEEP learning - Abstract
As the domain of artificial intelligence has advanced, the integration of deep learning techniques into terrain and landform analysis has become more prevalent. Nevertheless, many existing methods are fully supervised and designed for specific tasks; thus, their transferability is limited and massive annotated samples are required. This study introduces a geomorphic pretrained model (GeomorPM) capable of performing multiple tasks. First, an architecture was designed that combined a convolution-based Vector Quantised-Variational Autoencoder (VQVAE) with a Transformer-based masked autoencoder (MAE) framework, allowing it to autonomously learn local details and global patterns from large-scale digital elevation model (DEM) data. Subsequently, GeomorPM, based on the VQMAE architecture, was pretrained on massive DEM data and fine-tuned for three specific tasks: DEM void filling, DEM super-resolution, and landform classification. GeomorPM outperformed the traditional and other deep learning methods in all three tasks, demonstrating the superior learning ability and transferability of the model. This study provides a practical framework for developing pretrained models based on DEMs that can be expanded to other continuous geoscientific data. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
48. Hierarchical Self-Supervised Learning for Knowledge-Aware Recommendation.
- Author
-
Zhou, Cong, Zhou, Sihang, Huang, Jian, and Wang, Dong
- Subjects
KNOWLEDGE graphs ,BLENDED learning ,RECOMMENDER systems ,PROBLEM solving ,TOPOLOGY - Abstract
Knowledge-aware recommendation systems have shown superior performance by connecting the user-item interaction graph (UIG) with the knowledge graph (KG) and enriching the semantic connections collected by the corresponding networks. Among existing methods, self-supervised learning has attracted the most attention for its significant effect in extracting node self-discrimination auxiliary supervision, which can largely improve recommendation rationality. However, existing methods usually employ a single (either node or edge) perspective for representation learning, over-emphasizing the pair-wise topology structure in the graph and thus overlooking the important semantic information in neighborhood-wise connections, which limits recommendation performance. To solve this problem, we propose Hierarchical self-supervised learning for Knowledge-aware Recommendation (HKRec). The hierarchical property of the method is reflected in two perspectives. First, to better reveal the knowledge graph's semantic relations, we design a Triple-Graph Masked Autoencoder (T-GMAE) that forces the network to estimate masked node features, node connections, and node degrees. Second, to better align the user-item recommendation knowledge with common knowledge, we conduct contrastive learning in a hybrid way, i.e., both neighborhood-level and edge-level dropout are adopted in parallel to allow more comprehensive information distillation. We conduct an in-depth experimental evaluation on three real-world datasets, comparing our proposed HKRec with state-of-the-art baseline models to demonstrate its effectiveness and superiority. Recall@20 and NDCG@20 improved by 2.2% to 24.95% and 3.38% to 22.32% on the Last-FM dataset, by 7.0% to 23.82% and 5.7% to 39.66% on the MIND dataset, and by 1.76% to 34.73% and 1.62% to 35.13% on the Alibaba-iFashion dataset, respectively. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
49. Bridging vision and touch: advancing robotic interaction prediction with self-supervised multimodal learning.
- Author
-
Li, Luchen, Thuruthel, Thomas George, Tamantini, Christian, and Psomopoulou, Efi
- Subjects
ARTIFICIAL neural networks ,GRAPH neural networks ,PATTERN recognition systems ,ARTIFICIAL intelligence ,GENERATIVE adversarial networks ,LOW vision ,PHYSICAL contact ,REINFORCEMENT learning ,VIDEO coding - Abstract
Predicting the consequences of an agent's actions on its environment is a pivotal challenge in robotic learning and plays a key role in developing higher cognitive skills for intelligent robots. While current methods have predominantly relied on vision and motion data to generate predicted videos, more comprehensive sensory perception is required for complex physical interactions such as contact-rich manipulation or highly dynamic tasks. In this work, we investigate the interdependence between vision and tactile sensation in the scenario of dynamic robotic interaction. A multi-modal fusion mechanism is introduced into the action-conditioned video prediction model to forecast future scenes, enriching the single-modality prototype with a compressed latent representation of multiple sensory inputs. Additionally, to realize the interactive setting, we built a robotic interaction system equipped with both web cameras and vision-based tactile sensors to collect a dataset of vision-tactile sequences and the corresponding robot action data. Finally, through a series of qualitative and quantitative comparative studies of different prediction architectures and tasks, we present an insightful analysis of the cross-modal influence among vision, touch, and action, revealing the asymmetrical impact of the sensations when contributing to interpreting environmental information. This opens possibilities for more adaptive and efficient robotic control in complex environments, with implications for dexterous manipulation and human-robot interaction. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
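The fusion mechanism the abstract describes (compressing each sensory modality into a latent code, then combining the codes with the robot's action command to predict future observations) can be sketched as follows. This is a minimal illustration, not the paper's architecture: the linear "encoders", the dimensions, and the tanh nonlinearities are all assumptions chosen for brevity.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical dimensions, chosen for illustration only.
VISION_DIM, TACTILE_DIM, ACTION_DIM, LATENT_DIM = 64, 32, 7, 16

# Stand-in linear "encoders" that compress each modality to a shared latent size.
W_vis = rng.standard_normal((VISION_DIM, LATENT_DIM)) * 0.1
W_tac = rng.standard_normal((TACTILE_DIM, LATENT_DIM)) * 0.1
# Fusion head: maps [vision latent | tactile latent | action] to the next latent.
W_fuse = rng.standard_normal((2 * LATENT_DIM + ACTION_DIM, LATENT_DIM)) * 0.1

def fuse(vision, tactile, action):
    """Compress each modality, then fuse the latents with the action command."""
    z_vis = np.tanh(vision @ W_vis)     # compressed vision representation
    z_tac = np.tanh(tactile @ W_tac)    # compressed tactile representation
    joint = np.concatenate([z_vis, z_tac, action], axis=-1)
    return np.tanh(joint @ W_fuse)      # action-conditioned latent for prediction

z = fuse(rng.standard_normal(VISION_DIM),
         rng.standard_normal(TACTILE_DIM),
         rng.standard_normal(ACTION_DIM))
```

In a real action-conditioned video prediction model, the encoders would be convolutional networks and the fused latent would feed a recurrent decoder that renders future frames; the sketch only shows where the modalities meet.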
50. Pseudo-label refinement via hierarchical contrastive learning for source-free unsupervised domain adaptation.
- Author
-
Li, Deng, Zhang, Jianguang, Wu, Kunhong, Shi, Yucheng, and Han, Yahong
- Subjects
DATA privacy ,PROBLEM solving ,GENERALIZATION ,SEMANTICS ,NOISE - Abstract
Source-free unsupervised domain adaptation aims to adapt a source model to an unlabeled target domain without accessing the source data, due to privacy considerations. Existing works mainly address the problem via self-training and representation learning. However, these works typically learn representations at a single semantic level and barely exploit the rich hierarchical semantic information needed to obtain clear decision boundaries, which makes it hard for them to achieve satisfactory generalization performance. In this paper, we propose a novel hierarchical contrastive domain adaptation algorithm that applies self-supervised contrastive learning to both fine-grained instances and coarse-grained cluster semantics. On the one hand, we propose an adaptive prototype pseudo-labeling strategy to obtain much more reliable labels. On the other hand, we propose hierarchical contrastive representation learning at both the fine-grained instance level and the coarse-grained cluster level to reduce the negative effect of label noise and stabilize the whole training procedure. Extensive experiments are conducted on primary unsupervised domain adaptation benchmark datasets, and the results demonstrate the effectiveness of the proposed method. • Propose a hierarchical self-supervised learning method for source-free UDA. • Devise an adaptive pseudo-labeling method to obtain much more reliable labels. • Improve the pseudo-label quality with hierarchical contrastive learning. • Learn semantic features on both instance-wise and semantic cluster levels. [ABSTRACT FROM AUTHOR]
- Published
- 2024
- Full Text
- View/download PDF
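The prototype pseudo-labeling idea in this abstract (computing a class prototype from the features currently assigned to each class, then reassigning each sample to its nearest prototype to clean up noisy labels) can be sketched as below. The toy features, the fixed label layout, and cosine similarity as the distance measure are assumptions for illustration; the paper's adaptive thresholding and hierarchical contrastive losses are not reproduced.

```python
import numpy as np

rng = np.random.default_rng(1)

# Toy setup: 20 samples, 8-d features, 3 classes (all illustrative).
feats = rng.standard_normal((20, 8))
noisy_labels = np.arange(20) % 3        # initial pseudo-labels, possibly noisy

def refine_pseudo_labels(feats, labels, n_classes):
    """Reassign each sample to the class whose prototype it is closest to."""
    feats = feats / np.linalg.norm(feats, axis=1, keepdims=True)
    # Prototype = mean feature of the samples currently assigned to the class.
    protos = np.stack([feats[labels == c].mean(axis=0) for c in range(n_classes)])
    protos = protos / np.linalg.norm(protos, axis=1, keepdims=True)
    sims = feats @ protos.T             # cosine similarity to each prototype
    return sims.argmax(axis=1)          # refined pseudo-labels

refined = refine_pseudo_labels(feats, noisy_labels, 3)
```

Iterating this refinement as the feature extractor improves is what makes the pseudo-labels progressively more reliable; the contrastive losses in the paper then use these labels to pull same-class instances and clusters together.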