1. Enhancing Arabidopsis thaliana ubiquitination site prediction through knowledge distillation and natural language processing.
- Author
- Nguyen, Van-Nui; Tran, Thi-Xuan; Nguyen, Thi-Tuyen; and Le, Nguyen Quoc Khanh
- Subjects
- *ARTIFICIAL neural networks, *MACHINE learning, *POST-translational modification, *PHYSIOLOGY, *UBIQUITINATION
- Abstract
• Novel model predicts Arabidopsis ubiquitination using knowledge distillation and NLP.
• Teacher-student framework enhances prediction accuracy (86.3%) and AUC (0.926).
• Outperforms existing methods for Arabidopsis ubiquitination prediction.
• Model's robustness validated through independent testing and comparisons.
Protein ubiquitination is a critical post-translational modification (PTM) involved in diverse biological processes and plays a pivotal role in regulating physiological mechanisms and disease states. Despite various efforts to develop ubiquitination site prediction tools across species, these tools mainly rely on predefined sequence features and machine learning algorithms, with species-specific variations in ubiquitination patterns remaining poorly understood. This study introduces a novel approach for predicting Arabidopsis thaliana ubiquitination sites using a neural network model based on knowledge distillation and natural language processing (NLP) of protein sequences. Our framework employs a multi-species "Teacher model" to guide a more compact, species-specific "Student model", with the "Teacher" generating pseudo-labels that enhance the "Student" learning and prediction robustness. Cross-validation results demonstrate that our model achieves superior performance, with an accuracy of 86.3% and an area under the curve (AUC) of 0.926, while independent testing confirmed these results with an accuracy of 86.3% and an AUC of 0.923. Comparative analysis with established predictors further highlights the model's superiority, emphasizing the effectiveness of integrating knowledge distillation and NLP in ubiquitination prediction tasks. This study presents a promising and efficient approach for ubiquitination site prediction, offering valuable insights for researchers in related fields. The code and resources are available on GitHub: https://github.com/nuinvtnu/KD_ArapUbi. [ABSTRACT FROM AUTHOR]
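The teacher-student idea in the abstract can be illustrated with a minimal sketch of a distillation loss for a binary site/non-site classifier. This is not the authors' implementation (see their GitHub repository for that); it only assumes a common formulation in which the student is trained against a blend of the true labels and the teacher's predicted probabilities used as soft pseudo-labels. The function names, the weight `alpha`, and the toy numbers are all hypothetical.

```python
import math


def binary_cross_entropy(target, prob, eps=1e-12):
    """Cross-entropy between a target in [0, 1] and a predicted probability."""
    prob = min(max(prob, eps), 1.0 - eps)  # clamp to avoid log(0)
    return -(target * math.log(prob) + (1.0 - target) * math.log(1.0 - prob))


def distillation_loss(student_probs, teacher_probs, hard_labels, alpha=0.5):
    """Mean blended loss over a batch.

    alpha weights the true (hard) labels; (1 - alpha) weights the
    teacher's probabilities, used here as soft pseudo-labels.
    Assumed formulation for illustration only.
    """
    if not (len(student_probs) == len(teacher_probs) == len(hard_labels)):
        raise ValueError("all inputs must have the same length")
    total = 0.0
    for s, t, y in zip(student_probs, teacher_probs, hard_labels):
        hard_term = binary_cross_entropy(y, s)  # student vs. true label
        soft_term = binary_cross_entropy(t, s)  # student vs. teacher pseudo-label
        total += alpha * hard_term + (1.0 - alpha) * soft_term
    return total / len(student_probs)


# Toy batch of three candidate lysine sites (made-up numbers).
student = [0.9, 0.2, 0.6]
teacher = [0.8, 0.1, 0.7]
labels = [1, 0, 1]
print(distillation_loss(student, teacher, labels, alpha=0.5))
```

With `alpha=1.0` the loss reduces to ordinary supervised cross-entropy, and with `alpha=0.0` the student learns from the teacher's pseudo-labels alone; the abstract does not state which weighting the study used.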
- Published
- 2024