1. Ensemble learning-based predictor for driver synonymous mutation with sequence representation.
- Author
- Bi, Chuanmei; Shi, Yong; Xia, Junfeng; Liang, Zhen; Wu, Zhiqiang; Xu, Kai; and Cheng, Na
- Abstract
Synonymous mutations, once considered neutral, are now understood to have significant implications for a variety of diseases, particularly cancer. Identifying driver synonymous mutations in human cancers is therefore essential, yet current methods are constrained by data limitations. In this study, we first investigate the impact of sequence-based features, including DNA shape, physicochemical properties, and one-hot encoding of nucleotides, as well as deep learning-derived features from pre-trained, BERT-based chemical molecule language models. We then propose EPEL, an effect predictor for synonymous mutations that employs ensemble learning. EPEL combines five tree-based models and optimizes feature selection to enhance predictive accuracy. Notably, the incorporation of DNA shape features and deep learning-derived features from chemical molecule language models represents a pioneering effort in assessing the impact of synonymous mutations in cancer. Compared with existing state-of-the-art methods, EPEL demonstrates superior performance on independent test datasets. Furthermore, our analysis reveals a significant correlation between effect scores and patient outcomes across various cancer types. Interestingly, while deep learning methods have shown promise in other fields, their DNA sequence representations do not significantly enhance the identification of driver synonymous mutations in this study. Overall, we anticipate that EPEL will help researchers target driver synonymous mutations more precisely. EPEL is designed with flexibility, allowing users to retrain the prediction model and generate effect scores for synonymous mutations in human cancers. A user-friendly web server for EPEL is available at http://ahmu.EPEL.bio/.
Author summary: Although driver synonymous mutations play a crucial role in cancer, their identification is hampered by limited data and intricate pathogenic mechanisms. To overcome these obstacles, we introduce EPEL, a stacking ensemble learning approach for predicting the impact of synonymous mutations in cancer. We systematically explored various novel features, including DNA shape characteristics and chemical molecule-based features. The results show that these features significantly enhance predictive performance for driver synonymous mutations. We compared EPEL with other state-of-the-art methods on an independent test dataset, and the findings reveal that EPEL substantially improves the accuracy of identifying driver synonymous mutations. In addition, our findings highlight a significant correlation between effect scores and patient outcomes across various cancer types. Notably, deep biological language models contribute less to the prediction of driver synonymous mutations. We anticipate that these findings will deepen the understanding of driver synonymous mutations. [ABSTRACT FROM AUTHOR]
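One of the sequence-based features named in the abstract is one-hot encoding of nucleotides. The sketch below illustrates the general technique only. It is not taken from EPEL: the function names, the A/C/G/T column order, and the all-zeros handling of ambiguous bases are assumptions made for illustration.

```python
from typing import List

# Assumed column order for the one-hot vectors (not specified in the abstract).
NUCLEOTIDES = "ACGT"


def one_hot_encode(sequence: str) -> List[List[int]]:
    """Encode a DNA sequence as one 4-element vector per base.

    Bases outside A/C/G/T (for example N) become all zeros, which is
    an assumed convention for ambiguous positions.
    """
    index = {base: i for i, base in enumerate(NUCLEOTIDES)}
    encoded = []
    for base in sequence.upper():
        vector = [0] * len(NUCLEOTIDES)
        if base in index:
            vector[index[base]] = 1
        encoded.append(vector)
    return encoded


def flatten(matrix: List[List[int]]) -> List[int]:
    """Concatenate per-base vectors into one flat feature vector."""
    return [value for row in matrix for value in row]
```

A sequence window of length n yields a flat vector of 4n values, which could then be combined with other feature groups before model training.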
- Published
- 2025