1. GSRNet, an adversarial training-based deep framework with multi-scale CNN and BiGRU for predicting genomic signals and regions.
- Author
- Zhu, Gancheng; Fan, Yusi; Li, Fei; Tsz Ho Choi, Annebella; Tan, Zhikang; Cheng, Yiruo; Li, Kewei; Wang, Siyang; Luo, Changfan; Liu, Hongmei; Zhang, Gongyou; Yao, Zhaomin; Zhang, Yaqi; Huang, Lan; and Zhou, Fengfeng
- Subjects
- *DEEP learning, *MACHINE learning, *ERROR rates, *GENETIC regulation, *EUKARYOTIC genomes, *SOURCE code
- Abstract
- A genome carries many functional genomic signals and regions (GSRs), which play a vital role in orchestrating the complex biological processes in eukaryotic organisms. Precise recognition of the GSRs within a genomic sequence is the first step toward understanding genomic organization and gene regulation. Previous studies have used machine learning or deep learning algorithms to identify GSRs based on hand-crafted features, which frequently fail to capture complex patterns within the GSRs. The one-hot encoding or word2vec embedding algorithms used in several deep learning-based studies have the potential to overcome the weakness of human-designed features, but they may fail to capture contextual and positional information. The present study proposes a general-purpose end-to-end framework for GSR prediction (GSRNet) that integrates DNABERT embedding, adversarial training, BiGRU, and multi-scale CNN to eliminate human involvement in feature engineering. GSRNet is evaluated on polyadenylation signal (PAS) and translation initiation site (TIS) prediction tasks. The comparative experiments show that the proposed GSRNet outperforms the state-of-the-art methods reported in previous studies, reducing the error rate by 1.08% and 1.50% for human PAS and TIS prediction, respectively. In relative terms, the model reduces the error rate by up to 8.73% and 32.97%, respectively. The improved detection of both GSR types (PAS and TIS) across four organisms confirms the effectiveness and robustness of the proposed GSRNet. The source code and the data are freely available at http://www.healthinformaticslab.org/supp/resources.php. [ABSTRACT FROM AUTHOR]
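The abstract reports both a drop in the error rate and a relative error-rate reduction. As a minimal sketch of how the two measures differ, the Python below computes each from a baseline and an improved error rate. The function names and the example error rates are assumptions made for illustration; they are not taken from the paper or its source code.

```python
def absolute_reduction(baseline_error: float, new_error: float) -> float:
    """Drop in error rate, in the same units as the inputs."""
    return baseline_error - new_error


def relative_reduction(baseline_error: float, new_error: float) -> float:
    """Drop in error rate expressed as a fraction of the baseline error."""
    if baseline_error <= 0:
        raise ValueError("baseline_error must be positive")
    return (baseline_error - new_error) / baseline_error


if __name__ == "__main__":
    # Hypothetical error rates, chosen only for illustration (not from the paper).
    baseline, improved = 0.10, 0.08
    print(f"absolute: {absolute_reduction(baseline, improved) * 100:.2f} points")
    print(f"relative: {relative_reduction(baseline, improved) * 100:.2f}%")
```

Because the relative measure divides by the baseline error, the same absolute drop counts for more when the baseline error is already low. This is one way similar absolute drops can correspond to quite different relative reductions.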
- Published
- 2023