sORFPred: A Method Based on Comprehensive Features and Ensemble Learning to Predict the sORFs in Plant LncRNAs.
- Source :
- Interdisciplinary sciences, computational life sciences [Interdiscip Sci] 2023 Jun; Vol. 15 (2), pp. 189-201. Date of Electronic Publication: 2023 Jan 27.
- Publication Year :
- 2023
Abstract
- Long non-coding RNAs (lncRNAs) are important regulators of biological processes. It has recently been shown that some lncRNAs include small open reading frames (sORFs) that can encode small peptides of no more than 100 amino acids. However, existing methods are commonly applied to human and animal datasets and still suffer from low feature representation capability. Thus, accurate and credible prediction of sORFs with coding ability in plant lncRNAs is imperative. This paper proposes a new method termed sORFPred, in which we design a model named MCSEN that combines multi-scale convolution and Squeeze-and-Excitation Networks to fully mine the distinct information embedded in sORFs, integrate and optimize multiple sequence-based and physicochemical feature descriptors, and build a two-layer prediction classifier based on a Bayesian optimization algorithm and Extra Trees. sORFPred has been evaluated on sORF datasets of three species and on an experimentally validated sORF dataset. Results indicate that sORFPred outperforms existing methods and achieves 97.28% accuracy, 97.06% precision, 97.52% recall, and 97.29% F1-score on Arabidopsis thaliana, a significant improvement in prediction performance over various conventional shallow machine learning and deep learning models. (© 2023. International Association of Scientists in the Interdisciplinary Areas.)
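The reported metrics can be checked against one another. As a minimal sketch, assuming the standard definition of the F1-score as the harmonic mean of precision and recall (the abstract does not state which formula the authors used), the precision and recall reported for Arabidopsis thaliana reproduce the reported F1-score. The function name `f1_score` is illustrative and is not taken from sORFPred.

```python
def f1_score(precision: float, recall: float) -> float:
    """Harmonic mean of precision and recall, both given as fractions in [0, 1]."""
    if precision + recall == 0:
        return 0.0  # convention: F1 is 0 when both inputs are 0
    return 2 * precision * recall / (precision + recall)


# Precision and recall as reported in the abstract for Arabidopsis thaliana.
precision = 0.9706
recall = 0.9752

f1 = f1_score(precision, recall)
print(f"F1 = {f1 * 100:.2f}%")  # prints "F1 = 97.29%"
```

The result agrees with the 97.29% F1-score stated in the abstract, so the three figures are mutually consistent under this definition.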
Details
- Language :
- English
- ISSN :
- 1867-1462
- Volume :
- 15
- Issue :
- 2
- Database :
- MEDLINE
- Journal :
- Interdisciplinary sciences, computational life sciences
- Publication Type :
- Academic Journal
- Accession number :
- 36705893
- Full Text :
- https://doi.org/10.1007/s12539-023-00552-4