Back to Search Start Over

Identifying LncRNA-Encoded Short Peptides Using Optimized Hybrid Features and Ensemble Learning.

Authors :
Zhao S
Meng J
Kang Q
Luan Y
Source :
IEEE/ACM transactions on computational biology and bioinformatics [IEEE/ACM Trans Comput Biol Bioinform] 2022 Sep-Oct; Vol. 19 (5), pp. 2873-2881. Date of Electronic Publication: 2022 Oct 10.
Publication Year :
2022

Abstract

Long non-coding RNA (lncRNA) contains short open reading frames (sORFs), and sORFs-encoded short peptides (SEPs) have become the focus of scientific studies due to their crucial role in life activities. The identification of SEPs is vital to further understanding their regulatory function. Bioinformatics methods can quickly identify SEPs to provide credible candidate sequences for verifying SEPs by biological experimenrts. However, there is a lack of methods for identifying SEPs directly. In this study, a machine learning method to identify SEPs of plant lncRNA (ISPL) is proposed. Hybrid features including sequence features and physicochemical features are extracted manually or adaptively to construct different modal features. In order to keep the stability of feature selection, the non-linear correction applied in Max-Relevance-Max-Distance (nocRD) feature selection method is proposed, which integrates multiple feature ranking results and uses the iterative random forest for different modal features dimensionality reduction. Classification models with different modal features are constructed, and their outputs are combined for ensemble classification. The experimental results show that the accuracy of ISPL is 89.86% percent on the independent test set, which will have important implications for further studies of functional genomic.

Details

Language :
English
ISSN :
1557-9964
Volume :
19
Issue :
5
Database :
MEDLINE
Journal :
IEEE/ACM transactions on computational biology and bioinformatics
Publication Type :
Academic Journal
Accession number :
34383651
Full Text :
https://doi.org/10.1109/TCBB.2021.3104288