Computational identification of N4-methylcytosine sites in the mouse genome with machine-learning method

Authors :
Hasan Zulfiqar
Rida Sarwar Khan
Farwa Hassan
Kyle Hippe
Cassandra Hunt
Hui Ding
Xiao-Ming Song
Renzhi Cao
Source :
Mathematical Biosciences and Engineering, Vol 18, Iss 4, Pp 3348-3363 (2021)
Publication Year :
2021
Publisher :
AIMS Press, 2021.

Abstract

N4-methylcytosine (4mC) is a DNA modification that can regulate multiple biological processes. Correctly identifying 4mC sites in genomic sequences can provide precise knowledge about their genetic roles. This study aimed to develop an ensemble model to predict 4mC sites in the mouse genome. In the proposed model, DNA sequences were encoded by k-mer, enhanced nucleic acid composition, and composition of k-spaced nucleic acid pairs. These features were then optimized using minimum redundancy maximum relevance (mRMR) with incremental feature selection (IFS) and five-fold cross-validation. The resulting optimal features were fed into a random forest classifier to discriminate 4mC from non-4mC sites in the mouse genome. On the independent dataset, the model achieved an overall accuracy of 85.41%, approximately 3.8% and 6.3% higher than the two existing models, i4mC-Mouse and 4mCpred-EL, respectively. The data and source code of the model can be freely downloaded from https://github.com/linDing-groups/model_4mc.
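
As an illustration of two of the sequence encodings named in the abstract, the following is a minimal Python sketch of k-mer frequencies and the composition of k-spaced nucleic acid pairs. It is not taken from the authors' repository: the function names `kmer_frequencies` and `spaced_pair_frequencies`, the normalization by the number of valid windows, and the skipping of windows containing non-ACGT symbols are all assumptions made for this example.

```python
from itertools import product

ALPHABET = "ACGT"


def kmer_frequencies(seq, k=2):
    """Return normalized k-mer frequencies, ordered lexicographically by k-mer.

    Hypothetical helper: normalization by the count of valid windows is an
    assumption, not necessarily what the published model does.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product(ALPHABET, repeat=k)]
    counts = dict.fromkeys(kmers, 0)
    total = 0
    for i in range(len(seq) - k + 1):
        window = seq[i:i + k]
        if window in counts:  # skip windows with symbols outside ACGT (e.g. N)
            counts[window] += 1
            total += 1
    return [counts[m] / total if total else 0.0 for m in kmers]


def spaced_pair_frequencies(seq, gap=1):
    """Return frequencies of nucleotide pairs separated by `gap` positions.

    Hypothetical helper: with gap=0 this reduces to ordinary dinucleotide
    frequencies.
    """
    seq = seq.upper()
    pairs = ["".join(p) for p in product(ALPHABET, repeat=2)]
    counts = dict.fromkeys(pairs, 0)
    total = 0
    for i in range(len(seq) - gap - 1):
        pair = seq[i] + seq[i + gap + 1]
        if pair in counts:
            counts[pair] += 1
            total += 1
    return [counts[p] / total if total else 0.0 for p in pairs]


def encode(seq, k=2, max_gap=2):
    """Concatenate k-mer and spaced-pair features into one feature vector."""
    features = kmer_frequencies(seq, k)
    for gap in range(max_gap + 1):
        features.extend(spaced_pair_frequencies(seq, gap))
    return features
```

A vector produced by `encode` could then be passed to a feature-selection step and a classifier. With `k=2` and `max_gap=2` it has 16 + 3 × 16 = 64 entries; these parameter values are illustrative and not the settings reported in the paper.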

Details

Language :
English
ISSN :
1551-0018
Volume :
18
Issue :
4
Database :
Directory of Open Access Journals
Journal :
Mathematical Biosciences and Engineering
Publication Type :
Academic Journal
Accession number :
edsdoj.050d0ca07437494b8f5da19c9b1fc229
Document Type :
article
Full Text :
https://doi.org/10.3934/mbe.2021167?viewType=HTML