
Prediction of SNP Sequences via Gini Impurity Based Gradient Boosting Method

Authors :
Longquan Jiang
Bo Zhang
Qin Ni
Xuan Sun
Pingping Dong
Source :
IEEE Access, Vol 7, Pp 12647-12657 (2019)
Publication Year :
2019
Publisher :
IEEE, 2019.

Abstract

Recent research has seen growing application of machine learning approaches to the analysis of single nucleotide polymorphism (SNP) data; SNPs have been shown to be implicated in complex human diseases. In identifying the SNPs responsible for complex diseases, most genome-wide association studies consider a single SNP at a time and ignore the diverse interactions between SNPs. A major problem is the large number of features relative to the small number of individuals, which complicates the task and harms predictive ability on DNA sequences. In this paper, a novel boosting-based ensemble approach is proposed to study these interactions. An importance scoring strategy based on Gini impurity is introduced for feature selection. We evaluated its efficacy on SNP genotyping data collected by the Southeastern University of China and compared it with naive Bayes, support vector machine, and random forest. The experimental results show its validity and effectiveness for SNP interaction identification. In addition, our approach has a clear advantage in computational time and resources.
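
The abstract does not spell out the scoring formula, so the following is only a minimal sketch of what a Gini-impurity-based importance score could look like, not the authors' implementation. It assumes genotypes coded as 0/1/2 and a binary case/control label, and it scores each SNP by the impurity decrease of a single partition on its genotype value; in a boosted tree ensemble such decreases would normally be accumulated over every split in every tree. The function names (`gini_impurity`, `gini_gain`, `rank_snps`) and the toy data are hypothetical.

```python
from collections import Counter


def gini_impurity(labels):
    """Gini impurity 1 - sum_k p_k^2 of a list of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def gini_gain(genotypes, labels):
    """Impurity decrease obtained by partitioning samples on one SNP's genotype."""
    n = len(labels)
    groups = {}
    for genotype, label in zip(genotypes, labels):
        groups.setdefault(genotype, []).append(label)
    # Weighted average impurity of the child groups after the partition.
    weighted = sum(len(ys) / n * gini_impurity(ys) for ys in groups.values())
    return gini_impurity(labels) - weighted


def rank_snps(snp_columns, labels):
    """Return (snp_name, score) pairs sorted by decreasing impurity decrease."""
    scores = {name: gini_gain(col, labels) for name, col in snp_columns.items()}
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


# Toy data (hypothetical): six individuals, labels 1 = case, 0 = control.
labels = [1, 1, 1, 0, 0, 0]
snps = {
    "snp_a": [2, 2, 2, 0, 0, 0],  # genotype separates cases from controls
    "snp_b": [0, 1, 2, 0, 1, 2],  # genotype carries no label information
}
ranking = rank_snps(snps, labels)
for name, score in ranking:
    print(f"{name}: {score:.3f}")
```

Feature selection would then keep the top-ranked SNPs before fitting the ensemble. Scoring SNPs one at a time like this does not capture the SNP interactions the abstract emphasizes; accumulating the same quantity over the splits of multi-level trees is one common way to reflect them.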

Details

Language :
English
ISSN :
2169-3536
Volume :
7
Database :
Directory of Open Access Journals
Journal :
IEEE Access
Publication Type :
Academic Journal
Accession number :
edsdoj.3dd376d18418414bbc8e35a62b900bfc
Document Type :
article
Full Text :
https://doi.org/10.1109/ACCESS.2019.2893269