A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets.

Authors :
Sayed, Sabah
Nassef, Mohammad
Badr, Amr
Farag, Ibrahim
Source :
Expert Systems with Applications. May 2019, Vol. 121, p233-243. 11p.
Publication Year :
2019

Abstract

Highlights
• Feature selection over high-dimensional colon cancer Microarray datasets.
• Features selected from both Gene Expression and DNA-Methylation Microarray datasets.
• Six resulting biomarker genes for colon cancer validated using Enrichment Analysis.
• Biomarker genes validated on independent datasets with 99.9% classification accuracy.

Cancer is a dangerous disease that causes death worldwide. Discovering a small set of genes relevant to a given cancer can lead to effective treatments. The main challenge with Microarray datasets is their high dimensionality: a huge number of features compared to a modest number of samples. Recent research efforts have attempted to reduce this dimensionality using different feature selection techniques. This paper presents an ensemble feature selection technique based on the t-test and a genetic algorithm. After the data are preprocessed with the t-test, a Nested Genetic Algorithm, namely Nested-GA, is used to find the optimal subset of features by combining data from two different datasets. Nested-GA consists of two nested genetic algorithms (outer and inner) that run on two different kinds of datasets. The Outer Genetic Algorithm (OGA-SVM) works on Microarray gene expression datasets, whereas the Inner Genetic Algorithm (IGA-NNW) runs on DNA Methylation datasets. Nested-GA is applied to a colon cancer dataset with 5-fold cross validation. After Nested-GA is applied, the Incremental Feature Selection (IFS) strategy is used to obtain the smallest optimal gene subset. The gene subset has been validated on an independent dataset, yielding 99.9% classification accuracy. The biological significance of the resulting optimal genes is then validated using Enrichment Analysis. Moreover, the results of Nested-GA have been compared to the results of other feature selection algorithms that have been run on either Gene Expression or DNA Methylation datasets.
In the experiments, Nested-GA achieved the highest classification performance with a small optimal feature subset compared to the other algorithms. Furthermore, when run on lung cancer datasets containing two different cancer subtypes, Nested-GA achieved significantly better classification accuracy (98.4%) than a previous study (84.6%) that used lung cancer DNA-Methylation data only. [ABSTRACT FROM AUTHOR]
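
The pipeline the abstract outlines (t-test filtering, genetic-algorithm search over feature subsets, then Incremental Feature Selection) can be illustrated with a minimal sketch. This is not the authors' Nested-GA: it uses a single, non-nested genetic algorithm on one dataset, and it swaps in a leave-one-out nearest-centroid classifier as the fitness function in place of the SVM and neural network named in the abstract. All function names, parameters, the subset-size penalty, and the synthetic `toy_data` generator are assumptions made for illustration only.

```python
import random
from statistics import mean, variance

def toy_data(seed=0, n_per_class=10, n_features=30, informative=(0, 3)):
    """Hypothetical synthetic data: Gaussian noise plus a class shift on a few features."""
    rng = random.Random(seed)
    X, y = [], []
    for label in (0, 1):
        for _ in range(n_per_class):
            row = [rng.gauss(0.0, 1.0) for _ in range(n_features)]
            for j in informative:
                row[j] += 10.0 * label  # shift informative features for class 1
            X.append(row)
            y.append(label)
    return X, y

def welch_t(a, b):
    """Welch's t-statistic for two samples (unequal variances allowed)."""
    se = (variance(a) / len(a) + variance(b) / len(b)) ** 0.5
    return (mean(a) - mean(b)) / se if se > 0 else 0.0

def t_test_filter(X, y, keep):
    """Indices of the `keep` features with the largest |t|, best first."""
    scores = []
    for j in range(len(X[0])):
        a = [row[j] for row, label in zip(X, y) if label == 0]
        b = [row[j] for row, label in zip(X, y) if label == 1]
        scores.append((abs(welch_t(a, b)), j))
    return [j for _, j in sorted(scores, reverse=True)[:keep]]

def centroid_accuracy(X, y, features):
    """Stand-in fitness: leave-one-out accuracy of a nearest-centroid classifier."""
    correct = 0
    for i, (row, label) in enumerate(zip(X, y)):
        dists = {}
        for cls in (0, 1):
            members = [X[k] for k in range(len(X)) if k != i and y[k] == cls]
            centroid = [sum(m[f] for m in members) / len(members) for f in features]
            dists[cls] = sum((row[f] - c) ** 2 for f, c in zip(features, centroid))
        correct += min(dists, key=dists.get) == label
    return correct / len(X)

def genetic_select(X, y, candidates, pop_size=12, generations=15, p_mut=0.1, seed=0):
    """Single-level GA over binary masks of `candidates` (needs at least 2 candidates)."""
    rng = random.Random(seed)

    def fitness(mask):
        chosen = [c for c, bit in zip(candidates, mask) if bit]
        if not chosen:
            return 0.0
        # Assumed penalty that favours smaller subsets.
        return centroid_accuracy(X, y, chosen) - 0.01 * len(chosen)

    pop = [[rng.randint(0, 1) for _ in candidates] for _ in range(pop_size)]
    for _ in range(generations):
        pop.sort(key=fitness, reverse=True)
        parents = pop[: pop_size // 2]  # elitism: the best half survives
        children = []
        while len(parents) + len(children) < pop_size:
            p1, p2 = rng.sample(parents, 2)
            cut = rng.randrange(1, len(candidates))  # one-point crossover
            child = p1[:cut] + p2[cut:]
            children.append([1 - b if rng.random() < p_mut else b for b in child])
        pop = parents + children
    best = max(pop, key=fitness)
    return [c for c, bit in zip(candidates, best) if bit]

def incremental_selection(X, y, ranked):
    """IFS: grow the ranked list one feature at a time, keep the shortest best prefix."""
    best_k, best_acc = 1, -1.0
    for k in range(1, len(ranked) + 1):
        acc = centroid_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict '>' keeps the smallest subset on ties
            best_k, best_acc = k, acc
    return ranked[:best_k], best_acc
```

A typical call sequence under these assumptions would be `X, y = toy_data()`, then `top = t_test_filter(X, y, keep=5)`, then `genetic_select(X, y, top)` to search subsets of the filtered features, and finally `incremental_selection(X, y, top)` to shrink the ranked list to the shortest prefix that reaches the best accuracy.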

Details

Language :
English
ISSN :
09574174
Volume :
121
Database :
Academic Search Index
Journal :
Expert Systems with Applications
Publication Type :
Academic Journal
Accession number :
134153181
Full Text :
https://doi.org/10.1016/j.eswa.2018.12.022