A Nested Genetic Algorithm for feature selection in high-dimensional cancer Microarray datasets.
- Source :
- Expert Systems with Applications. May 2019, Vol. 121, p233-243. 11p.
- Publication Year :
- 2019
Abstract
- Highlights
• Feature selection over high-dimensional colon cancer Microarray datasets.
• Features selected from both Gene Expression and DNA-Methylation Microarray datasets.
• Resultant six biomarker genes for colon cancer validated using Enrichment Analysis.
• Biomarker genes validated on independent datasets with 99.9% classification accuracy.
- Abstract
Cancer is a dangerous disease that causes death worldwide. Discovering a few genes relevant to a specific cancer can lead to effective treatments. The challenge with Microarray datasets is their high dimensionality: the number of features is huge compared to the modest number of samples. Recent research efforts have attempted to reduce this dimensionality using different feature selection techniques. This paper presents an ensemble feature selection technique based on the t-test and a genetic algorithm. After preprocessing the data using the t-test, a Nested Genetic Algorithm, namely Nested-GA, is used to get the optimal subset of features by combining data from two different datasets. Nested-GA consists of two nested genetic algorithms (outer and inner) that run on two different kinds of datasets. The Outer Genetic Algorithm (OGA-SVM) works on Microarray gene expression datasets, whereas the Inner Genetic Algorithm (IGA-NNW) runs on DNA Methylation datasets. Nested-GA is performed on a colon cancer dataset with 5-fold cross validation. After applying Nested-GA, the Incremental Feature Selection (IFS) strategy is used to get the smallest optimal gene subset. The gene subset has been validated on an independent dataset, resulting in 99.9% classification accuracy. The biological significance of the resulting optimal genes is then validated using Enrichment Analysis. Moreover, the results of Nested-GA have been compared to the results of other feature selection algorithms that have been run on either Gene Expression or DNA Methylation datasets.
In the experimental results, Nested-GA showed the highest classification performance with a small optimal feature subset compared to the other algorithms. Furthermore, running Nested-GA on lung cancer datasets that contain two different cancer subtypes yielded significantly better classification accuracy (98.4%) than a previous study (84.6%) that utilized lung cancer DNA-Methylation data only. [ABSTRACT FROM AUTHOR]
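The two non-genetic stages named in the abstract, the t-test filter and Incremental Feature Selection (IFS), can be illustrated with a minimal sketch. It is not the authors' implementation and it does not include Nested-GA itself. Several choices are assumptions made only for illustration: Welch's variant of the t-test (the abstract does not say which variant is used), a nearest-centroid classifier standing in for the SVM and neural network, leave-one-out evaluation instead of 5-fold cross validation, and a made-up toy matrix `X` with labels `y`. All function names are hypothetical.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t-statistic for two samples (assumed variant; unequal variances allowed)."""
    se = sqrt(variance(a) / len(a) + variance(b) / len(b))
    return (mean(a) - mean(b)) / se if se > 0 else 0.0


def rank_by_t(X, y):
    """Return feature indices sorted by |t|, most discriminative first."""
    scores = []
    for j in range(len(X[0])):
        pos = [row[j] for row, label in zip(X, y) if label == 1]
        neg = [row[j] for row, label in zip(X, y) if label == 0]
        scores.append((abs(welch_t(pos, neg)), j))
    # Sort by descending |t|; break ties by feature index for determinism.
    return [j for _, j in sorted(scores, key=lambda s: (-s[0], s[1]))]


def loo_accuracy(X, y, features):
    """Leave-one-out accuracy of a nearest-centroid classifier (a stand-in, not an SVM)."""
    correct = 0
    for i in range(len(X)):
        centroids = {}
        for label in set(y):
            rows = [X[k] for k in range(len(X)) if k != i and y[k] == label]
            centroids[label] = [mean(r[j] for r in rows) for j in features]
        point = [X[i][j] for j in features]
        pred = min(
            centroids,
            key=lambda c: sum((p - q) ** 2 for p, q in zip(point, centroids[c])),
        )
        correct += pred == y[i]
    return correct / len(X)


def incremental_feature_selection(X, y, ranked):
    """Grow the subset one ranked feature at a time; keep the smallest best prefix."""
    best_acc, best_subset = -1.0, []
    for k in range(1, len(ranked) + 1):
        acc = loo_accuracy(X, y, ranked[:k])
        if acc > best_acc:  # strict '>' keeps the smallest subset on ties
            best_acc, best_subset = acc, ranked[:k]
    return best_subset, best_acc


# Toy data invented for illustration: rows are samples, columns are features.
# Only column 0 separates the two classes clearly.
X = [
    [5.1, 1.0, 0.3],
    [4.9, 2.0, 0.1],
    [5.3, 1.5, 0.2],
    [1.0, 1.2, 0.2],
    [1.2, 1.9, 0.3],
    [0.8, 1.1, 0.1],
]
y = [1, 1, 1, 0, 0, 0]

ranked = rank_by_t(X, y)
subset, accuracy = incremental_feature_selection(X, y, ranked)
print(ranked, subset, accuracy)
```

In a pipeline shaped like the one the abstract describes, the ranking passed to the IFS step would come from the genetic algorithm stage rather than directly from the t-test filter, and the classifier and validation scheme would be replaced by the ones used in the paper.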
- Subjects :
- *GENETIC algorithms
- *COLON cancer
- *GENE expression
- *DNA methylation
- *ALGORITHMS
Details
- Language :
- English
- ISSN :
- 0957-4174
- Volume :
- 121
- Database :
- Academic Search Index
- Journal :
- Expert Systems with Applications
- Publication Type :
- Academic Journal
- Accession number :
- 134153181
- Full Text :
- https://doi.org/10.1016/j.eswa.2018.12.022