1. Identification of leukemia stem cell expression signatures through Monte Carlo feature selection strategy and support vector machine
- Author
- Yu-Dong Cai, Xiangyin Kong, Min Liu, Yu-Hang Zhang, Lin Lu, Kai-Yan Feng, Lei Chen, Tao Huang, YaoChen Xu, and JiaRui Li
- Subjects
0301 basic medicine; Cancer Research; Cell type; Support Vector Machine; Datasets as Topic; Feature selection; Computational biology; Biology; Transcriptome; 03 medical and health sciences; 0302 clinical medicine; Biomarkers, Tumor; medicine; Humans; Molecular Biology; Gene; Models, Genetic; Gene Expression Profiling; Computational Biology; Myeloid leukemia; medicine.disease; Support vector machine; Leukemia, Myeloid, Acute; Leukemia; 030104 developmental biology; Drug Resistance, Neoplasm; 030220 oncology & carcinogenesis; Neoplastic Stem Cells; Feasibility Studies; Molecular Medicine; Stem cell; Monte Carlo Method
- Abstract
Acute myeloid leukemia (AML) is a type of blood cancer characterized by the rapid growth of immature white blood cells from the bone marrow. Therapy resistance resulting from the persistence of leukemia stem cells (LSCs) is found in numerous patients. Comparative transcriptome studies have previously been conducted to analyze differentially expressed genes between LSC+ and LSC- cells. However, these studies mainly focused on a limited number of genes with the most obvious expression differences between the two cell types. We developed a computational approach incorporating several machine learning algorithms, including Monte Carlo feature selection (MCFS), incremental feature selection (IFS), support vector machine (SVM), and Repeated Incremental Pruning to Produce Error Reduction (RIPPER), to identify gene expression features specific to LSCs. A total of 1159 features (genes) were first identified, which can be used to build the optimal SVM classifier for distinguishing LSC+ and LSC- cells. Among these 1159 genes, the top 17 genes were identified as LSC-specific biomarkers. In addition, six classification rules were produced by the RIPPER algorithm. The subsequent literature review of these features/genes and classification rules, together with functional enrichment analyses of the 1159 features/genes, confirmed the relevance of the extracted genes and rules to the characteristics of LSCs.
- Published
- 2019
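The incremental feature selection (IFS) step named in the abstract can be illustrated with a minimal sketch. It assumes a feature ranking is already available (in the paper this comes from MCFS; here `ranked_features` is a hypothetical hand-written list) and, to stay dependency-free, swaps in a nearest-centroid classifier where the paper uses an SVM. The toy data, function names, and the leave-one-out evaluation are all illustrative assumptions, not the authors' implementation.

```python
from statistics import mean

def nearest_centroid_predict(train_X, train_y, x):
    """Stand-in classifier (NOT an SVM): pick the class whose mean vector is closest to x."""
    best_label, best_dist = None, float("inf")
    for label in sorted(set(train_y)):
        rows = [r for r, lab in zip(train_X, train_y) if lab == label]
        centroid = [mean(col) for col in zip(*rows)]
        dist = sum((a - b) ** 2 for a, b in zip(x, centroid))
        if dist < best_dist:
            best_label, best_dist = label, dist
    return best_label

def loo_accuracy(X, y):
    """Leave-one-out accuracy of the stand-in classifier."""
    hits = 0
    for i in range(len(X)):
        train_X = X[:i] + X[i + 1:]
        train_y = y[:i] + y[i + 1:]
        hits += nearest_centroid_predict(train_X, train_y, X[i]) == y[i]
    return hits / len(X)

def incremental_feature_selection(X, y, ranked_features):
    """Score the top-k ranked features for k = 1..n; return (best_k, scores)."""
    scores = []
    for k in range(1, len(ranked_features) + 1):
        cols = ranked_features[:k]
        X_k = [[row[c] for c in cols] for row in X]
        scores.append(loo_accuracy(X_k, y))
    best_k = scores.index(max(scores)) + 1  # smallest k reaching the best score
    return best_k, scores

# Toy expression matrix (hypothetical values): column 0 separates the classes,
# columns 1 and 2 are noise.
X = [
    [0.0, 5.0, 1.0],
    [0.1, 1.0, 9.0],
    [0.2, 9.0, 4.0],
    [1.0, 2.0, 8.0],
    [1.1, 8.0, 2.0],
    [0.9, 4.0, 5.0],
]
y = ["LSC-", "LSC-", "LSC-", "LSC+", "LSC+", "LSC+"]
ranked_features = [0, 1, 2]  # assumed ranking, most informative first

best_k, scores = incremental_feature_selection(X, y, ranked_features)
print(best_k, scores)
```

In a setting closer to the abstract, the stand-in classifier would be replaced by an SVM and the ranking would come from MCFS; the loop over growing feature subsets is the part this sketch is meant to show.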