1. Machine learning-based microarray analyses indicate low-expression genes might collectively influence PAH disease
- Author
- Qiang Wu, Jiangping Bai, James West, and Song Cui
- Subjects
0301 basic medicine; Microarray; Computer science; Microarrays; Gene Expression; computer.software_genre; Machine Learning; 0302 clinical medicine; Mathematical and Statistical Techniques; Databases, Genetic; Feature (machine learning); Cluster Analysis; Biology (General); Oligonucleotide Array Sequence Analysis; Pulmonary Arterial Hypertension; Ecology; Artificial neural network; Applied Mathematics; Simulation and Modeling; Bioassays and Physiological Analysis; Computational Theory and Mathematics; Modeling and Simulation; Physical Sciences; Supervised Machine Learning; DNA microarray; Algorithms; Research Article; Computer and Information Sciences; QH301-705.5; Feature selection; Machine learning; Research and Analysis Methods; 03 medical and health sciences; Cellular and Molecular Neuroscience; Machine Learning Algorithms; Text mining; Artificial Intelligence; Support Vector Machines; Genetics; Humans; Hierarchical Clustering; Molecular Biology; Ecology, Evolution, Behavior and Systematics; Artificial Neural Networks; Computational Neuroscience; Models, Genetic; business.industry; Gene Expression Profiling; Computational Biology; Biology and Life Sciences; Hierarchical clustering; Support vector machine; 030104 developmental biology; Case-Control Studies; Mutation; Artificial intelligence; business; computer; 030217 neurology & neurosurgery; Mathematics; Neuroscience
- Abstract
Accurately predicting and testing the type of pulmonary arterial hypertension (PAH) in each patient using cost-effective microarray-based expression data and machine learning algorithms could greatly help either to identify the most targeted medication or to adopt other therapeutic measures that could correct or restore defective genetic signaling at an early stage. Furthermore, the process of constructing the prediction model can also help identify highly informative genes controlling PAH, leading to an enhanced understanding of the disease etiology and molecular pathways. In this study, we applied several different gene filtering methods to microarray expression data obtained from a high-quality PAH patient dataset. Following that, we proposed a novel feature selection and refinement algorithm, used in conjunction with well-known machine learning methods, to identify a small set of highly informative genes. The results indicated that clusters of low-expression genes could be extremely informative for predicting and differentiating forms of PAH. Additionally, our proposed feature refinement algorithm could lead to a significant enhancement in model performance. To summarize, when integrated with state-of-the-art machine learning and novel feature refining algorithms, the most accurate models could provide near-perfect classification accuracies using very few (close to ten) low-expression genes.
- Author summary
Pulmonary arterial hypertension (PAH) is a serious and progressive disease, with a 5-year survival rate of only roughly 50% even with the best available therapies. Accurately detecting and differentiating the forms of PAH and developing drugs that directly target genes involved in PAH pathogenesis are essential. We proposed a computational approach using low-cost microarray data collected from a clinical trial and accurately predicted each PAH group.
In particular, we considered that some low-expression genes, which researchers usually discard, might function collectively and contribute significantly to controlling the disease in each case. Therefore, we developed different filtering algorithms that intentionally selected those low-expression genes for constructing the prediction model. Using a few highly informative low-expression genes that had never been extensively investigated before, our systematic approach produced models that could offer perfect accuracy in predicting PAH. Additionally, our analysis found that the composition of gene factors controlling the PAH etiology differs considerably between forms.
- Published
- 2019