Sparse Bayesian classification and feature selection for biological expression data with high correlations
- Source :
- PLoS ONE, Vol 12, Iss 12, p e0189541 (2017)
- Publication Year :
- 2017
- Publisher :
- Public Library of Science, 2017.
Abstract
- Classification models built on biological expression data are increasingly used to predict distinct disease subtypes. Selected features that separate sample groups can serve as candidate biomarkers, helping us to discover biological functions and pathways. However, building a robust classification and feature selection model poses three challenges: 1) the number of significant biomarkers is much smaller than the number of measured features, so finding them requires an exhaustive search; 2) current biological expression data are large in both sample size and feature size, which worsens the scalability of any search algorithm; and 3) the expression profiles of certain features are typically highly correlated, which can make it difficult to distinguish the predominant features. Unfortunately, most existing algorithms address only some of these challenges rather than all of them together. In this paper, we propose a unified framework that addresses all three. The classification and feature selection problem is first formulated as a nonconvex optimisation problem. The problem is then relaxed and solved iteratively through a sequence of convex optimisation procedures, which can be computed in a distributed manner and therefore allow efficient implementation on advanced computing infrastructures. To demonstrate the advantages of our method over others, we first analyse a randomly generated simulation dataset under various conditions. We then analyse a real gene expression dataset on embryonal tumours. Further downstream analyses, such as functional annotation and pathway analysis, are performed on the selected features and elucidate several biological findings.
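The abstract describes relaxing a nonconvex sparse classification objective into a sequence of convex problems. This record does not give the paper's actual formulation, so the sketch below swaps in a well-known technique with the same overall shape: iteratively reweighted L1 logistic regression (the reweighting scheme of Candès, Wakin and Boyd). This scheme majorise-minimises a nonconvex log-sum penalty, and each convex weighted-L1 subproblem is solved here by proximal gradient descent (ISTA). The function names `weighted_l1_logistic` and `reweighted_l1_logistic`, the parameters `lam`, `eps` and `n_outer`, and the simulated data with one highly correlated feature pair are all hypothetical choices for illustration. This is not the authors' sparse Bayesian method.

```python
import numpy as np

# Hypothetical illustration only: iteratively reweighted L1 logistic regression,
# NOT the sparse Bayesian algorithm of the paper.


def sigmoid(z):
    # Numerically stable logistic function.
    return 0.5 * (1.0 + np.tanh(0.5 * z))


def weighted_l1_logistic(X, y, lam, weights, n_iter=500):
    """Convex subproblem: mean logistic loss + lam * sum_j weights[j] * |w_j|, solved by ISTA."""
    n, p = X.shape
    Xa = np.hstack([X, np.ones((n, 1))])            # last column: unpenalised intercept
    step = 4.0 * n / np.linalg.norm(Xa, 2) ** 2     # 1 / Lipschitz constant of the gradient
    thresh = np.append(step * lam * weights, 0.0)   # no shrinkage on the intercept
    theta = np.zeros(p + 1)
    for _ in range(n_iter):
        grad = Xa.T @ (sigmoid(Xa @ theta) - y) / n
        z = theta - step * grad
        # Soft-thresholding: the proximal operator of the weighted L1 penalty.
        theta = np.sign(z) * np.maximum(np.abs(z) - thresh, 0.0)
    return theta[:p], theta[p]


def reweighted_l1_logistic(X, y, lam=0.05, n_outer=5, eps=1e-3):
    """Majorise-minimise the nonconvex log-sum penalty sum_j log(|w_j| + eps)
    by solving a sequence of convex weighted-L1 subproblems."""
    weights = np.ones(X.shape[1])
    for _ in range(n_outer):
        w, b = weighted_l1_logistic(X, y, lam, weights)
        weights = 1.0 / (np.abs(w) + eps)  # small coefficients are penalised more next round
    return w, b


# Simulated data (assumed for illustration): feature 1 is a near-copy of feature 0.
rng = np.random.default_rng(0)
n, p = 200, 50
X = rng.standard_normal((n, p))
X[:, 1] = X[:, 0] + 0.1 * rng.standard_normal(n)
true_w = np.zeros(p)
true_w[[0, 5]] = [2.0, -2.0]
y = (sigmoid(X @ true_w) > rng.uniform(size=n)).astype(float)

w, b = reweighted_l1_logistic(X, y)
print("selected features:", np.flatnonzero(w))
```

Each weighted-L1 subproblem is convex, so in principle it could be handed to a distributed convex solver. This sketch simply runs the subproblems sequentially on one machine.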
- Subjects :
- 0301 basic medicine
Optimization
Computer and Information Sciences
Computer science
lcsh:Medicine
Gene Expression
Feature selection
Machine learning
computer.software_genre
Research and Analysis Methods
Biochemistry
Machine Learning
03 medical and health sciences
Naive Bayes classifier
Mathematical and Statistical Techniques
Search algorithm
Artificial Intelligence
Genetics
Humans
Statistical Methods
lcsh:Science
Multidisciplinary
business.industry
Gene Ontologies
Applied Mathematics
Simulation and Modeling
Gene Expression Profiling
lcsh:R
Biology and Life Sciences
Computational Biology
Bayes Theorem
Genomics
Genome Analysis
030104 developmental biology
Sample size determination
Expression data
Physical Sciences
lcsh:Q
Artificial intelligence
business
computer
Mathematics
Statistics (Mathematics)
Biomarkers
Algorithms
Research Article
Forecasting
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 12
- Issue :
- 12
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....5eb7ff072a8c0870be3dcb28cd37e940