A new approach for interpreting Random Forest models and its application to the biology of ageing
- Source :
- Bioinformatics
- Publication Year :
- 2017
Abstract
- Motivation: This work uses the Random Forest (RF) classification algorithm to predict if a gene is over-expressed, under-expressed or has no change in expression with age in the brain. RFs have high predictive power, and RF models can be interpreted using a feature (variable) importance measure. However, current feature importance measures evaluate a feature as a whole (all feature values). We show that, for a popular type of biological data (Gene Ontology-based), usually only one value of a feature is particularly important for classification and the interpretation of the RF model. Hence, we propose a new algorithm for identifying the most important and most informative feature values in an RF model.
- Results: The new feature importance measure identified highly relevant Gene Ontology terms for the aforementioned gene classification task, producing a feature ranking that is much more informative to biologists than an alternative, state-of-the-art feature importance measure.
- Availability and implementation: The dataset and source codes used in this paper are available as ‘Supplementary Material’ and the description of the data can be found at: https://fabiofabris.github.io/bioinfo2018/web/.
- Supplementary information: Supplementary data are available at Bioinformatics online.
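The contrast the abstract draws between scoring a feature as a whole and scoring a single feature value can be sketched on toy data. This is an illustration only, not the algorithm proposed in the paper and not a Random Forest: it compares the information gain of a whole binary feature with the class purity of each of its values. The function names (`entropy`, `whole_feature_gain`, `per_value_purity`) and the twelve-gene toy dataset are hypothetical and invented for this example.

```python
from collections import Counter
from math import log2


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * log2(c / n) for c in Counter(labels).values())


def whole_feature_gain(values, labels):
    """Information gain of the feature as a whole (all values together)."""
    n = len(labels)
    remainder = 0.0
    for v in set(values):
        subset = [y for x, y in zip(values, labels) if x == v]
        remainder += len(subset) / n * entropy(subset)
    return entropy(labels) - remainder


def per_value_purity(values, labels):
    """For each feature value: (majority class, its share among instances with that value)."""
    result = {}
    for v in sorted(set(values)):
        subset = [y for x, y in zip(values, labels) if x == v]
        cls, count = Counter(subset).most_common(1)[0]
        result[v] = (cls, count / len(subset))
    return result


# Invented toy data: 1 = gene annotated with a (hypothetical) GO term, 0 = not annotated.
has_term = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0]
expression = ["over"] * 4 + ["over", "over", "under", "under",
                             "under", "none", "none", "none"]

gain = whole_feature_gain(has_term, expression)
purity = per_value_purity(has_term, expression)

print(f"whole-feature information gain: {gain:.3f} bits")
for value, (cls, share) in purity.items():
    print(f"value {value}: majority class {cls!r}, share {share:.3f}")
```

In this toy setting the single whole-feature score hides an asymmetry: genes annotated with the term (value 1) all fall in one class, while the absence of the annotation (value 0) says little about the class. A per-value view makes that asymmetry visible, which is the kind of distinction the abstract argues matters for Gene Ontology-based features.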
- Subjects :
- 0301 basic medicine
Statistics and Probability
Aging
Source code
media_common.quotation_subject
Machine learning
computer.software_genre
Biochemistry
Machine Learning
03 medical and health sciences
0302 clinical medicine
Software
Animals
Humans
Q335
Molecular Biology
media_common
Measure (data warehouse)
Biological data
business.industry
Brain
Computational Biology
Original Papers
Expression (mathematics)
Computer Science Applications
Random forest
Computational Mathematics
Variable (computer science)
030104 developmental biology
Gene Ontology
Computational Theory and Mathematics
Gene Expression Regulation
Feature (computer vision)
Artificial intelligence
Data and Text Mining
business
computer
030217 neurology & neurosurgery
Details
- ISSN :
- 1367-4811 and 1367-4803
- Volume :
- 34
- Issue :
- 14
- Database :
- OpenAIRE
- Journal :
- Bioinformatics (Oxford, England)
- Accession number :
- edsair.doi.dedup.....6eff2e6a05d6a1be9ca3c3560defab04