GOexpress: an R/Bioconductor package for the identification and visualisation of robust gene ontology signatures through supervised learning of gene expression data
- Source :
- BMC Bioinformatics
- Publication Year :
- 2016
- Publisher :
- BioMed Central, 2016.
Abstract
- Background: Identification of gene expression profiles that differentiate experimental groups is critical for the discovery and analysis of key molecular pathways and for the selection of robust diagnostic or prognostic biomarkers. While integration of differential expression statistics has been used to refine gene set enrichment analyses, such approaches are typically limited to single gene lists resulting from simple two-group comparisons or time-series analyses. In contrast, functional class scoring and machine learning approaches provide powerful alternative methods to leverage molecular measurements for pathway analyses, and to compare continuous and multi-level categorical factors.
- Results: We introduce GOexpress, a software package for scoring and summarising the capacity of gene ontology features to simultaneously classify samples from multiple experimental groups. GOexpress integrates normalised gene expression data (e.g., from microarray and RNA-seq experiments) and phenotypic information of individual samples with gene ontology annotations to derive a ranking of genes and gene ontology terms using a supervised learning approach. The default random forest algorithm allows for interactions between all experimental factors and for competitive scoring of expressed genes, which evaluates their relative importance in classifying predefined groups of samples.
- Conclusions: GOexpress enables rapid identification and visualisation of ontology-related gene panels that robustly classify groups of samples, and supports both categorical (e.g., infection status, treatment) and continuous (e.g., time-series, drug concentrations) experimental factors. The use of standard Bioconductor extension packages and publicly available gene ontology annotations facilitates straightforward integration of GOexpress within existing computational biology pipelines.
Electronic supplementary material: The online version of this article (doi:10.1186/s12859-016-0971-3) contains supplementary material, which is available to authorized users.
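GOexpress itself is an R/Bioconductor package. The Python below is only a minimal sketch of the second half of the workflow the abstract describes: turning per-gene importance scores (which GOexpress obtains from a random forest by default) into a ranking of gene ontology terms. It assumes the gene scores are already computed. The function names, the choice to summarise each term by the mean rank and mean score of its measured genes, and the `min_genes` filter are illustrative assumptions, not the package's actual API or exact scoring rule.

```python
from statistics import mean


def rank_genes(scores: dict[str, float]) -> dict[str, int]:
    """Assign rank 1 to the gene with the highest importance score.

    Ties are broken alphabetically by gene ID so the output is deterministic.
    """
    ordered = sorted(scores, key=lambda gene: (-scores[gene], gene))
    return {gene: position + 1 for position, gene in enumerate(ordered)}


def score_go_terms(
    scores: dict[str, float],
    annotations: dict[str, list[str]],
    min_genes: int = 1,  # hypothetical filter: skip sparsely measured terms
) -> list[tuple[str, float, float, int]]:
    """Summarise gene-level importance scores for each GO term.

    scores: gene ID -> importance score (e.g. from a random forest).
    annotations: GO term ID -> gene IDs annotated to that term.
    Genes with no score (not measured in the data set) are ignored.
    Returns (term, mean_rank, mean_score, n_genes) tuples sorted by
    mean rank (lower is better), then by term ID.
    """
    ranks = rank_genes(scores)
    table = []
    for term, genes in annotations.items():
        measured = sorted({gene for gene in genes if gene in scores})
        if len(measured) < min_genes:
            continue
        table.append((
            term,
            mean(ranks[gene] for gene in measured),
            mean(scores[gene] for gene in measured),
            len(measured),
        ))
    table.sort(key=lambda row: (row[1], row[0]))
    return table


if __name__ == "__main__":
    # Toy, hypothetical data: four measured genes and three annotated terms.
    gene_scores = {"g1": 0.9, "g2": 0.7, "g3": 0.1, "g4": 0.05}
    go_annotations = {
        "termA": ["g1", "g2", "g9"],  # g9 is annotated but not measured
        "termB": ["g3", "g4"],
        "termC": ["g4"],
    }
    for row in score_go_terms(gene_scores, go_annotations, min_genes=2):
        print(row)
```

Ranking terms by the average rank of their genes, rather than by the raw average score, keeps a single extreme gene score from dominating a term; a size filter such as `min_genes` guards against terms that look strong only because they contain one or two highly ranked genes.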
- Subjects :
- 0301 basic medicine
Microarray
Computer science
Bioinformatics
RNA-sequencing
Single gene
Computational biology
Biochemistry
Transcriptome
Bioconductor
03 medical and health sciences
0302 clinical medicine
Structural Biology
Gene expression
Machine learning
Leverage (statistics)
RNA, Messenger
Categorical variable
Gene
Molecular Biology
01 Mathematical Sciences
08 Information And Computing Sciences
Applied Mathematics
Supervised learning
Statistics
Functional genomics
06 Biological Sciences
Classification
Random forest
Computer Science Applications
Gene expression profiling
030104 developmental biology
Gene Ontology
030220 oncology & carcinogenesis
Biomarker (medicine)
Data mining
Supervised Machine Learning
DNA microarray
Software
Details
- Language :
- English
- ISSN :
- 1471-2105
- Volume :
- 17
- Database :
- OpenAIRE
- Journal :
- BMC Bioinformatics
- Accession number :
- edsair.doi.dedup.....a2544018b7215e584c1006d503008a39