Combinatorial epigenetic patterns as quantitative predictors of chromatin biology
- Source :
- BMC Genomics
- Publication Year :
- 2014
- Publisher :
- BioMed Central, 2014.
Abstract
- Chromatin immunoprecipitation followed by deep sequencing (ChIP-seq) is the most widely used method for characterizing the epigenetic states of chromatin on a genomic scale. With the recent availability of large genome-wide data sets, often comprising several epigenetic marks, novel approaches are required to explore functionally relevant interactions between histone modifications. Computational discovery of "chromatin states" defined by such combinatorial interactions has enabled descriptive annotations of genomes, but more quantitative approaches are needed to progress towards predictive models. We propose non-negative matrix factorization (NMF) as a new unsupervised method to discover combinatorial patterns of epigenetic marks that frequently co-occur in subsets of genomic regions. We show that this small set of combinatorial "codes" can be effectively displayed and interpreted. NMF codes enable dimensionality reduction and have desirable statistical properties for regression and classification tasks. We demonstrate the utility of codes in the quantitative prediction of Pol2 binding and in the discrimination between Pol2-bound promoters and enhancers. Finally, we show that specific codes can be linked to molecular pathways and targets of pluripotency genes during differentiation. We have introduced and evaluated a new computational approach to represent combinatorial patterns of epigenetic marks as quantitative variables suitable for predictive modeling and supervised machine learning. To foster widespread adoption of this method, we make it available as an open-source software package, epicode, at https://github.com/mcieslik-mctp/epicode.
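
The factorization idea in the abstract can be illustrated with a minimal sketch. It is not the epicode implementation, whose interface is not described in this record. It assumes a small, made-up matrix of non-negative signal values (rows are genomic regions, columns are four hypothetical marks) and uses the standard Lee-Seung multiplicative updates for the Frobenius objective, written with the Python standard library only. The names `nmf`, `signal`, `marks`, and the choice of rank 3 are illustrative assumptions.

```python
import random


def matmul(a, b):
    """Multiply two matrices stored as lists of rows."""
    cols_b = list(zip(*b))
    return [[sum(x * y for x, y in zip(row, col)) for col in cols_b] for row in a]


def transpose(a):
    """Return the transpose of a matrix stored as a list of rows."""
    return [list(col) for col in zip(*a)]


def frobenius_error(x, w, h):
    """Frobenius norm of the residual x - w*h."""
    wh = matmul(w, h)
    return sum((xv - rv) ** 2 for xr, rr in zip(x, wh) for xv, rv in zip(xr, rr)) ** 0.5


def nmf(x, rank, n_iter=200, seed=0, eps=1e-9):
    """Factor a non-negative matrix x (n x m) as w (n x rank) times h (rank x m).

    Uses Lee-Seung multiplicative updates; eps guards against division by zero.
    Returns w, h, and the reconstruction error after each iteration.
    """
    rng = random.Random(seed)
    n, m = len(x), len(x[0])
    # Strictly positive random start so that no entry is stuck at zero initially.
    w = [[rng.random() + 0.1 for _ in range(rank)] for _ in range(n)]
    h = [[rng.random() + 0.1 for _ in range(m)] for _ in range(rank)]
    errors = [frobenius_error(x, w, h)]
    for _ in range(n_iter):
        wt = transpose(w)
        num = matmul(wt, x)                # rank x m
        den = matmul(matmul(wt, w), h)     # rank x m
        h = [[h[i][j] * num[i][j] / (den[i][j] + eps) for j in range(m)]
             for i in range(rank)]
        ht = transpose(h)
        num = matmul(x, ht)                # n x rank
        den = matmul(w, matmul(h, ht))     # n x rank
        w = [[w[i][j] * num[i][j] / (den[i][j] + eps) for j in range(rank)]
             for i in range(n)]
        errors.append(frobenius_error(x, w, h))
    return w, h, errors


# Hypothetical toy data: made-up signal values, not taken from the paper.
marks = ["H3K4me3", "H3K27ac", "H3K4me1", "H3K27me3"]
signal = [
    [9.0, 7.0, 1.0, 0.0],
    [8.0, 6.0, 0.5, 0.0],
    [0.5, 5.0, 8.0, 0.0],
    [0.0, 4.0, 7.0, 0.5],
    [0.0, 0.0, 0.5, 9.0],
    [0.5, 0.0, 0.0, 8.0],
]

region_loadings, codes, errors = nmf(signal, rank=3)

# Each row of `codes` is one combinatorial pattern over the marks.
for code in codes:
    print({mark: round(value, 2) for mark, value in zip(marks, code)})
```

In this sketch each row of `codes` plays the role of one "code" (a weighted combination of marks), and each row of `region_loadings` gives the non-negative weights of those codes in one region. Those per-region weights are the kind of quantitative variables that could then be passed to a regression or classification model.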
- Subjects :
- Epigenomics
Chromatin Immunoprecipitation
Computational biology
Biology
Genome
Deep sequencing
Non-negative matrix factorization
Histones
User-Computer Interface
Genetics
Humans
Epigenetics
Promoter Regions, Genetic
Embryonic Stem Cells
Internet
Principal Component Analysis
Dimensionality reduction
Methodology Article
Chromatin
ROC Curve
Area Under Curve
DNA microarray
Chromatin immunoprecipitation
Algorithms
Biotechnology
Protein Binding
Details
- Language :
- English
- ISSN :
- 1471-2164
- Volume :
- 15
- Database :
- OpenAIRE
- Journal :
- BMC Genomics
- Accession number :
- edsair.doi.dedup.....89a10ec1c4073389dfa1429ab8f88034