Features that define the best ChIP-seq peak calling algorithms
- Source :
- Briefings in Bioinformatics
- Publication Year :
- 2016
- Publisher :
- Oxford University Press (OUP), 2016.
Abstract
- Chromatin immunoprecipitation followed by sequencing (ChIP-seq) is an important tool for studying gene regulatory proteins, such as transcription factors and histones. Peak calling is one of the first steps in the analysis of these data. Peak calling consists of two sub-problems: identifying candidate peaks and testing candidate peaks for statistical significance. We surveyed 30 methods and identified 12 features of the two sub-problems that distinguish methods from each other. We picked six methods [GEM, MACS2, MUSIC, BCP, Threshold-based method (TM) and ZINBA] that span this feature space and used a combination of 300 simulated ChIP-seq data sets, 3 real data sets and mathematical analyses to identify features of methods that allow some to perform better than the others. We prove that methods that explicitly combine the signals from ChIP and input samples are less powerful than methods that do not. Methods that use windows of different sizes are more powerful than the ones that do not. For statistical testing of candidate peaks, methods that use a Poisson test to rank their candidate peaks are more powerful than those that use a Binomial test. BCP and MACS2 have the best operating characteristics on simulated transcription factor binding data. GEM has the highest fraction of the top 500 peaks containing the binding motif of the immunoprecipitated factor, with 50% of its peaks within 10 base pairs of a motif. BCP and MUSIC perform best on histone data. These findings provide guidance and rationale for selecting the best peak caller for a given application.
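The abstract contrasts Poisson and Binomial tests for ranking candidate peaks. As a rough illustration only, the sketch below shows one common way each test can be formulated for a single candidate window; it is not the implementation used by any of the surveyed peak callers. The function names, the read counts, and the assumption that ChIP and input libraries have equal depth (so the input count serves directly as the Poisson rate, and the Binomial success probability is 0.5) are all hypothetical choices made for this example.

```python
import math


def poisson_upper_tail(k, lam):
    """Return P(X >= k) for X ~ Poisson(lam), with lam > 0."""
    if lam <= 0:
        raise ValueError("lam must be positive")
    if k <= 0:
        return 1.0
    # Sum P(X = i) for i < k in log space, then take the complement.
    below = sum(
        math.exp(-lam + i * math.log(lam) - math.lgamma(i + 1))
        for i in range(k)
    )
    return max(0.0, 1.0 - below)


def binomial_upper_tail(k, n, p):
    """Return P(X >= k) for X ~ Binomial(n, p)."""
    if k <= 0:
        return 1.0
    total = sum(
        math.comb(n, i) * p**i * (1.0 - p) ** (n - i)
        for i in range(k, n + 1)
    )
    return min(1.0, total)


# Hypothetical read counts in one candidate window (illustrative values only).
chip_reads = 30
input_reads = 10

# Poisson test: the input count is treated as the expected background rate.
p_poisson = poisson_upper_tail(chip_reads, float(input_reads))

# Binomial test: under the null, each read in the window is equally likely
# to come from the ChIP or the input sample (equal library depth assumed).
p_binomial = binomial_upper_tail(chip_reads, chip_reads + input_reads, 0.5)

print(f"Poisson p-value:  {p_poisson:.3e}")
print(f"Binomial p-value: {p_binomial:.3e}")
```

In this sketch both tests score the same window; a peak caller would compute such a score for every candidate and rank candidates by it. Real tools typically also scale for library size and estimate the background rate locally, which this example omits.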
- Subjects :
- 0301 basic medicine
Chromatin Immunoprecipitation
Binding Sites
Computer science
Feature vector
High-Throughput Nucleotide Sequencing
peak caller
Binomial test
Sequence Analysis, DNA
Chip
Histones
ChIP-seq
03 medical and health sciences
benchmark
030104 developmental biology
Papers
Molecular Biology
Algorithm
Peak calling
Algorithms
Oligonucleotide Array Sequence Analysis
Transcription Factors
Information Systems
Statistical hypothesis testing
Details
- ISSN :
- 1477-4054 and 1467-5463
- Database :
- OpenAIRE
- Journal :
- Briefings in Bioinformatics
- Accession number :
- edsair.doi.dedup.....2a247b2546567784ef030ee09bee5176