Tree-Based Position Weight Matrix Approach to Model Transcription Factor Binding Site Profiles
- Source :
- PLoS ONE, Vol 6, Iss 9, p e24210 (2011)
- Publication Year :
- 2011
- Publisher :
- Public Library of Science, 2011.
Abstract
- Most position weight matrix (PWM) based bioinformatics methods for predicting transcription factor binding sites (TFBSs) assume that each nucleotide in the sequence motif contributes independently to the interaction between the protein and the DNA sequence, which usually produces a high number of false-positive predictions. The increasing availability of TF enrichment profiles from recent ChIP-Seq methodology facilitates the investigation of dependency structure and the accurate prediction of TFBSs. We develop a novel Tree-based PWM (TPWM) approach to accurately model the interaction between a TF and its binding site. The whole tree-structured PWM can be considered a mixture of different conditional PWMs. We propose a discriminative approach, called TPD (TPWM-based Discriminative Approach), to construct the TPWM from ChIP-Seq data, starting from a pre-existing PWM. To achieve the maximum discriminative power between the positive and negative datasets, the cutoff value is determined based on the Matthews correlation coefficient (MCC). The resulting TPWMs are evaluated for accuracy on extensive synthetic datasets. We then apply our TPWM discriminative approach to several real ChIP-Seq datasets to refine the current TFBS models stored in the TRANSFAC database. Experiments on both simulated and real ChIP-Seq data show that the proposed method, starting from an existing PWM, consistently performs better than existing tools in detecting TFBSs. The improved accuracy results from modelling the complete dependency structure of the motifs and from an improved true positive rate. The findings could lead to a better understanding of the mechanisms of TF-DNA interactions.
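As an illustration of the MCC-based cutoff selection mentioned in the abstract, the following is a minimal sketch, not the authors' TPD implementation. It assumes a plain PWM stored as a list of per-position log-odds dictionaries, which corresponds to the independence model the abstract criticises rather than to the tree-structured TPWM. The function names `pwm_score`, `mcc`, and `best_cutoff` are hypothetical, as is the choice to try each observed score as a candidate cutoff.

```python
import math


def pwm_score(pwm, seq):
    """Score a sequence as the sum of per-position log-odds values.

    `pwm` is assumed to be a list of dicts mapping a base to its score;
    this is the independent-position model, not a tree-based PWM.
    """
    return sum(pwm[i][base] for i, base in enumerate(seq))


def mcc(tp, fp, tn, fn):
    """Matthews correlation coefficient; 0.0 when the denominator is zero."""
    denom = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / denom if denom else 0.0


def best_cutoff(pos_scores, neg_scores):
    """Try every observed score as a cutoff and keep the MCC-maximising one.

    A sequence is called a binding site when its score is >= the cutoff.
    Ties are resolved in favour of the lowest cutoff (an arbitrary choice).
    """
    best_mcc, best_c = float("-inf"), None
    for c in sorted(set(pos_scores) | set(neg_scores)):
        tp = sum(s >= c for s in pos_scores)
        fn = len(pos_scores) - tp
        fp = sum(s >= c for s in neg_scores)
        tn = len(neg_scores) - fp
        m = mcc(tp, fp, tn, fn)
        if m > best_mcc:
            best_mcc, best_c = m, c
    return best_c, best_mcc
```

In this sketch, the positive scores would come from sequences under ChIP-Seq peaks and the negative scores from background sequences; how the paper builds those two datasets is not stated in the abstract.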
- Subjects :
- Chromatin Immunoprecipitation
Correlation coefficient
DNA transcription
lcsh:Medicine
Biology
Markov model
Biochemistry
Transcriptomes
03 medical and health sciences
Molecular cell biology
Discriminative model
Genome Analysis Tools
DNA-binding proteins
Genetics
Position-Specific Scoring Matrices
Nucleotide Motifs
lcsh:Science
030304 developmental biology
0303 health sciences
Multidisciplinary
Binding Sites
Base Sequence
business.industry
030302 biochemistry & molecular biology
lcsh:R
Proteins
Computational Biology
Pattern recognition
Genomics
Position weight matrix
DNA binding site
Tree (data structure)
lcsh:Q
Artificial intelligence
Gene expression
TRANSFAC
business
Sequence motif
Sequence Analysis
Research Article
Transcription Factors
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 6
- Issue :
- 9
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....cd5157820f79ac57452daac2b4599588