DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements.
- Source :
- Nucleic acids research [Nucleic Acids Res] 2019 Sep 05; Vol. 47 (15), pp. 7781-7797.
- Publication Year :
- 2019
Abstract
- Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications have led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. 2412 of these are also enriched in H3K4me1, a mark of Trithorax activated chromatin, suggesting that PREs/TREs have a common sequence code. (© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.)
- Subjects :
- Animals
Binding Sites
Chromatin chemistry
Chromatin metabolism
Chromosomal Proteins, Non-Histone genetics
Chromosomal Proteins, Non-Histone metabolism
DNA chemistry
DNA metabolism
Drosophila Proteins genetics
Drosophila Proteins metabolism
Drosophila melanogaster metabolism
Embryo, Nonmammalian
Gene Ontology
Histones genetics
Histones metabolism
Larva genetics
Larva metabolism
Molecular Sequence Annotation
Nucleotide Motifs
Polycomb-Group Proteins metabolism
Protein Binding
Software
Transcription Factors genetics
Transcription Factors metabolism
Algorithms
DNA genetics
Drosophila melanogaster genetics
Genome, Insect
Polycomb-Group Proteins genetics
Response Elements
Details
- Language :
- English
- ISSN :
- 1362-4962
- Volume :
- 47
- Issue :
- 15
- Database :
- MEDLINE
- Journal :
- Nucleic acids research
- Publication Type :
- Academic Journal
- Accession number :
- 31340029
- Full Text :
- https://doi.org/10.1093/nar/gkz617