Back to Search Start Over

DNA sequence models of genome-wide Drosophila melanogaster Polycomb binding sites improve generalization to independent Polycomb Response Elements.

Authors :
Bredesen BA
Rehmsmeier M
Source :
Nucleic acids research [Nucleic Acids Res] 2019 Sep 05; Vol. 47 (15), pp. 7781-7797.
Publication Year :
2019

Abstract

Polycomb Response Elements (PREs) are cis-regulatory DNA elements that maintain gene transcription states through DNA replication and mitosis. PREs have little sequence similarity, but are enriched in a number of sequence motifs. Previous methods for modelling Drosophila melanogaster PRE sequences (PREdictor and EpiPredictor) have used a set of 7 motifs and a training set of 12 PREs and 16-23 non-PREs. Advances in experimental methods for mapping chromatin binding factors and modifications has led to the publication of several genome-wide sets of Polycomb targets. In addition to the seven motifs previously used, PREs are enriched in the GTGT motif, recently associated with the sequence-specific DNA binding protein Combgap. We investigated whether models trained on genome-wide Polycomb sites generalize to independent PREs when trained with control sequences generated by naive PRE models and including the GTGT motif. We also developed a new PRE predictor: SVM-MOCCA. Training PRE predictors with genome-wide experimental data improves generalization to independent data, and SVM-MOCCA predicts the majority of PREs in three independent experimental sets. We present 2908 candidate PREs enriched in sequence and chromatin signatures. 2412 of these are also enriched in H3K4me1, a mark of Trithorax activated chromatin, suggesting that PREs/TREs have a common sequence code.<br /> (© The Author(s) 2019. Published by Oxford University Press on behalf of Nucleic Acids Research.)

Details

Language :
English
ISSN :
1362-4962
Volume :
47
Issue :
15
Database :
MEDLINE
Journal :
Nucleic acids research
Publication Type :
Academic Journal
Accession number :
31340029
Full Text :
https://doi.org/10.1093/nar/gkz617