Systematic clustering of transcription start site landscapes
- Source :
- Zhao, X, Valen, E, Parker, B J & Sandelin, A G 2011, 'Systematic clustering of transcription start site landscapes', PLoS ONE, vol. 6, no. 8, p. e23409. https://doi.org/10.1371/journal.pone.0023409
- Publication Year :
- 2011
- Publisher :
- Public Library of Science (PLoS), 2011.
Abstract
- Genome-wide, high-throughput methods for transcription start site (TSS) detection have shown that most promoters have an array of neighboring TSSs, some used more often than others, forming a distribution of initiation propensities. TSS distributions (TSSDs) vary widely between promoters, and earlier studies have shown that TSSDs have biological implications for both regulation and function. However, no systematic study has explored how many types of TSSDs, and by extension core promoters, exist, or which biological features distinguish them. In this study, we developed a new non-parametric dissimilarity measure and clustering approach to explore the similarities and stabilities of clusters of TSSDs. Previous studies have used arbitrary thresholds to arrive at two general classes: broad and sharp. We demonstrate that, in addition to the previous broad/sharp dichotomy, an additional category of promoters exists. Unlike typical TATA-driven sharp TSSDs, where the TSS position can vary by a few nucleotides, in this category virtually all TSSs originate from the same genomic position. These promoters lack the epigenetic signatures of typical mRNA promoters, and a substantial subset of them map upstream of ribosomal protein pseudogenes. We present evidence that these are likely mapping errors, caused by the high similarity of ribosomal gene promoters in combination with the known G addition bias in CAGE libraries, and that they have confounded earlier analyses. Thus, previous two-class separations of promoters based on TSS distributions are justified, but the ultra-sharp TSS distributions will confound downstream analyses if not removed.
- Subjects :
- Ribosomal Proteins
Pseudogene
Science
DNA transcription
Molecular Sequence Data
Sequence alignment
Computational biology
Biology
Biochemistry
Transcriptomes
Epigenesis, Genetic
03 medical and health sciences
Mice
0302 clinical medicine
Similarity (network science)
Ribosomal protein
Genome Analysis Tools
Databases, Genetic
Genetics
Nucleosome
Animals
Cluster Analysis
Humans
RNA synthesis
Cluster analysis
Promoter Regions, Genetic
030304 developmental biology
0303 health sciences
Multidisciplinary
Base Sequence
Computational Biology
Promoter
Genomics
Functional Genomics
Nucleic acids
Gene Expression Regulation
RNA
Medicine
Gene expression
Transcription Initiation Site
Genome Expression Analysis
030217 neurology & neurosurgery
Function (biology)
Pseudogenes
Research Article
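The abstract contrasts broad, sharp and ultra-sharp TSSDs and mentions a non-parametric dissimilarity measure without defining it. As a rough illustration only, the Python sketch below summarizes a TSSD by the share of tags at its dominant position and by the width of the region holding its 10th to 90th percentile of tags, and compares two TSSDs with a Kolmogorov-Smirnov-style statistic after centering each on its dominant TSS. This is not the authors' dissimilarity measure or clustering method; the data representation (a dict from genomic position to CAGE tag count), the function names and the cutoffs (0.95 dominant fraction, 10 nt width) are hypothetical assumptions chosen for illustration.

```python
"""Toy sketch: summarizing and comparing TSS distributions (TSSDs).

Assumptions (not from the paper): a TSSD is a dict mapping genomic
position -> CAGE tag count; all cutoffs are illustrative, hypothetical values.
"""
from typing import Dict

TSSD = Dict[int, int]  # genomic position -> CAGE tag count


def normalized_profile(tssd: TSSD) -> Dict[int, float]:
    """Re-center on the dominant TSS and scale counts to sum to 1."""
    total = sum(tssd.values())
    if total == 0:
        raise ValueError("empty TSS distribution")
    # Dominant TSS = position with the most tags (ties -> lowest position,
    # an arbitrary choice that matters mainly for flat, broad TSSDs).
    dominant = min(tssd, key=lambda p: (-tssd[p], p))
    return {pos - dominant: n / total for pos, n in tssd.items()}


def dominant_fraction(tssd: TSSD) -> float:
    """Share of all tags that start at the single most-used position."""
    return max(tssd.values()) / sum(tssd.values())


def quantile_width(tssd: TSSD, lo: float = 0.1, hi: float = 0.9) -> int:
    """Width in nt of the region holding the lo..hi quantiles of tags."""
    total = sum(tssd.values())
    cum = 0
    start = end = None
    for pos in sorted(tssd):
        cum += tssd[pos]
        if start is None and cum >= lo * total:
            start = pos
        if end is None and cum >= hi * total:
            end = pos
            break
    return end - start + 1


def classify(tssd: TSSD, ultra_cutoff: float = 0.95, sharp_width: int = 10) -> str:
    """Hypothetical three-way label; both cutoffs are illustrative assumptions."""
    if dominant_fraction(tssd) >= ultra_cutoff:
        return "ultra-sharp"  # virtually all tags at one position
    if quantile_width(tssd) <= sharp_width:
        return "sharp"  # tags spread over only a few nucleotides
    return "broad"


def ks_dissimilarity(a: TSSD, b: TSSD) -> float:
    """Largest gap between the cumulative profiles of two TSSDs after
    centering each on its dominant TSS (Kolmogorov-Smirnov statistic)."""
    pa, pb = normalized_profile(a), normalized_profile(b)
    cum_a = cum_b = best = 0.0
    for offset in sorted(set(pa) | set(pb)):
        cum_a += pa.get(offset, 0.0)
        cum_b += pb.get(offset, 0.0)
        best = max(best, abs(cum_a - cum_b))
    return best


if __name__ == "__main__":
    ultra = {100: 98, 101: 2}
    sharp = {98: 5, 99: 20, 100: 50, 101: 20, 102: 5}
    broad = {p: 1 for p in range(60)}
    print(classify(ultra), classify(sharp), classify(broad))  # ultra-sharp sharp broad
    print(ks_dissimilarity(ultra, sharp))
```

Centering on the dominant TSS makes the comparison depend only on the shape of each distribution, not on where the promoter lies in the genome, which is the property a shape-based clustering of TSSDs needs.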
Details
- Language :
- English
- ISSN :
- 19326203
- Volume :
- 6
- Issue :
- 8
- Database :
- OpenAIRE
- Journal :
- PLoS ONE
- Accession number :
- edsair.doi.dedup.....84813bf3c4165da55927af907cb0b150
- Full Text :
- https://doi.org/10.1371/journal.pone.0023409