Evidence of transcription at polyT short tandem repeats

Authors :
Chloé Bessière
Yoshihide Hayashizaki
Laurent Brehelin
Jordan A. Ramilowski
Manu Saraswat
Mathys Grapotte
Charles-Henri Lecellier
Michiel J. L. de Hoon
Masayoshi Itoh
Akira Hasegawa
Wyeth W. Wasserman
Jessica Severin
Christophe Menichelli
Piero Carninci
Harukazu Suzuki
Institut de Biologie Computationnelle (IBC)
Institut National de Recherche en Informatique et en Automatique (Inria)-Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)-Institut National de Recherche pour l’Agriculture, l’Alimentation et l’Environnement (INRAE)
RIKEN Center for Integrative Medical Sciences [Yokohama] (RIKEN IMS)
RIKEN - Institute of Physical and Chemical Research [Japon] (RIKEN)
University of British Columbia (UBC)
Méthodes et Algorithmes pour la Bioinformatique (MAB)
Laboratoire d'Informatique de Robotique et de Microélectronique de Montpellier (LIRMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Institut de Génétique Moléculaire de Montpellier (IGMM)
Centre National de la Recherche Scientifique (CNRS)-Université de Montpellier (UM)
Université de Montpellier (UM)-Centre National de la Recherche Scientifique (CNRS)
Publication Year :
2021
Publisher :
HAL CCSD, 2021.

Abstract

Background: Using the Cap Analysis of Gene Expression (CAGE) technology, the FANTOM5 consortium provided one of the most comprehensive maps of Transcription Start Sites (TSSs) in several species. Strikingly, ~72% of them could not be assigned to a specific gene and initiate at unconventional regions, outside promoters or enhancers.

Results: Here, we probe these unassigned TSSs and show that, in all species studied, a significant fraction of CAGE peaks initiate at short tandem repeats (STRs) corresponding to homopolymers of thymidines (T). Additional analyses confirm that these CAGEs are truly associated with transcriptionally active chromatin marks. Furthermore, we train a sequence-based deep learning model able to predict CAGE signal at T STRs with high accuracy (~81%). Extracting features learned by this model reveals that transcription at T STRs is mostly directed by STR length but also by instructions lying in the downstream sequence. Excitingly, our model also predicts that genetic variants linked to human diseases affect this STR-associated transcription.

Conclusions: Together, our results extend the repertoire of non-coding transcription associated with DNA tandem repeats and complexify STR polymorphism. We also provide a new metric that can be considered in future studies of STR-related complex traits.

Details

Language :
English
Database :
OpenAIRE
Accession number :
edsair.doi.dedup.....25390ff2bc0d13dbcf9aec51d8dc062f