1. Deciphering Transcriptional Regulatory Circuits: Transcription Factor Binding and Regulatory Variants Identification
- Author
Ouyang, Ningxin
- Subjects
- Transcriptional Regulatory, Transcription Factor Binding Site, Regulatory Variants
- Abstract
Transcription factors bind cis-regulatory DNA elements to exert their regulatory function. Identification of transcription factor binding sites remains a crucial goal in deciphering transcriptional regulatory circuits. The vast majority of genetic variants identified from whole genome sequencing studies and leading disease-causing SNPs implicated in genome-wide association studies (GWAS) lie well outside of protein coding regions. Variants within non-coding sequence often act by creating or disrupting individual transcription factor binding sites, which alters downstream gene regulatory activity. In addition, transcription factors can act cooperatively to regulate transcription in a context-specific manner. Some binding complexes, such as CTCF together with cohesin proteins, can also mediate 3D chromatin interactions that exert downstream gene regulatory control. My dissertation is focused on deciphering these interactions driving gene regulatory circuits. In this dissertation, I develop an improved footprinting algorithm to map transcription factor binding sites genome-wide, study regulatory variants associated with transcription factor binding affinity, and explore transcription factor cooperativity and its role in 3D chromatin interaction. In Chapter 2, I introduce the TRACE algorithm, a multi-threaded computational footprinting method that predicts transcription factor binding sites using chromatin accessibility data (DNase-seq or ATAC-seq) and sequence information. In developing the method, I implemented a multivariate hidden Markov model (HMM), trained in an unsupervised manner, to identify and label DNase footprints. In a comprehensive evaluation, TRACE exhibited the best overall performance among existing footprinting methods.
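To make the footprint-labeling idea concrete, the following is a minimal sketch of HMM decoding on a toy cut-count signal. It is not the TRACE implementation: it assumes a univariate signal, only two states (flank and footprint), Gaussian emissions, and hand-set parameters, whereas the abstract describes a multivariate model trained without supervision. All names and numbers are hypothetical.

```python
import numpy as np

def gaussian_log_pdf(x, mean, sd):
    """Log density of a normal distribution, evaluated elementwise."""
    return -0.5 * np.log(2 * np.pi * sd ** 2) - (x - mean) ** 2 / (2 * sd ** 2)

def viterbi(log_start, log_trans, log_emit):
    """Return the most likely state path; log_emit has shape (T, K)."""
    T, K = log_emit.shape
    score = np.empty((T, K))
    back = np.zeros((T, K), dtype=int)
    score[0] = log_start + log_emit[0]
    for t in range(1, T):
        # cand[i, j] = best score ending in state i at t-1, then moving to j
        cand = score[t - 1][:, None] + log_trans
        back[t] = np.argmax(cand, axis=0)
        score[t] = cand[back[t], np.arange(K)] + log_emit[t]
    # Trace back from the best final state
    path = np.empty(T, dtype=int)
    path[-1] = np.argmax(score[-1])
    for t in range(T - 2, -1, -1):
        path[t] = back[t + 1, path[t + 1]]
    return path

# Toy per-base cut counts: high accessibility with a protected dip (assumed data)
signal = np.array([9, 10, 11, 10, 2, 1, 2, 1, 10, 9, 11, 10], dtype=float)

# Assumed parameters: state 0 = flank, state 1 = footprint
means = np.array([10.0, 1.5])
sds = np.array([2.0, 1.0])
log_emit = np.stack(
    [gaussian_log_pdf(signal, m, s) for m, s in zip(means, sds)], axis=1
)
log_start = np.log(np.array([0.9, 0.1]))
log_trans = np.log(np.array([[0.9, 0.1], [0.2, 0.8]]))

path = viterbi(log_start, log_trans, log_emit)
print(path.tolist())
```

In this sketch the four low-count positions are labeled as the footprint state. In an unsupervised setting the emission and transition parameters would be estimated from the data (for example with Baum-Welch) rather than fixed by hand.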
In Chapter 3, I investigated the association between genetic variants and transcription factor binding activity to identify footprint QTLs (fpQTLs) at base-pair resolution, contributing to a better understanding of the mechanisms linking genotypic variation to gene regulation and disease phenotypes. Overall, detection of fpQTLs provides additional information for a more complete characterization of the landscape of human regulatory variation and its direct effect on gene expression. In Chapter 4, I employed self-organizing maps (SOMs), a type of artificial neural network, to identify “clusters” of transcription factors and define co-binding patterns. I specifically examined transcription factor enrichment at chromatin loop anchors and studied how these factors might modulate downstream looping effects. Together, the studies in this dissertation provide improved transcription factor binding site prediction, deliver improved functional interpretation of noncoding variation, and expand our knowledge of transcription factor cooperativity and its effect on the 3D organization of chromatin.
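As an illustration of the kind of association test an fpQTL analysis (Chapter 3) might involve, the sketch below regresses a per-individual footprint score on genotype dosage at a single variant. The abstract does not specify the statistical model, so the simple linear regression, the variable names, and the toy values are all assumptions.

```python
import numpy as np

def linear_association(x, y):
    """Least-squares slope, intercept, and Pearson r of y regressed on x."""
    x_c = x - x.mean()
    y_c = y - y.mean()
    slope = (x_c @ y_c) / (x_c @ x_c)
    intercept = y.mean() - slope * x.mean()
    r = (x_c @ y_c) / np.sqrt((x_c @ x_c) * (y_c @ y_c))
    return slope, intercept, r

# Hypothetical toy data for six individuals at one variant:
# genotype dosage = copies of the alternate allele (0, 1, or 2)
genotype = np.array([0, 0, 1, 1, 2, 2], dtype=float)
# Hypothetical footprint scores at the overlapping binding site
footprint_score = np.array([1.0, 1.2, 2.0, 2.2, 3.0, 3.2])

slope, intercept, r = linear_association(genotype, footprint_score)
print(f"slope={slope:.3f} intercept={intercept:.3f} r={r:.3f}")
```

A real analysis would test many variant and footprint pairs, adjust for covariates, and correct for multiple testing; the sketch only shows the per-variant effect estimate.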
- Published
- 2022