Author: "Matthew Stephens" - Searchworks@Jio Institute Digital Library Search Results

1. GoM DE: interpreting structure in sequence count data with differential expression analysis allowing for grades of membership

Author: Peter Carbonetto, Kaixuan Luo, Abhishek Sarkar, Anthony Hung, Karl Tayeb, Sebastian Pott, and Matthew Stephens
Subjects: Gene expression, Single-cell RNA-seq, Single-cell ATAC-seq, Differential expression analysis, Dimensionality reduction, Parts-based representations, Biology (General), QH301-705.5, Genetics, QH426-470
Abstract: Parts-based representations, such as non-negative matrix factorization and topic modeling, have been used to identify structure from single-cell sequencing data sets, in particular structure that is not as well captured by clustering or other dimensionality reduction methods. However, interpreting the individual parts remains a challenge. To address this challenge, we extend methods for differential expression analysis by allowing cells to have partial membership to multiple groups. We call this grade of membership differential expression (GoM DE). We illustrate the benefits of GoM DE for annotating topics identified in several single-cell RNA-seq and ATAC-seq data sets.
Published: 2023

2. Fine-mapping studies distinguish genetic risks for childhood- and adult-onset asthma in the HLA region

Author: Selene M. Clay, Nathan Schoettler, Andrew M. Goldstein, Peter Carbonetto, Matthew Dapas, Matthew C. Altman, Mario G. Rosasco, James E. Gern, Daniel J. Jackson, Hae Kyung Im, Matthew Stephens, Dan L. Nicolae, and Carole Ober
Subjects: Asthma, HLA, Fine-mapping, Medicine, Genetics, QH426-470
Abstract: Background: Genome-wide association studies of asthma have revealed robust associations with variation across the human leukocyte antigen (HLA) complex with independent associations in the HLA class I and class II regions for both childhood-onset asthma (COA) and adult-onset asthma (AOA). However, the specific variants and genes contributing to risk are unknown. Methods: We used Bayesian approaches to perform genetic fine-mapping for COA and AOA (n=9432 and 21,556, respectively; n=318,167 shared controls) in White British individuals from the UK Biobank and to perform expression quantitative trait locus (eQTL) fine-mapping in immune (lymphoblastoid cell lines, n=398; peripheral blood mononuclear cells, n=132) and airway (nasal epithelial cells, n=188) cells from ethnically diverse individuals. We also examined putatively causal protein coding variation from protein crystal structures and conducted replication studies in independent multi-ethnic cohorts from the UK Biobank (COA n=1686; AOA n=3666; controls n=56,063). Results: Genetic fine-mapping revealed both shared and distinct causal variation between COA and AOA in the class I region but only distinct causal variation in the class II region. Both gene expression levels and amino acid variation contributed to risk. Our results from eQTL fine-mapping and amino acid visualization suggested that the HLA-DQA1*03:01 allele and variation associated with expression of the nonclassical HLA-DQA2 and HLA-DQB2 genes accounted entirely for the most significant association with AOA in GWAS. Our studies also suggested a potentially prominent role for HLA-C protein coding variation in the class I region in COA. We replicated putatively causal variant associations in a multi-ethnic cohort. Conclusions: We highlight roles for both gene expression and protein coding variation in asthma risk and identified putatively causal variation and genes in the HLA region. A convergence of genomic, transcriptional, and protein coding evidence implicates the HLA-DQA2 and HLA-DQB2 genes and HLA-DQA1*03:01 allele in AOA.
Published: 2022

3. Fine-mapping from summary data with the 'Sum of Single Effects' model.

Author: Yuxin Zou, Peter Carbonetto, Gao Wang, and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: In recent work, Wang et al. introduced the "Sum of Single Effects" (SuSiE) model, and showed that it provides a simple and efficient approach to fine-mapping genetic variants from individual-level data. Here we present new methods for fitting the SuSiE model to summary data, for example to single-SNP z-scores from an association study and linkage disequilibrium (LD) values estimated from a suitable reference panel. To develop these new methods, we first describe a simple, generic strategy for extending any individual-level data method to deal with summary data. The key idea is to replace the usual regression likelihood with an analogous likelihood based on summary data. We show that existing fine-mapping methods such as FINEMAP and CAVIAR also (implicitly) use this strategy, but in different ways, and so this provides a common framework for understanding different methods for fine-mapping. We investigate other common practical issues in fine-mapping with summary data, including problems caused by inconsistencies between the z-scores and LD estimates, and we develop diagnostics to identify these inconsistencies. We also present a new refinement procedure that improves model fits in some data sets, and hence improves overall reliability of the SuSiE fine-mapping results. Detailed evaluations of fine-mapping methods in a range of simulated data sets show that SuSiE applied to summary data is competitive, in both speed and accuracy, with the best available fine-mapping methods for summary data.
Published: 2022

4. Exploiting the GTEx resources to decipher the mechanisms at GWAS loci

Author: Alvaro N. Barbeira, Rodrigo Bonazzola, Eric R. Gamazon, Yanyu Liang, YoSon Park, Sarah Kim-Hellmuth, Gao Wang, Zhuoxun Jiang, Dan Zhou, Farhad Hormozdiari, Boxiang Liu, Abhiram Rao, Andrew R. Hamel, Milton D. Pividori, François Aguet, GTEx GWAS Working Group, Lisa Bastarache, Daniel M. Jordan, Marie Verbanck, Ron Do, GTEx Consortium, Matthew Stephens, Kristin Ardlie, Mark McCarthy, Stephen B. Montgomery, Ayellet V. Segrè, Christopher D. Brown, Tuuli Lappalainen, Xiaoquan Wen, and Hae Kyung Im
Subjects: Biology (General), QH301-705.5, Genetics, QH426-470
Abstract: The resources generated by the GTEx consortium offer unprecedented opportunities to advance our understanding of the biology of human diseases. Here, we present an in-depth examination of the phenotypic consequences of transcriptome regulation and a blueprint for the functional interpretation of genome-wide association study-discovered loci. Across a broad set of complex traits and diseases, we demonstrate widespread dose-dependent effects of RNA expression and splicing. We develop a data-driven framework to benchmark methods that prioritize causal genes and find no single approach outperforms the combination of multiple approaches. Using colocalization and association approaches that take into account the observed allelic heterogeneity of gene expression, we propose potential target genes for 47% (2519 out of 5385) of the GWAS loci examined.
Published: 2021

5. Ultra-purification of Lipopolysaccharides reveals species-specific signalling bias of TLR4: importance in macrophage function

Author: Matthew Stephens, Shan Liao, and Pierre-Yves von der Weid
Subjects: Medicine, Science
Abstract: TLR4 location, and bacterial species-derived lipopolysaccharides, play a significant role in the downstream activation of transcription factors, accessory molecules, and products. Here, this is demonstrated through the use of classically-activated and alternatively-activated macrophages. We show that, when polarized, human macrophages differentially express and localize TLR4, resulting in biased recognition and subsequent signalling of LPS derived from Pseudomonas aeruginosa, Escherichia coli, and Salmonella enterica. Analysis of activation demonstrated that in classically activated macrophages, P. aeruginosa signals from the plasma membrane via TLR4 to p65 dependent on TAK1 and TBK1 signalling. E. coli signals dependent or independent of the endosome, utilizing both TAK1- and TBK1-signalling to induce P65 and IRF3 inducible genes and cytokines. S. enterica, however, only induces P65 and IRF3 phosphorylation through signalling via the endosome. This finding outlines clear signalling mechanisms by which innate immune cells, such as macrophages, can distinguish between bacterial species and initiate specialized responses through TLR4.
Published: 2021

6. Tumor-Draining Lymph Node Reconstruction Promotes B Cell Activation During E0771 Mouse Breast Cancer Growth

Author: Dante Alexander Patrick Louie, Darellynn Oo, Glory Leung, Yujia Lin, Matthew Stephens, Omar Alrashed, Marcus Tso, and Shan Liao
Subjects: Tumor-draining lymph node, B cell, breast cancer, subcapsular sinus macrophage, germinal center (GC), tumor-associated antigen (TAA), Therapeutics. Pharmacology, RM1-950
Abstract: Lymph node metastasis is associated with tumor aggressiveness and poor prognosis in patients. Despite its significance in cancer progression, how immune cells in the tumor-draining lymph node (TDLN) participate in cancer immune regulation remains poorly understood. It has been reported that both anti-tumor and exhausted tumor-specific T cells can be induced in the TDLNs; however, B cell activation and maturation in the TDLN has received far less attention. In our studies using C57BL/6 mouse syngeneic E0771 breast cancer or B16F10 melanoma cell lines, tumor-associated antigens were found colocalized with the follicular dendritic cells (FDCs) in the germinal centers (GCs), where antigen-specific B cell maturation occurs. LN conduits and the subcapsular sinus (SCS) macrophages are two major routes of antigen trafficking to FDCs. Tumor growth induced LN conduit expansion in the B cell zone and disrupted the SCS macrophage layer, facilitating both the entry of tumor-associated antigens into the B cell zone and access to FDCs located in the GCs. Regional delivery of clodronate liposome specifically depleted SCS macrophages in the TDLN, increasing GC formation, and promoting tumor growth. Our study suggests that TDLN reconstruction creates a niche that favors B cell activation and maturation during tumor growth.
Published: 2022

7. Lipopolysaccharides modulate intestinal epithelial permeability and inflammation in a species-specific manner

Author: Matthew Stephens and Pierre-Yves von der Weid
Subjects: lipopolysaccharides, endotoxin, toll-like receptor 4, epithelium, inflammation, Diseases of the digestive system. Gastroenterology, RC799-869
Abstract: Patients presenting with inflammatory bowel disease have been shown to exhibit an altered microbiome in both Crohn’s disease and ulcerative colitis. This shift in the microbial content led us to question whether several of these microbes are important in inflammatory processes present in these diseases and, more specifically, whether lipopolysaccharides from the gram-negative cell wall differentially stimulate resident cells. We, therefore, investigated the possible contribution of five major species of gram-negative bacteria found to be altered in presence during disease progression and evaluated their pathogenicity through LPS. We demonstrated that LPS from these different species had individual capacities to induce NF-κB and pro-inflammatory IL-8 production from HEK-TLR4 cells in a TLR4 dependent manner. Additional work using human intestinal colonic epithelial cell monolayers (Caco-2) demonstrated that the cells responded to the serotype specific LPS in a distinct manner, inducing many inflammatory mediators such as TNF-α and IL-10 in significantly altered proportions. Furthermore, the permeability of Caco-2 monolayers, as a test for their ability to alter intestinal permeability, was also differentially altered by the serotype specific LPS modulating trans-epithelial electrical resistance, small molecule movement, and tight junction integrity. Our results suggest that specific species of bacteria may be potentiating the pathogenesis of IBD and chronic inflammatory diseases through their serotype specific LPS responses.
Published: 2020

8. Detailed modeling of positive selection improves detection of cancer driver genes

Author: Siming Zhao, Jun Liu, Pranav Nanga, Yuwen Liu, A. Ercument Cicek, Nicholas Knoblauch, Chuan He, Matthew Stephens, and Xin He
Subjects: Science
Abstract: Finding driver genes sheds light on the biological mechanisms propelling the development of a tumour, and can suggest therapeutic strategies. Here, the authors develop driverMAPS, a model-based approach to identify driver genes, and apply it to TCGA datasets.
Published: 2019

9. Regional influences on community structure across the tropical-temperate divide

Author: Alexander E. White, Kushal K. Dey, Dhananjai Mohan, Matthew Stephens, and Trevor D. Price
Subjects: Science
Abstract: Multiple drivers maintain unique species assemblages at multiple biogeographic scales. Here, the authors show that the freezing line is a key barrier generating evolutionary differences in temperate and tropical bird communities across a steep elevational gradient in the Himalaya.
Published: 2019

10. A new sequence logo plot to highlight enrichment and depletion

Author: Kushal K. Dey, Dongyue Xie, and Matthew Stephens
Subjects: Logo plots, Enrichment depletion, EDLogo, String symbols, Computer applications to medicine. Medical informatics, R858-859.7, Biology (General), QH301-705.5
Abstract: Background: Sequence logo plots have become a standard graphical tool for visualizing sequence motifs in DNA, RNA or protein sequences. However, standard logo plots primarily highlight enrichment of symbols, and may fail to highlight interesting depletions. Current alternatives that try to highlight depletion often produce visually cluttered logos. Results: We introduce a new sequence logo plot, the EDLogo plot, that highlights both enrichment and depletion, while minimizing visual clutter. We provide an easy-to-use and highly customizable R package Logolas to produce a range of logo plots, including EDLogo plots. This software also allows elements in the logo plot to be strings of characters, rather than a single character, extending the range of applications beyond the usual DNA, RNA or protein sequences. And the software includes new Empirical Bayes methods to stabilize estimates of enrichment and depletion, and thus better highlight the most significant patterns in data. We illustrate our methods and software on applications to transcription factor binding site motifs, protein sequence alignments and cancer mutation signature profiles. Conclusions: Our new EDLogo plots and flexible software implementation can help data analysts visualize both enrichment and depletion of characters (DNA sequence bases, amino acids, etc.) across a wide range of applications.
Published: 2018

11. GaNCH: Using Linked Open Data for Georgia’s Natural, Cultural and Historic Organizations’ Disaster Response

Author: Cliff Landis, Christine Wiseman, Allyson F. Smith, and Matthew Stephens
Subjects: Bibliography. Library science. Information resources
Abstract: In June 2019, the Atlanta University Center Robert W. Woodruff Library received a LYRASIS Catalyst Fund grant to support the creation of a publicly editable directory of Georgia’s Natural, Cultural and Historical Organizations (NCHs), allowing for quick retrieval of location and contact information for disaster response. By the end of the project, over 1,900 entries for NCH organizations in Georgia were compiled, updated, and uploaded to Wikidata, the linked open data database from the Wikimedia Foundation. These entries included directory contact information and GIS coordinates that appear on a map presented on the GaNCH project website (https://ganch.auctr.edu/), allowing emergency responders to quickly search for NCHs by region and county in the event of a disaster. In this article we discuss the design principles, methods, and challenges encountered in building and implementing this tool, including the impact the tool has had on statewide disaster response after implementation.
Published: 2021

12. Dynamic effects of genetic variation on gene expression revealed following hypoxic stress in cardiomyocytes

Author: Michelle C Ward, Nicholas E Banovich, Abhishek Sarkar, Matthew Stephens, and Yoav Gilad
Subjects: genetic variation, hypoxia, cardiomyocytes, eQTL, gene regulation, stress response, Medicine, Science, Biology (General), QH301-705.5
Abstract: One life-threatening outcome of cardiovascular disease is myocardial infarction, where cardiomyocytes are deprived of oxygen. To study inter-individual differences in response to hypoxia, we established an in vitro model of induced pluripotent stem cell-derived cardiomyocytes from 15 individuals. We measured gene expression levels, chromatin accessibility, and methylation levels in four culturing conditions that correspond to normoxia, hypoxia, and short- or long-term re-oxygenation. We characterized thousands of gene regulatory changes as the cells transition between conditions. Using available genotypes, we identified 1,573 genes with a cis expression quantitative trait locus (eQTL) in at least one condition, as well as 367 dynamic eQTLs, which are classified as eQTLs in at least one, but not in all conditions. A subset of genes with dynamic eQTLs is associated with complex traits and disease. Our data demonstrate how dynamic genetic effects on gene expression, which are likely relevant for disease, can be uncovered under stress.
Published: 2021

13. Large-scale genome-wide enrichment analyses identify new trait-associated genes and pathways across 31 human phenotypes

Author: Xiang Zhu and Matthew Stephens
Subjects: Science
Abstract: In genome-wide association studies, variant-level associations are hard to identify and can be difficult to interpret biologically. Here, the authors develop a new model-based enrichment analysis method, and apply it to identify new associated genes, pathways and tissues across 31 human phenotypes.
Published: 2018

14. Off-Target Effect of Lovastatin Disrupts Dietary Lipid Uptake and Dissemination through Pro-Drug Inhibition of the Mesenteric Lymphatic Smooth Muscle Cell Contractile Apparatus

Author: Matthew Stephens, Simon Roizes, and Pierre-Yves von der Weid
Subjects: statins, lymphatics, smooth muscle, RhoKinase, cholesterol, Biology (General), QH301-705.5, Chemistry, QD1-999
Abstract: Previously published, off-target effects of statins on skeletal smooth muscle function have linked structural characteristics within this drug class to myopathic effects. However, the effect of these drugs on lymphatic vascular smooth muscle cell function, and by proxy dietary cholesterol uptake, by the intestinal lymphatic network has not been investigated. Several of the most widely prescribed statins (Atorvastatin, Pravastatin, Lovastatin, and Simvastatin) were tested for their in-situ effects on smooth muscle contractility in rat mesenteric collecting lymphatic vessels. Lovastatin and Simvastatin had a concentration-dependent effect of initially increasing vessel contraction frequency before flatlining the vessel, a phenomenon that was found to be lactone-ring dependent and could be ameliorated through use of Lovastatin- or Simvastatin-hydroxyacid (HA). Simvastatin treatment further resulted in mitochondrial depolymerization within primary-isolated rat lymphatic smooth muscle cells (LMCs) while Lovastatin was found to be acting in a mitochondrial-independent manner, increasing the function of RhoKinase. Lovastatin’s effect on RhoKinase was investigated through pharmacological testing and in vitro analysis of increased MLC and MYPT1 phosphorylation within primary isolated LMCs. Finally, acute in vivo treatment of rats with Lovastatin, but not Lovastatin-HA, resulted in significantly decreased dietary lipid absorption through induced dysfunction of mesenteric lymph uptake and trafficking.
Published: 2021

15. Bayesian multivariate reanalysis of large genetic studies identifies many new associations.

Author: Michael C Turchin and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Genome-wide association studies (GWAS) have now been conducted for hundreds of phenotypes of relevance to human health. Many such GWAS involve multiple closely-related phenotypes collected on the same samples. However, the vast majority of these GWAS have been analyzed using simple univariate analyses, which consider one phenotype at a time. This is despite the fact that, at least in simulation experiments, multivariate analyses have been shown to be more powerful at detecting associations. Here, we conduct multivariate association analyses on 13 different publicly-available GWAS datasets that involve multiple closely-related phenotypes. These data include large studies of anthropometric traits (GIANT), plasma lipid traits (GlobalLipids), and red blood cell traits (HaemgenRBC). Our analyses identify many new associations (433 in total across the 13 studies), many of which replicate when follow-up samples are available. Overall, our results demonstrate that multivariate analyses can help make more effective use of data from both existing and future GWAS.
Published: 2019

16. Creating and sharing reproducible research code the workflowr way [version 1; peer review: 3 approved]

Author: John D. Blischak, Peter Carbonetto, and Matthew Stephens
Subjects: Medicine, Science
Abstract: Making scientific analyses reproducible, well documented, and easily shareable is crucial to maximizing their impact and ensuring that others can build on them. However, accomplishing these goals is not easy, requiring careful attention to organization, workflow, and familiarity with tools that are not a regular part of every scientist's toolbox. We have developed an R package, workflowr, to help all scientists, regardless of background, overcome these challenges. Workflowr aims to instill a particular "workflow" (a sequence of steps to be repeated and integrated into research practice) that helps make projects more reproducible and accessible. This workflow integrates four key elements: (1) version control (via Git); (2) literate programming (via R Markdown); (3) automatic checks and safeguards that improve code reproducibility; and (4) sharing code and results via a browsable website. These features exploit powerful existing tools, whose mastery would take considerable study. However, the workflowr interface is simple enough that novice users can quickly enjoy its many benefits. By simply following the workflowr "workflow", R users can create projects whose results, figures, and development history are easily accessible on a static website (thereby conveniently shareable with collaborators by sending them a URL) and accompanied by source code and reproducibility safeguards. The workflowr R package is open source and available on CRAN, with full documentation and source code available at https://github.com/jdblischak/workflowr.
Published: 2019

17. Discovery and characterization of variance QTLs in human induced pluripotent stem cells.

Author: Abhishek K Sarkar, Po-Yuan Tung, John D Blischak, Jonathan E Burnett, Yang I Li, Matthew Stephens, and Yoav Gilad
Subjects: Genetics, QH426-470
Abstract: Quantification of gene expression levels at the single cell level has revealed that gene expression can vary substantially even across a population of homogeneous cells. However, it is currently unclear what genomic features control variation in gene expression levels, and whether common genetic variants may impact gene expression variation. Here, we take a genome-wide approach to identify expression variance quantitative trait loci (vQTLs). To this end, we generated single cell RNA-seq (scRNA-seq) data from induced pluripotent stem cells (iPSCs) derived from 53 Yoruba individuals. We collected data for a median of 95 cells per individual and a total of 5,447 single cells, and identified 235 mean expression QTLs (eQTLs) at 10% FDR, of which 79% replicate in bulk RNA-seq data from the same individuals. We further identified 5 vQTLs at 10% FDR, but demonstrate that these can also be explained as effects on mean expression. Our study suggests that dispersion QTLs (dQTLs) which could alter the variance of expression independently of the mean can have larger fold changes, but explain less phenotypic variance than eQTLs. We estimate 4,015 individuals as a lower bound to achieve 80% power to detect the strongest dQTLs in iPSCs. These results will guide the design of future studies on understanding the genetic control of gene expression variance.
Published: 2019

18. Mesenteric Lymphatic Alterations Observed During DSS Induced Intestinal Inflammation Are Driven in a TLR4-PAMP/DAMP Discriminative Manner

Author: Matthew Stephens, Shan Liao, and Pierre-Yves von der Weid
Subjects: lymphatics, mesentery, toll-like receptors, inflammation, lipopolysaccharides, colitis, Immunologic diseases. Allergy, RC581-607
Abstract: Background: Inflammatory bowel disease (IBD) is characterized by both acute and chronic phase inflammation of the gastro-intestinal (GI) tract that affects a large and growing number of people worldwide with little to no effective treatments. This is in part due to the lack of understanding of the disease pathogenesis and also the currently poorly described involvement of other systems such as the lymphatics. During DSS induced colitis, mice also develop a severe inflammation of terminal ileum with many features similar to IBD. As well as inflammation within the ileum we have previously demonstrated lymphatic remodeling within the mesentery and mesenteric lymph nodes of DSS-treated mice. The lymphatic remodeling includes lymphangiogenesis, lymphatic vessel dilation and leakiness, as well as cellular infiltration into the surrounding tissue and peripheral draining lymph nodes. Methods: Intestinal inflammation was induced in C57BL/6 mice by administration of 2.5% DSS in drinking water for 7 days. Mice were treated with TLR4 blocker C34 or Polymyxin-B (PMXB) daily from days 3 to 7 of DSS treatment via I.P. injection, and their therapeutic effects on disease activity and lymphatic function were examined. TLR activity and subsequent effect on lymphangiogenesis, lymphadenopathy, and mesenteric lymph node cellular composition were assessed. Results: DSS mice treated with TLR4 inhibitor, C34, had a significantly improved disease phenotype characterized by reduced ileal and colonic insult. The change correlated with significant reduction in colonic and mesenteric inflammation, resolved mesenteric lymphangiectasia, and CD103+ DC migration similar to that of healthy control. PMXB treatment, however, did not resolve inflammation within the colon or associated mesenteric lymphatic dysfunction but did prevent lymphadenopathy within the MLN through alteration of CCL21 gradients and CD103+ DC migration. Conclusions: TLR4 appears to mediate several changes within the mesenteric lymphatics; more specifically, it is shown to have different outcomes depending on whether stimulation occurs through pathogen-derived factors such as LPS or tissue-derived DAMPs, a novel phenomenon.
Published: 2019

19. Estimating recent migration and population-size surfaces.

Author: Hussein Al-Asadi, Desislava Petkova, Matthew Stephens, and John Novembre
Subjects: Genetics, QH426-470
Abstract: In many species a fundamental feature of genetic diversity is that genetic similarity decays with geographic distance; however, this relationship is often complex, and may vary across space and time. Methods to uncover and visualize such relationships have widespread use for analyses in molecular ecology, conservation genetics, evolutionary genetics, and human genetics. While several frameworks exist, a promising approach is to infer maps of how migration rates vary across geographic space. Such maps could, in principle, be estimated across time to reveal the full complexity of population histories. Here, we take a step in this direction: we present a method to infer maps of population sizes and migration rates associated with different time periods from a matrix of genetic similarity between every pair of individuals. Specifically, genetic similarity is measured by counting the number of long segments of haplotype sharing (also known as identity-by-descent tracts). By varying the length of these segments we obtain parameter estimates associated with different time periods. Using simulations, we show that the method can reveal time-varying migration rates and population sizes, including changes that are not detectable when using a similar method that ignores haplotypic structure. We apply the method to a dataset of contemporary European individuals (POPRES), and provide an integrated analysis of recent population structure and growth over the last ∼3,000 years in Europe.
Published: 2019

20. Correction: A Unified Framework for Association Analysis with Multiple Related Phenotypes.

Author: Matthew Stephens
Subjects: Medicine, Science
Abstract: [This corrects the article DOI: 10.1371/journal.pone.0065245.].
Published: 2019

21. Silencing of transposable elements may not be a major driver of regulatory evolution in primate iPSCs

Author: Michelle C Ward, Siming Zhao, Kaixuan Luo, Bryan J Pavlovic, Mohammad M Karimi, Matthew Stephens, and Yoav Gilad
Subjects: Chimpanzee, Transposable elements, Gene regulation, evolution, Medicine, Science, Biology (General), QH301-705.5
Abstract: Transposable elements (TEs) comprise almost half of primate genomes and their aberrant regulation can result in deleterious effects. In pluripotent stem cells, rapidly evolving KRAB-ZNF genes target TEs for silencing by H3K9me3. To investigate the evolution of TE silencing, we performed H3K9me3 ChIP-seq experiments in induced pluripotent stem cells from 10 human and 7 chimpanzee individuals. We identified four million orthologous TEs and found the SVA and ERV families to be marked most frequently by H3K9me3. We found little evidence of inter-species differences in TE silencing, with as many as 82% of putatively silenced TEs marked at similar levels in humans and chimpanzees. TEs that are preferentially silenced in one species are a similar age to those silenced in both species and are not more likely to be associated with expression divergence of nearby orthologous genes. Our data suggest limited species-specificity of TE silencing across 6 million years of primate evolution.
Published: 2018

22. Correction: Visualizing the structure of RNA-seq expression data using grade of membership models.

Author: Kushal K Dey, Chiaowen Joyce Hsiao, and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: [This corrects the article DOI: 10.1371/journal.pgen.1006599.].
Published: 2017

23. Visualizing the structure of RNA-seq expression data using grade of membership models.

Author: Kushal K Dey, Chiaowen Joyce Hsiao, and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Grade of membership models, also known as "admixture models", "topic models" or "Latent Dirichlet Allocation", are a generalization of cluster models that allow each sample to have membership in multiple clusters. These models are widely used in population genetics to model admixed individuals who have ancestry from multiple "populations", and in natural language processing to model documents having words from multiple "topics". Here we illustrate the potential for these models to cluster samples of RNA-seq gene expression data, measured on either bulk samples or single cells. We also provide methods to help interpret the clusters, by identifying genes that are distinctively expressed in each cluster. By applying these methods to several example RNA-seq applications we demonstrate their utility in identifying and summarizing structure and heterogeneity. Applied to data from the GTEx project on 53 human tissues, the approach highlights similarities among biologically-related tissues and identifies distinctively-expressed genes that recapitulate known biology. Applied to single-cell expression data from mouse preimplantation embryos, the approach highlights both discrete and continuous variation through early embryonic development stages, and highlights genes involved in a variety of relevant processes, from germ cell development, through compaction and morula formation, to the formation of inner cell mass and trophoblast at the blastocyst stage. The methods are implemented in the Bioconductor package CountClust.
Published: 2017
Full Text: View/download PDF

24. Thousands of novel translated open reading frames in humans inferred by ribosome footprint profiling

Author: Anil Raj, Sidney H Wang, Heejung Shim, Arbel Harpak, Yang I Li, Brett Engelmann, Matthew Stephens, Yoav Gilad, and Jonathan K Pritchard
Subjects: ribosome profiling, translation, hidden Markov models, upstream ORF, noncoding RNA, Medicine, Science, Biology (General), QH301-705.5
Abstract: Accurate annotation of protein coding regions is essential for understanding how genetic information is translated into function. We describe riboHMM, a new method that uses ribosome footprint data to accurately infer translated sequences. Applying riboHMM to human lymphoblastoid cell lines, we identified 7273 novel coding sequences, including 2442 translated upstream open reading frames. We observed an enrichment of footprints at inferred initiation sites after drug-induced arrest of translation initiation, validating many of the novel coding sequences. The novel proteins exhibit significant selective constraint in the inferred reading frames, suggesting that many are functional. Moreover, ~40% of bicistronic transcripts showed negative correlation in the translation levels of their two coding sequences, suggesting a potential regulatory role for these novel regions. Despite known limitations of mass spectrometry to detect protein expressed at low level, we estimated a 14% validation rate. Our work significantly expands the set of known coding regions in humans.
Published: 2016
Full Text: View/download PDF

25. A Simple Model-Based Approach to Inferring and Visualizing Cancer Mutation Signatures.

Author: Yuichi Shiraishi, Georg Tremmel, Satoru Miyano, and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Recent advances in sequencing technologies have enabled the production of massive amounts of data on somatic mutations from cancer genomes. These data have led to the detection of characteristic patterns of somatic mutations or "mutation signatures" at an unprecedented resolution, with the potential for new insights into the causes and mechanisms of tumorigenesis. Here we present new methods for modelling, identifying and visualizing such mutation signatures. Our methods greatly simplify mutation signature models compared with existing approaches, reducing the number of parameters by orders of magnitude even while increasing the contextual factors (e.g. the number of flanking bases) that are accounted for. This improves both sensitivity and robustness of inferred signatures. We also provide a new intuitive way to visualize the signatures, analogous to the use of sequence logos to visualize transcription factor binding sites. We illustrate our new method on somatic mutation data from urothelial carcinoma of the upper urinary tract, and a larger dataset from 30 diverse cancer types. The results illustrate several important features of our methods, including the ability of our new visualization tool to clearly highlight the key features of each signature, the improved robustness of signature inferences from small sample sizes, and more detailed inference of signature characteristics such as strand biases and sequence context effects at the base two positions 5' to the mutated site. The overall framework of our work is based on probabilistic models that are closely connected with "mixed-membership models" which are widely used in population genetic admixture analysis, and in machine learning for document clustering. We argue that recognizing these relationships should help improve understanding of mutation signature extraction problems, and suggests ways to further improve the statistical methods. 
Our methods are implemented in an R package pmsignature (https://github.com/friend1ws/pmsignature) and a web application available at https://friend1ws.shinyapps.io/pmsignature_shiny/.
Published: 2015
Full Text: View/download PDF

26. The genetic architecture of gene expression levels in wild baboons

Author: Jenny Tung, Xiang Zhou, Susan C Alberts, Matthew Stephens, and Yoav Gilad
Subjects: baboon, allele-specific expression, RNA-seq, expression quantitative trait locus, Medicine, Science, Biology (General), QH301-705.5
Abstract: Primate evolution has been argued to result, in part, from changes in how genes are regulated. However, we still know little about gene regulation in natural primate populations. We conducted an RNA sequencing (RNA-seq)-based study of baboons from an intensively studied wild population. We performed complementary expression quantitative trait locus (eQTL) mapping and allele-specific expression analyses, discovering substantial evidence for, and surprising power to detect, genetic effects on gene expression levels in the baboons. eQTL were most likely to be identified for lineage-specific, rapidly evolving genes; interestingly, genes with eQTL significantly overlapped between baboons and a comparable human eQTL data set. Our results suggest that genes vary in their tolerance of genetic perturbation, and that this property may be conserved across species. Further, they establish the feasibility of eQTL mapping using RNA-seq data alone, and represent an important step towards understanding the genetic architecture of gene expression in primates.
Published: 2015
Full Text: View/download PDF

27. A multivariate genome-wide association analysis of 10 LDL subfractions, and their response to statin treatment, in 1868 Caucasians.

Author: Heejung Shim, Daniel I Chasman, Joshua D Smith, Samia Mora, Paul M Ridker, Deborah A Nickerson, Ronald M Krauss, and Matthew Stephens
Subjects: Medicine, Science
Abstract: We conducted a genome-wide association analysis of 7 subfractions of low density lipoproteins (LDLs) and 3 subfractions of intermediate density lipoproteins (IDLs) measured by gradient gel electrophoresis, and their response to statin treatment, in 1868 individuals of European ancestry from the Pharmacogenomics and Risk of Cardiovascular Disease study. Our analyses identified four previously-implicated loci (SORT1, APOE, LPA, and CETP) as containing variants that are very strongly associated with lipoprotein subfractions (log10 Bayes Factor > 15). Subsequent conditional analyses suggest that three of these (APOE, LPA and CETP) likely harbor multiple independently associated SNPs. Further, while different variants typically showed different characteristic patterns of association with combinations of subfractions, the two SNPs in CETP show strikingly similar patterns, both in our original data and in a replication cohort, consistent with a common underlying molecular mechanism. Notably, the CETP variants are very strongly associated with LDL subfractions, despite showing no association with total LDLs in our study, illustrating the potential value of the more detailed phenotypic measurements. In contrast with these strong subfraction associations, genetic association analysis of subfraction response to statins showed much weaker signals (none exceeding a log10 Bayes Factor of 6). However, two SNPs (in APOE and LPA) previously-reported to be associated with LDL statin response do show some modest evidence for association in our data, and the subfraction response profiles at the LPA SNP are consistent with the LPA association, with response likely being due primarily to resistance of Lp(a) particles to statin therapy. An additional important feature of our analysis is that, unlike most previous analyses of multiple related phenotypes, we analyzed the subfractions jointly, rather than one at a time.
Comparisons of our multivariate analyses with standard univariate analyses demonstrate that multivariate analyses can substantially increase power to detect associations. Software implementing our multivariate analysis methods is available at http://stephenslab.uchicago.edu/software.html.
Published: 2015
Full Text: View/download PDF

28. msCentipede: Modeling Heterogeneity across Genomic Sites and Replicates Improves Accuracy in the Inference of Transcription Factor Binding.

Author: Anil Raj, Heejung Shim, Yoav Gilad, Jonathan K Pritchard, and Matthew Stephens
Subjects: Medicine, Science
Abstract: Understanding global gene regulation depends critically on accurate annotation of regulatory elements that are functional in a given cell type. CENTIPEDE, a powerful, probabilistic framework for identifying transcription factor binding sites from tissue-specific DNase I cleavage patterns and genomic sequence content, leverages the hypersensitivity of factor-bound chromatin and the information in the DNase I spatial cleavage profile characteristic of each DNA binding protein to accurately infer functional factor binding sites. However, the model for the spatial profile in this framework fails to account for the substantial variation in the DNase I cleavage profiles across different binding sites. Neither does it account for variation in the profiles at the same binding site across multiple replicate DNase I experiments, which are increasingly available. In this work, we introduce new methods, based on multi-scale models for inhomogeneous Poisson processes, to account for such variation in DNase I cleavage patterns both within and across binding sites. These models account for the spatial structure in the heterogeneity in DNase I cleavage patterns for each factor. Using DNase-seq measurements assayed in a lymphoblastoid cell line, we demonstrate the improved performance of this model for several transcription factors by comparing against the ChIP-seq peaks for those factors. Finally, we explore the effects of DNase I sequence bias on inference of factor binding using a simple extension to our framework that allows for a more flexible background model. The proposed model can also be easily applied to paired-end ATAC-seq and DNase-seq data. msCentipede, a Python implementation of our algorithm, is available at http://rajanil.github.io/msCentipede.
Published: 2015
Full Text: View/download PDF

29. A flexible empirical Bayes approach to multiple linear regression and connections with penalized regression.

Author: Youngseok Kim, Wei Wang, Peter Carbonetto, and Matthew Stephens
Published: 2024

30. A statistical framework for joint eQTL analysis in multiple tissues.

Author: Timothée Flutre, Xiaoquan Wen, Jonathan Pritchard, and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Mapping expression Quantitative Trait Loci (eQTLs) represents a powerful and widely adopted approach to identifying putative regulatory variants and linking them to specific genes. Up to now eQTL studies have been conducted in a relatively narrow range of tissues or cell types. However, understanding the biology of organismal phenotypes will involve understanding regulation in multiple tissues, and ongoing studies are collecting eQTL data in dozens of cell types. Here we present a statistical framework for powerfully detecting eQTLs in multiple tissues or cell types (or, more generally, multiple subgroups). The framework explicitly models the potential for each eQTL to be active in some tissues and inactive in others. By modeling the sharing of active eQTLs among tissues, this framework increases power to detect eQTLs that are present in more than one tissue compared with "tissue-by-tissue" analyses that examine each tissue separately. Conversely, by modeling the inactivity of eQTLs in some tissues, the framework allows the proportion of eQTLs shared across different tissues to be formally estimated as parameters of a model, addressing the difficulties of accounting for incomplete power when comparing overlaps of eQTLs identified by tissue-by-tissue analyses. Applying our framework to re-analyze data from transformed B cells, T cells, and fibroblasts, we find that it substantially increases power compared with tissue-by-tissue analysis, identifying 63% more genes with eQTLs (at FDR = 0.05). Further, the results suggest that, in contrast to previous analyses of the same data, the majority of eQTLs detectable in these data are shared among all three tissues.
Published: 2013
Full Text: View/download PDF

31. Polygenic modeling with bayesian sparse linear mixed models.

Author: Xiang Zhou, Peter Carbonetto, and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Both linear mixed models (LMMs) and sparse regression models are widely used in genetics applications, including, recently, polygenic modeling in genome-wide association studies. These two approaches make very different assumptions, so are expected to perform well in different situations. However, in practice, for a given dataset one typically does not know which assumptions will be more accurate. Motivated by this, we consider a hybrid of the two, which we refer to as a "Bayesian sparse linear mixed model" (BSLMM) that includes both these models as special cases. We address several key computational and statistical issues that arise when applying BSLMM, including appropriate prior specification for the hyper-parameters and a novel Markov chain Monte Carlo algorithm for posterior inference. We apply BSLMM and compare it with other methods for two polygenic modeling applications: estimating the proportion of variance in phenotypes explained (PVE) by available genotypes, and phenotype (or breeding value) prediction. For PVE estimation, we demonstrate that BSLMM combines the advantages of both standard LMMs and sparse regression modeling. For phenotype prediction it considerably outperforms either of the other two methods, as well as several other large-scale regression methods previously suggested for this problem. Software implementing our method is freely available from http://stephenslab.uchicago.edu/software.html.
Published: 2013
Full Text: View/download PDF

32. A unified framework for association analysis with multiple related phenotypes.

Author: Matthew Stephens
Subjects: Medicine, Science
Abstract: We consider the problem of assessing associations between multiple related outcome variables, and a single explanatory variable of interest. This problem arises in many settings, including genetic association studies, where the explanatory variable is genotype at a genetic variant. We outline a framework for conducting this type of analysis, based on Bayesian model comparison and model averaging for multivariate regressions. This framework unifies several common approaches to this problem, and includes both standard univariate and standard multivariate association tests as special cases. The framework also unifies the problems of testing for associations and explaining associations - that is, identifying which outcome variables are associated with genotype. This provides an alternative to the usual, but conceptually unsatisfying, approach of resorting to univariate tests when explaining and interpreting significant multivariate findings. The method is computationally tractable genome-wide for modest numbers of phenotypes (e.g. 5-10), and can be applied to summary data, without access to raw genotype and phenotype data. We illustrate the methods on both simulated examples, and to a genome-wide association study of blood lipid traits where we identify 18 potential novel genetic associations that were not identified by univariate analyses of the same data.
Published: 2013
Full Text: View/download PDF

33. Integrated enrichment analysis of variants and pathways in genome-wide association studies indicates central role for IL-2 signaling genes in type 1 diabetes, and cytokine signaling genes in Crohn's disease.

Author: Peter Carbonetto and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Pathway analyses of genome-wide association studies aggregate information over sets of related genes, such as genes in common pathways, to identify gene sets that are enriched for variants associated with disease. We develop a model-based approach to pathway analysis, and apply this approach to data from the Wellcome Trust Case Control Consortium (WTCCC) studies. Our method offers several benefits over existing approaches. First, our method not only interrogates pathways for enrichment of disease associations, but also estimates the level of enrichment, which yields a coherent way to promote variants in enriched pathways, enhancing discovery of genes underlying disease. Second, our approach allows for multiple enriched pathways, a feature that leads to novel findings in two diseases where the major histocompatibility complex (MHC) is a major determinant of disease susceptibility. Third, by modeling disease as the combined effect of multiple markers, our method automatically accounts for linkage disequilibrium among variants. Interrogation of pathways from eight pathway databases yields strong support for enriched pathways, indicating links between Crohn's disease (CD) and cytokine-driven networks that modulate immune responses; between rheumatoid arthritis (RA) and "Measles" pathway genes involved in immune responses triggered by measles infection; and between type 1 diabetes (T1D) and IL2-mediated signaling genes. Prioritizing variants in these enriched pathways yields many additional putative disease associations compared to analyses without enrichment. For CD and RA, 7 of 8 additional non-MHC associations are corroborated by other studies, providing validation for our approach. For T1D, prioritization of IL-2 signaling genes yields strong evidence for 7 additional non-MHC candidate disease loci, as well as suggestive evidence for several more. 
Of the 7 strongest associations, 4 are validated by other studies, and 3 (near IL-2 signaling genes RAF1, MAPK14, and FYN) constitute novel putative T1D loci for further study.
Published: 2013
Full Text: View/download PDF

34. Genetic, functional and molecular features of glucocorticoid receptor binding.

Author: Francesca Luca, Joseph C Maranville, Allison L Richards, David B Witonsky, Matthew Stephens, and Anna Di Rienzo
Subjects: Medicine, Science
Abstract: Glucocorticoids (GCs) are key mediators of stress response and are widely used as pharmacological agents to treat immune diseases, such as asthma and inflammatory bowel disease, and certain types of cancer. GCs act mainly by activating the GC receptor (GR), which interacts with other transcription factors to regulate gene expression. Here, we combined different functional genomics approaches to gain molecular insights into the mechanisms of action of GC. By profiling the transcriptional response to GC over time in 4 Yoruba (YRI) and 4 Tuscans (TSI) lymphoblastoid cell lines (LCLs), we suggest that the transcriptional response to GC is variable not only in time, but also in direction (positive or negative) depending on the presence of specific interacting transcription factors. Accordingly, when we performed ChIP-seq for GR and NF-κB in two YRI LCLs treated with GC or with vehicle control, we observed that features of GR binding sites differ for up- and down-regulated genes. Finally, we show that eQTLs that affect expression patterns only in the presence of GC are 1.9-fold more likely to occur in GR binding sites, compared to eQTLs that affect expression only in its absence. Our results indicate that genetic variation at GR and interacting transcription factors binding sites influences variability in gene expression, and attest to the power of combining different functional genomic approaches.
Published: 2013
Full Text: View/download PDF

35. Exon-specific QTLs skew the inferred distribution of expression QTLs detected using gene expression array data.

Author: Jean-Baptiste Veyrieras, Daniel J Gaffney, Joseph K Pickrell, Yoav Gilad, Matthew Stephens, and Jonathan K Pritchard
Subjects: Medicine, Science
Abstract: Mapping of expression quantitative trait loci (eQTLs) is an important technique for studying how genetic variation affects gene regulation in natural populations. In a previous study using Illumina expression data from human lymphoblastoid cell lines, we reported that cis-eQTLs are especially enriched around transcription start sites (TSSs) and immediately upstream of transcription end sites (TESs). In this paper, we revisit the distribution of eQTLs using additional data from Affymetrix exon arrays and from RNA sequencing. We confirm that most eQTLs lie close to the target genes; that transcribed regions are generally enriched for eQTLs; that eQTLs are more abundant in exons than introns; and that the peak density of eQTLs occurs at the TSS. However, we find that the intriguing TES peak is greatly reduced or absent in the Affymetrix and RNA-seq data. Instead our data suggest that the TES peak observed in the Illumina data is mainly due to exon-specific QTLs that affect 3' untranslated regions, where most of the Illumina probes are positioned. Nonetheless, we do observe an overall enrichment of eQTLs in exons versus introns in all three data sets, consistent with an important role for exonic sequences in gene regulation.
Published: 2012
Full Text: View/download PDF

36. The contribution of RNA decay quantitative trait loci to inter-individual variation in steady-state gene expression levels.

Author: Athma A Pai, Carolyn E Cain, Orna Mizrahi-Man, Sherryl De Leon, Noah Lewellen, Jean-Baptiste Veyrieras, Jacob F Degner, Daniel J Gaffney, Joseph K Pickrell, Matthew Stephens, Jonathan K Pritchard, and Yoav Gilad
Subjects: Genetics, QH426-470
Abstract: Recent gene expression QTL (eQTL) mapping studies have provided considerable insight into the genetic basis for inter-individual regulatory variation. However, a limitation of all eQTL studies to date, which have used measurements of steady-state gene expression levels, is the inability to directly distinguish between variation in transcription and decay rates. To address this gap, we performed a genome-wide study of variation in gene-specific mRNA decay rates across individuals. Using a time-course study design, we estimated mRNA decay rates for over 16,000 genes in 70 Yoruban HapMap lymphoblastoid cell lines (LCLs), for which extensive genotyping data are available. Considering mRNA decay rates across genes, we found that: (i) as expected, highly expressed genes are generally associated with lower mRNA decay rates, (ii) genes with rapid mRNA decay rates are enriched with putative binding sites for miRNA and RNA binding proteins, and (iii) genes with similar functional roles tend to exhibit correlated rates of mRNA decay. Focusing on variation in mRNA decay across individuals, we estimate that steady-state expression levels are significantly correlated with variation in decay rates in 10% of genes. Somewhat counter-intuitively, for about half of these genes, higher expression is associated with faster decay rates, possibly due to a coupling of mRNA decay with transcriptional processes in genes involved in rapid cellular responses. Finally, we used these data to map genetic variation that is specifically associated with variation in mRNA decay rates across individuals. We found 195 such loci, which we named RNA decay quantitative trait loci ("rdQTLs"). All the observed rdQTLs are located near the regulated genes and therefore are assumed to act in cis. By analyzing our data within the context of known steady-state eQTLs, we estimate that a substantial fraction of eQTLs are associated with inter-individual variation in mRNA decay rates.
Published: 2012
Full Text: View/download PDF

37. Genome-wide association study of d-amphetamine response in healthy volunteers identifies putative associations, including cadherin 13 (CDH13).

Author: Amy B Hart, Barbara E Engelhardt, Margaret C Wardle, Greta Sokoloff, Matthew Stephens, Harriet de Wit, and Abraham A Palmer
Subjects: Medicine, Science
Abstract: Both the subjective response to d-amphetamine and the risk for amphetamine addiction are known to be heritable traits. Because subjective responses to drugs may predict drug addiction, identifying alleles that influence acute response may also provide insight into the genetic risk factors for drug abuse. We performed a Genome Wide Association Study (GWAS) for the subjective responses to amphetamine in 381 non-drug abusing healthy volunteers. Responses to amphetamine were measured using a double-blind, placebo-controlled, within-subjects design. We used sparse factor analysis to reduce the dimensionality of the data to ten factors. We identified several putative associations; the strongest was between a positive subjective drug-response factor and a SNP (rs3784943) in the 8th intron of cadherin 13 (CDH13; P = 4.58×10^-8), a gene previously associated with a number of psychiatric traits including methamphetamine dependence. Additionally, we observed a putative association between a factor representing the degree of positive affect at baseline and a SNP (rs472402) in the 1st intron of steroid-5-alpha-reductase-α-polypeptide-1 (SRD5A1; P = 2.53×10^-7), a gene whose protein product catalyzes the rate-limiting step in synthesis of the neurosteroid allopregnanolone. This SNP belongs to an LD-block that has been previously associated with the expression of SRD5A1 and differences in SRD5A1 enzymatic activity. The purpose of this study was to begin to explore the genetic basis of subjective responses to stimulant drugs using a GWAS approach in a modestly sized sample. Our approach provides a case study for analysis of high-dimensional intermediate pharmacogenomic phenotypes, which may be more tractable than clinical diagnoses.
Published: 2012
Full Text: View/download PDF

38. Statistical inference of in vivo properties of human DNA methyltransferases from double-stranded methylation patterns.

Author: Audrey Q Fu, Diane P Genereux, Reinhard Stöger, Alice F Burden, Charles D Laird, and Matthew Stephens
Subjects: Medicine, Science
Abstract: DNA methyltransferases establish methylation patterns in cells and transmit these patterns over cell generations, thereby influencing each cell's epigenetic states. Three primary DNA methyltransferases have been identified in mammals: DNMT1, DNMT3A and DNMT3B. Extensive in vitro studies have investigated key properties of these enzymes, namely their substrate specificity and processivity. Here we study these properties in vivo, by applying novel statistical analysis methods to double-stranded DNA methylation patterns collected using hairpin-bisulfite PCR. Our analysis fits a novel Hidden Markov Model (HMM) to the observed data, allowing for potential bisulfite conversion errors, and yields statistical estimates of parameters that quantify enzyme processivity and substrate specificity. We apply this model to methylation patterns established in vivo at three loci in humans: two densely methylated inactive X (Xi)-linked loci (FMR1 and G6PD), and an autosomal locus (LEP), where methylation densities are tissue-specific but moderate. We find strong evidence for a high level of processivity of DNMT1 at FMR1 and G6PD, with the mean association tract length being a few hundred base pairs. Regardless of tissue types, methylation patterns at LEP are dominated by DNMT1 maintenance events, similar to the two Xi-linked loci, but are insufficiently informative regarding processivity to draw any conclusions about processivity at that locus. At all three loci we find that DNMT1 shows a strong preference for adding methyl groups to hemi-methylated CpG sites over unmethylated sites. The data at all three loci also suggest low (possibly 0) association of the de novo methyltransferases, the DNMT3s, and are consequently uninformative about processivity or preference of these enzymes. We also extend our HMM to reanalyze published data on mouse DNMT1 activities in vitro. 
The results suggest shorter association tracts (and hence weaker processivity), and much longer non-association tracts than human DNMT1 in vivo.
Published: 2012
Full Text: View/download PDF

39. Interactions between glucocorticoid treatment and cis-regulatory polymorphisms contribute to cellular response phenotypes.

Author: Joseph C Maranville, Francesca Luca, Allison L Richards, Xiaoquan Wen, David B Witonsky, Shaneen Baxter, Matthew Stephens, and Anna Di Rienzo
Subjects: Genetics, QH426-470
Abstract: Glucocorticoids (GCs) mediate physiological responses to environmental stress and are commonly used as pharmaceuticals. GCs act primarily through the GC receptor (GR, a transcription factor). Despite their clear biomedical importance, little is known about the genetic architecture of variation in GC response. Here we provide an initial assessment of variability in the cellular response to GC treatment by profiling gene expression and protein secretion in 114 EBV-transformed B lymphocytes of African and European ancestry. We found that genetic variation affects the response of nearby genes and exhibits distinctive patterns of genotype-treatment interactions, with genotypic effects evident in either only GC-treated or only control-treated conditions. Using a novel statistical framework, we identified interactions that influence the expression of 26 genes known to play central roles in GC-related pathways (e.g. NQO1, AIRE, and SGK1) and that influence the secretion of IL6.
Published: 2011
Full Text: View/download PDF

40. Variation in human recombination rates and its genetic determinants.

Author: Adi Fledel-Alon, Ellen Miranda Leffler, Yongtao Guan, Matthew Stephens, Graham Coop, and Molly Przeworski
Subjects: Medicine, Science
Abstract: Despite the fundamental role of crossing-over in the pairing and segregation of chromosomes during human meiosis, the rates and placements of events vary markedly among individuals. Characterizing this variation and identifying its determinants are essential steps in our understanding of the human recombination process and its evolution. Using three large sets of European-American pedigrees, we examined variation in five recombination phenotypes that capture distinct aspects of crossing-over patterns. We found that the mean recombination rate in males and females and the historical hotspot usage are significantly heritable and are uncorrelated with one another. We then conducted a genome-wide association study in order to identify loci that influence them. We replicated associations of RNF212 with the mean rate in males and in females as well as the association of Inversion 17q21.31 with the female mean rate. We also replicated the association of PRDM9 with historical hotspot usage, finding that it explains most of the genetic variance in this phenotype. In addition, we identified a set of new candidate regions for further validation. These findings suggest that variation at broad and fine scales is largely separable and that, beyond three known loci, there is no evidence for common variation with large effects on recombination phenotypes.
Published: 2011
Full Text: View/download PDF

41. Functional comparison of innate immune signaling pathways in primates.

Author: Luis B Barreiro, John C Marioni, Ran Blekhman, Matthew Stephens, and Yoav Gilad
Subjects: Genetics, QH426-470
Abstract: Humans respond differently than other primates to a large number of infections. Differences in susceptibility to infectious agents between humans and other primates are probably due to inter-species differences in immune response to infection. Consistent with that notion, genes involved in immunity-related processes are strongly enriched among recent targets of positive selection in primates, suggesting that immune responses evolve rapidly, yet providing only indirect evidence for possible inter-species functional differences. To directly compare immune responses among primates, we stimulated primary monocytes from humans, chimpanzees, and rhesus macaques with lipopolysaccharide (LPS) and studied the ensuing time-course regulatory responses. We find that, while the universal Toll-like receptor response is mostly conserved across primates, the regulatory response associated with viral infections is often lineage-specific, probably reflecting rapid host-virus mutual adaptation cycles. Additionally, human-specific immune responses are enriched for genes involved in apoptosis, as well as for genes associated with cancer and with susceptibility to infectious diseases or immune-related disorders. Finally, we find that chimpanzee-specific immune signaling pathways are enriched for HIV-interacting genes. Put together, our observations lend strong support to the notion that lineage-specific immune responses may help explain known inter-species differences in susceptibility to infectious diseases.
Published: 2010
Full Text: View/download PDF

42. Analysis of population structure: a unifying framework and novel methods based on sparse factor analysis.

Author: Barbara E Engelhardt and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: We consider the statistical analysis of population structure using genetic data. We show how the two most widely used approaches to modeling population structure, admixture-based models and principal components analysis (PCA), can be viewed within a single unifying framework of matrix factorization. Specifically, they can both be interpreted as approximating an observed genotype matrix by a product of two lower-rank matrices, but with different constraints or prior distributions on these lower-rank matrices. This opens the door to a large range of possible approaches to analyzing population structure, by considering other constraints or priors. In this paper, we introduce one such novel approach, based on sparse factor analysis (SFA). We investigate the effects of the different types of constraint in several real and simulated data sets. We find that SFA produces similar results to admixture-based models when the samples are descended from a few well-differentiated ancestral populations and can recapitulate the results of PCA when the population structure is more "continuous," as in isolation-by-distance models.
Published: 2010
Full Text: View/download PDF

43. Genome-wide association of lipid-lowering response to statins in combined study populations.

Author: Mathew J Barber, Lara M Mangravite, Craig L Hyde, Daniel I Chasman, Joshua D Smith, Catherine A McCarty, Xiaohui Li, Russell A Wilke, Mark J Rieder, Paul T Williams, Paul M Ridker, Aurobindo Chatterjee, Jerome I Rotter, Deborah A Nickerson, Matthew Stephens, and Ronald M Krauss
Subjects: Medicine, Science
Abstract: Statins effectively lower total and plasma LDL-cholesterol, but the magnitude of decrease varies among individuals. To identify single nucleotide polymorphisms (SNPs) contributing to this variation, we performed a combined analysis of genome-wide association (GWA) results from three trials of statin efficacy. Bayesian and standard frequentist association analyses were performed on untreated and statin-mediated changes in LDL-cholesterol, total cholesterol, HDL-cholesterol, and triglyceride on a total of 3932 subjects using data from three studies: Cholesterol and Pharmacogenetics (40 mg/day simvastatin, 6 weeks), Pravastatin/Inflammation CRP Evaluation (40 mg/day pravastatin, 24 weeks), and Treating to New Targets (10 mg/day atorvastatin, 8 weeks). Genotype imputation was used to maximize genomic coverage and to combine information across studies. Phenotypes were normalized within each study to account for systematic differences among studies, and fixed-effects combined analyses of the combined sample were performed to detect consistent effects across studies. Two SNP associations were assessed as having posterior probability greater than 50%, indicating that they were more likely than not to be genuinely associated with statin-mediated lipid response. SNP rs8014194, located within the CLMN gene on chromosome 14, was strongly associated with statin-mediated change in total cholesterol with an 84% probability by Bayesian analysis, and a p-value exceeding conventional levels of genome-wide significance by frequentist analysis (P = 1.8 x 10^-8). This SNP was less significantly associated with change in LDL-cholesterol (posterior probability = 0.16, P = 4.0 x 10^-6). Bayesian analysis also assigned a 51% probability that rs4420638, located in APOC1 and near APOE, was associated with change in LDL-cholesterol. Using combined GWA analysis from three clinical trials involving nearly 4,000 individuals treated with simvastatin, pravastatin, or atorvastatin, we have identified SNPs that may be associated with variation in the magnitude of statin-mediated reduction in total and LDL-cholesterol, including one in the CLMN gene for which statistical evidence for association exceeds conventional levels of genome-wide significance. PRINCE and TNT are not registered. CAP is registered at Clinicaltrials.gov NCT00451828.
Published: 2010
Full Text: View/download PDF

44. Practical issues in imputation-based association mapping.

Author: Yongtao Guan and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Imputation-based association methods provide a powerful framework for testing untyped variants for association with phenotypes and for combining results from multiple studies that use different genotyping platforms. Here, we consider several issues that arise when applying these methods in practice, including: (i) factors affecting imputation accuracy, including choice of reference panel; (ii) the effects of imputation accuracy on power to detect associations; (iii) the relative merits of Bayesian and frequentist approaches to testing imputed genotypes for association with phenotype; and (iv) how to quickly and accurately compute Bayes factors for testing imputed SNPs. We find that imputation-based methods can be robust to imputation accuracy and can improve power to detect associations, even when average imputation accuracy is poor. We explain how ranking SNPs for association by a standard likelihood ratio test gives the same results as a Bayesian procedure that uses an unnatural prior assumption--specifically, that difficult-to-impute SNPs tend to have larger effects--and assess the power gained from using a Bayesian approach that does not make this assumption. Within the Bayesian framework, we find that good approximations to a full analysis can be achieved by simply replacing unknown genotypes with a point estimate--their posterior mean. This approximation considerably reduces computational expense compared with published sampling-based approaches, and the methods we present are practical on a genome-wide scale with very modest computational resources (e.g., a single desktop computer). The approximation also facilitates combining information across studies, using only summary data for each SNP. Methods discussed here are implemented in the software package BIMBAM, which is available from http://stephenslab.uchicago.edu/software.html.
Published: 2008
Full Text: View/download PDF

45. High-resolution mapping of expression-QTLs yields insight into human gene regulation.

Author: Jean-Baptiste Veyrieras, Sridhar Kudaravalli, Su Yeon Kim, Emmanouil T Dermitzakis, Yoav Gilad, Matthew Stephens, and Jonathan K Pritchard
Subjects: Genetics, QH426-470
Abstract: Recent studies of the HapMap lymphoblastoid cell lines have identified large numbers of quantitative trait loci for gene expression (eQTLs). Reanalyzing these data using a novel Bayesian hierarchical model, we were able to create a surprisingly high-resolution map of the typical locations of sites that affect mRNA levels in cis. Strikingly, we found a strong enrichment of eQTLs in the 250 bp just upstream of the transcription end site (TES), in addition to an enrichment around the transcription start site (TSS). Most eQTLs lie either within genes or close to genes; for example, we estimate that only 5% of eQTLs lie more than 20 kb upstream of the TSS. After controlling for position effects, SNPs in exons are approximately 2-fold more likely than SNPs in introns to be eQTLs. Our results suggest an important role for mRNA stability in determining steady-state mRNA levels, and highlight the potential of eQTL mapping as a high-resolution tool for studying the determinants of gene regulation.
Published: 2008
Full Text: View/download PDF

46. Linkage disequilibrium-based quality control for large-scale genetic studies.

Author: Paul Scheet and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: Quality control (QC) is a critical step in large-scale studies of genetic variation. While, on average, high-throughput single nucleotide polymorphism (SNP) genotyping assays are now very accurate, the errors that remain tend to cluster into a small percentage of "problem" SNPs, which exhibit unusually high error rates. Because most large-scale studies of genetic variation are searching for phenomena that are rare (e.g., SNPs associated with a phenotype), even this small percentage of problem SNPs can cause important practical problems. Here we describe and illustrate how patterns of linkage disequilibrium (LD) can be used to improve QC in large-scale, population-based studies. This approach has the advantage over existing filters (e.g., HWE or call rate) that it can actually reduce genotyping error rates by automatically correcting some genotyping errors. Applying this LD-based QC procedure to data from The International HapMap Project, we identify over 1,500 SNPs that likely have high error rates in the CHB and JPT samples and estimate corrected genotypes. Our method is implemented in the software package fastPHASE, available from the Stephens Lab website (http://stephenslab.uchicago.edu/software.html).
Published: 2008
Full Text: View/download PDF

47. Imputation-based analysis of association studies: candidate regions and quantitative traits.

Author: Bertrand Servin and Matthew Stephens
Subjects: Genetics, QH426-470
Abstract: We introduce a new framework for the analysis of association studies, designed to allow untyped variants to be more effectively and directly tested for association with a phenotype. The idea is to combine knowledge on patterns of correlation among SNPs (e.g., from the International HapMap project or resequencing data in a candidate region of interest) with genotype data at tag SNPs collected on a phenotyped study sample, to estimate ("impute") unmeasured genotypes, and then assess association between the phenotype and these estimated genotypes. Compared with standard single-SNP tests, this approach results in increased power to detect association, even in cases in which the causal variant is typed, with the greatest gain occurring when multiple causal variants are present. It also provides more interpretable explanations for observed associations, including assessing, for each SNP, the strength of the evidence that it (rather than another correlated SNP) is causal. Although we focus on association studies with quantitative phenotype and a relatively restricted region (e.g., a candidate gene), the framework is applicable and computationally practical for whole genome association studies. Methods described here are implemented in a software package, Bim-Bam, available from the Stephens Lab website http://stephenslab.uchicago.edu/software.html.
Published: 2007
Full Text: View/download PDF

48. Absence of the TAP2 human recombination hotspot in chimpanzees.

Author: Susan E Ptak, Amy D Roeder, Matthew Stephens, Yoav Gilad, Svante Pääbo, and Molly Przeworski
Subjects: Biology (General), QH301-705.5
Abstract: Recent experiments using sperm typing have demonstrated that, in several regions of the human genome, recombination does not occur uniformly but instead is concentrated in "hotspots" of 1-2 kb. Moreover, the crossover asymmetry observed in a subset of these has led to the suggestion that hotspots may be short-lived on an evolutionary time scale. To test this possibility, we focused on a region known to contain a recombination hotspot in humans, TAP2, and asked whether chimpanzees, the closest living evolutionary relatives of humans, harbor a hotspot in a similar location. Specifically, we used a new statistical approach to estimate recombination rate variation from patterns of linkage disequilibrium in a sample of 24 western chimpanzees (Pan troglodytes verus). This method has been shown to produce reliable results on simulated data and on human data from the TAP2 region. Strikingly, however, it finds very little support for recombination rate variation at TAP2 in the western chimpanzee data. Moreover, simulations suggest that there should be stronger support if there were a hotspot similar to the one characterized in humans. Thus, it appears that the human TAP2 recombination hotspot is not shared by western chimpanzees. These findings demonstrate that fine-scale recombination rates can change between very closely related species and raise the possibility that rates differ among human populations, with important implications for linkage-disequilibrium based association studies.
Published: 2004
Full Text: View/download PDF

50. Empirical Bayes Matrix Factorization.

Author: Wei Wang and Matthew Stephens
Published: 2021

296 results on '"Matthew Stephens"'
