1. Probabilistic inference of the genetic architecture underlying functional enrichment of complex traits
- Author
Lars Rönnegård, Athanasios Kousathanas, Gerhard Moser, Etienne J. Orliac, Alexander Holloway, Julia Sidorenko, Marion Patxot, Daniel Trejo Banos, Matthew R. Robinson, Peter M. Visscher, Sven Erik Ojavee, Reedik Mägi, and Zoltán Kutalik
- Subjects
Multifactorial Inheritance; Computer and Information Sciences; Statistical methods; Genotype; Science; Bayesian probability; General Physics and Astronomy; Inference; Computational biology; Biology; Bayesian inference; Genome-wide association studies; Article; General Biochemistry, Genetics and Molecular Biology; Body Mass Index; Open Reading Frames; Genetics (medical genetics to be 30107 and agricultural genetics to be 40402); Chromosome (genetic algorithm); Genetic variation; Humans; 10203 Bioinformatics (Computational Biology) (applications to be 10610); Models, Statistical; Multidisciplinary; Genetic Variation; Bayes Theorem; Data- och informationsvetenskap; Genomics; General Chemistry; Biobank; Body Height; Introns; Regression; Genetic architecture; ComputingMethodologies_PATTERNRECOGNITION; Phenotype; Diabetes Mellitus, Type 2; Genetic Techniques; Cardiovascular Diseases; Software; Genome-Wide Association Study
- Abstract
We develop a Bayesian model (BayesRR-RC) that provides robust SNP-heritability estimation, an alternative to marker discovery, and accurate genomic prediction, taking 22 seconds per iteration to estimate 8.4 million SNP-effects and 78 SNP-heritability parameters in the UK Biobank. We find that only ≤10% of the genetic variation captured for height, body mass index, cardiovascular disease, and type 2 diabetes is attributable to proximal regulatory regions within 10 kb upstream of genes, while 12–25% is attributed to coding regions, 32–44% to introns, and 22–28% to distal 10–500 kb upstream regions. Up to 24% of all cis and coding regions of each chromosome are associated with each trait, with over 3,100 independent exonic and intronic regions and over 5,400 independent regulatory regions having ≥95% probability of contributing ≥0.001% to the genetic variance of these four traits. Our open-source software (GMRM) provides a scalable alternative to current approaches for biobank data.
Improving inference in large-scale genetic data linked to electronic medical record data requires the development of novel computationally efficient regression methods. Here, the authors develop a Bayesian approach for association analyses to improve SNP-heritability estimation, discovery, fine-mapping and genomic prediction.
- Published
2021