Significant Sparse Polygenic Risk Scores across 813 traits in UK Biobank
- Author
Robert Tibshirani, Trevor Hastie, Manuel A. Rivas, Johanne Marie Justesen, Junyang Qian, Ruilin Li, Yosuke Tanigawa, and Guhan Venkataraman
- Subjects
Genetics, Correlation, Genotype, Transferability, Principal component analysis, Polygenic risk score, Biology, Quantitative trait locus, Biobank, Phenotype
- Abstract
We present a systematic assessment of polygenic risk score (PRS) prediction across more than 1,500 traits using genetic and phenotype data in the UK Biobank. We report 813 sparse PRS models with significant (p < 2.5 × 10⁻⁵) incremental predictive performance compared with the covariate-only model, which considers age, sex, genotyping array type, and the principal component loadings of genotypes. We report a significant correlation between the number of genetic variants selected in the sparse PRS model and the incremental predictive performance (Spearman's ρ = 0.61, p = 2.2 × 10⁻⁵⁹ for quantitative traits; ρ = 0.21, p = 9.6 × 10⁻⁴ for binary traits). The sparse PRS models trained on European individuals showed limited transferability when evaluated on non-European individuals in the UK Biobank. We provide the PRS model weights on the Global Biobank Engine (https://biobankengine.stanford.edu/prs).

Author summary
Polygenic risk scores (PRSs), an approach to estimating genetic predisposition to disease by aggregating the effects of multiple genetic variants, have attracted increasing research interest. While the predictive performance of PRSs has improved for some traits, the applicability of PRS models across a wide range of human traits has not been clear. Here, applying penalized regression with the Batch Screening Iterative Lasso (BASIL) algorithm to more than 269,000 individuals of white British ancestry in the UK Biobank, we systematically characterize PRS models across more than 1,500 traits. We report 813 traits with PRS models of statistically significant predictive performance. Although statistical significance does not necessarily translate into clinical relevance, we investigate the properties of the 813 significant PRS models and report a significant correlation between predictive performance and estimated SNP-based heritability. We find that the number of genetic variants selected in our sparse PRS model is significantly correlated with the incremental predictive performance for both quantitative and binary traits. Our transferability assessment in the UK Biobank revealed that the sparse PRS models trained on individuals of European ancestry had lower predictive performance for individuals of African and Asian ancestry groups.
- Published
2021
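The abstract describes a PRS as an aggregate of effects across many variants, and it reports a Spearman correlation between the number of selected variants and incremental predictive performance. The following is a minimal, standard-library-only Python sketch of those two ideas, under stated assumptions. The variant IDs, weights, dosages, and per-trait numbers are hypothetical toy values, not weights or results from the Global Biobank Engine. The weights are assumed to come from an already-fitted sparse (lasso-type) model; the sketch does not implement BASIL itself. "Incremental performance" is assumed to be a difference such as R² of covariates plus PRS minus R² of covariates alone.

```python
"""Toy sketch: scoring one individual with a sparse PRS, and rank-correlating
model size with incremental performance. All data below is hypothetical."""

from typing import Dict, List, Sequence


def sparse_prs(dosages: Dict[str, float], weights: Dict[str, float]) -> float:
    """Weighted sum of allele dosages (0, 1, or 2) over variants with a
    nonzero weight; a variant missing from `dosages` contributes 0."""
    return sum(w * dosages.get(variant, 0.0) for variant, w in weights.items())


def average_ranks(values: Sequence[float]) -> List[float]:
    """1-based ranks; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        mean_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = mean_rank
        i = j + 1
    return ranks


def spearman_rho(x: Sequence[float], y: Sequence[float]) -> float:
    """Spearman's rho as the Pearson correlation of the average ranks.
    Raises ZeroDivisionError if either input is constant."""
    rx, ry = average_ranks(x), average_ranks(y)
    n = len(rx)
    mx, my = sum(rx) / n, sum(ry) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(rx, ry))
    sx = sum((a - mx) ** 2 for a in rx) ** 0.5
    sy = sum((b - my) ** 2 for b in ry) ** 0.5
    return cov / (sx * sy)


if __name__ == "__main__":
    # Hypothetical sparse model: only three variants carry nonzero weights.
    weights = {"rs0001": 0.12, "rs0002": -0.05, "rs0003": 0.30}
    person = {"rs0001": 2, "rs0002": 1, "rs0003": 0}
    print(round(sparse_prs(person, weights), 2))  # 0.19

    # Hypothetical traits: number of selected variants vs. incremental metric.
    n_variants = [10, 250, 40, 1200]
    incremental = [0.001, 0.005, 0.02, 0.08]
    print(round(spearman_rho(n_variants, incremental), 3))  # 0.8
```

Storing only the nonzero weights in a dictionary mirrors what makes the model "sparse": scoring cost scales with the number of selected variants rather than with every genotyped variant. Average ranks are used so that tied values are handled in the usual way for Spearman's ρ.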