Significance testing for small annotations in stratified LD-Score regression

Authors :: Luke J. O’Connor
Benjamin M. Neale
Katherine Tashman
Hilary K. Finucane
Ran Cui
Publication Year :: 2021
Publisher :: Cold Spring Harbor Laboratory, 2021.
Abstract: S-LDSC is a widely used heritability enrichment method that has helped gain biological insights into numerous complex traits. It has primarily been used to analyze large annotations that contain approximately 0.5% of SNPs or more. Here, we show in simulation that, when applied to small annotations, the block jackknife-based significance testing used in S-LDSC does not always control type 1 error. We show that the inflation of type 1 error for small annotations is due both to the noisiness of the jackknife estimate of the standard error and to the non-normality of the regression coefficient estimates. We use the percent of 0.01 centimorgan blocks in the genome overlapped by the annotation to quantify the size of an annotation and the extent to which the SNPs in the annotation cluster together, and we find thresholds on this value above which type 1 error is controlled. We have implemented a test in the LDSC software that informs users when they compute LD scores for an annotation if the annotation does not pass the threshold for producing controlled type 1 error.
Author Summary: Genetics is a rapidly evolving field that allows us to link our genetic code to the physiological manifestations of disease. A key part of this work is finding regions of the genome that contribute disproportionately to the genetic underpinnings of a disease. A commonly used tool to provide such insight is stratified LD score regression (S-LDSC). S-LDSC allows us to estimate how much a set of genomic regions contributes to the overall heritability of a phenotype, and to test whether this is more than we would expect by chance. Here we show that when we apply S-LDSC to a small set of genomic regions, it does not give an accurate test of whether this set of genomic regions contributes more than we would expect by chance to the phenotype. We characterize what it means to be a “small” set of genomic regions, and we set thresholds to restrict which annotations we test to prevent false positive results. This helps to ensure that as we continue to pursue genetic analyses at scale, we report only truly significant results that will help us further understand the etiology of many of the traits we study.
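The annotation-size measure described in the abstract, the percent of 0.01 centimorgan blocks in the genome overlapped by the annotation, could be computed roughly as in the sketch below. This is an illustrative sketch only, not the test implemented in the LDSC software: the function name, the input layout (per-chromosome genetic positions in cM), and the block-indexing details are all assumptions made for the example, and the thresholds from the paper are deliberately not included.

```python
import math


def percent_blocks_overlapped(annot_cm_positions, chrom_cm_lengths, block_cm=0.01):
    """Percent of fixed-width genetic-distance blocks containing >= 1 annotated SNP.

    Hypothetical helper (not part of LDSC). Assumed inputs:
      annot_cm_positions: dict mapping chromosome -> iterable of genetic
          positions (in cM) of the SNPs in the annotation
      chrom_cm_lengths:   dict mapping chromosome -> total genetic length (cM)
    """
    total_blocks = 0
    hit_blocks = 0
    for chrom, length_cm in chrom_cm_lengths.items():
        # Number of blocks tiling this chromosome; the rounding guards
        # against floating-point noise in the division.
        n_blocks = max(1, math.ceil(round(length_cm / block_cm, 9)))
        total_blocks += n_blocks

        # A set records each overlapped block once, however many annotated
        # SNPs fall inside it, so tightly clustered SNPs count as one block.
        hits = set()
        for pos in annot_cm_positions.get(chrom, ()):
            idx = min(int(pos / block_cm), n_blocks - 1)
            hits.add(idx)
        hit_blocks += len(hits)

    return 100.0 * hit_blocks / total_blocks


# Toy example: two chromosomes of 1 cM each (200 blocks in total).
# Two of the four annotated SNPs share a block, so 3 blocks are overlapped.
toy_annotation = {"1": [0.005, 0.015, 0.017, 0.505]}
toy_lengths = {"1": 1.0, "2": 1.0}
toy_percent = percent_blocks_overlapped(toy_annotation, toy_lengths)
```

Because each block is counted at most once, an annotation whose SNPs cluster in a few regions scores lower than one with the same number of SNPs spread across the genome, which is the property the abstract relies on when it uses this value to capture both size and clustering.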