1. RAPID detection of gene–gene interactions in genome-wide association studies
- Author
- Vineet Bafna, Dumitru Brinza, Matthew D. Schultz, and Glenn Tesler
- Subjects
- Statistics and Probability; Genetics; Euclidean space; Gene Expression; Genomics; Single-nucleotide polymorphism; Genome-wide association study; Locus (genetics); Computational biology; Biology; Biochemistry; Genome; Original Papers; Polymorphism, Single Nucleotide; Computer Science Applications; Computational Mathematics; Computational Theory and Mathematics; Gene interaction; Databases, Genetic; Molecular Biology; Software; Genetic association; Genome-Wide Association Study
- Abstract
- Motivation: In complex disorders, independently evolving locus pairs might interact to confer disease susceptibility, with only a modest effect at each locus. With genome-wide association studies on large cohorts, testing all pairs for interaction imposes a heavy computational burden and causes a loss of power due to large Bonferroni-like corrections. Correspondingly, limiting the tests to pairs that show a marginal effect at either locus also reduces power. Here, we describe an algorithm that discovers interacting locus pairs without explicitly testing all pairs or requiring a marginal effect at each locus. The central idea is a mathematical transformation that maps ‘statistical correlation between locus pairs’ to ‘distance between two points in a Euclidean space’. This enables the use of geometric properties to identify proximal points (correlated locus pairs) without testing each pair explicitly. For large datasets (∼10⁶ SNPs), this reduces the number of tests from 10¹² to 10⁶, significantly reducing the computational burden without loss of power. The speed of the test allows for correction using permutation-based tests. The algorithm is implemented in a tool called Rapid (RApid Pair IDentification) for identifying paired interactions in case–control GWAS.
  Results: We validated Rapid with extensive tests on simulated and real datasets. On simulated models of interaction, Rapid easily identified pairs with small marginal effects. On the benchmark disease datasets from the Wellcome Trust Case Control Consortium, Rapid ran in about 1 CPU-hour per dataset and identified many significant interactions. In many cases, the interacting loci were known to be important for the disease but were not individually associated in the genome-wide scan.
  Availability: http://bix.ucsd.edu/projects/rapid
  Contact: vbafna@cs.ucsd.edu
  Supplementary information: Supplementary data are available at Bioinformatics online.
- Published
- 2010
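
The abstract's central idea, mapping correlation between loci to distance between points, can be illustrated with a standard identity: if each SNP's genotype vector is centered and scaled to unit length, then for two such points p and q the squared Euclidean distance is 2(1 − r), where r is their Pearson correlation. The sketch below only demonstrates that identity with NumPy; it is not Rapid's actual transformation or search procedure, which the abstract does not specify. The function names and the toy genotype matrix are hypothetical.

```python
import numpy as np


def to_unit_points(genotypes):
    """Center each SNP (one row per SNP) and scale it to unit length."""
    g = np.asarray(genotypes, dtype=float)
    centered = g - g.mean(axis=1, keepdims=True)
    norms = np.linalg.norm(centered, axis=1, keepdims=True)
    if np.any(norms == 0):
        # A SNP with no variation has an undefined Pearson correlation.
        raise ValueError("constant SNP rows cannot be normalized")
    return centered / norms


def correlation_from_distance(p, q):
    """Recover Pearson r from the distance between two unit points."""
    squared_distance = float(np.sum((p - q) ** 2))
    return 1.0 - squared_distance / 2.0


def distance_threshold(min_r):
    """Distance d such that r >= min_r exactly when ||p - q|| <= d."""
    return float(np.sqrt(2.0 * (1.0 - min_r)))


# Toy data (assumed for illustration): 3 SNPs genotyped in 8 individuals,
# coded as 0, 1 or 2 copies of the minor allele.
genotypes = np.array([
    [0, 1, 2, 1, 0, 2, 1, 0],
    [0, 1, 2, 1, 0, 2, 1, 1],
    [2, 0, 1, 0, 2, 1, 0, 1],
])
points = to_unit_points(genotypes)
reference = np.corrcoef(genotypes)  # direct pairwise Pearson correlations

for i in range(len(points)):
    for j in range(i + 1, len(points)):
        r = correlation_from_distance(points[i], points[j])
        print(f"SNP {i} vs SNP {j}: r from distance = {r:.4f}, "
              f"np.corrcoef = {reference[i, j]:.4f}")
```

Because a correlation cutoff translates into a distance cutoff (`distance_threshold`), strongly correlated pairs become nearby points, which is what lets a geometric neighbor search stand in for an explicit test of every pair. Strongly negative correlations correspond to distances near 2, so a search that also cares about them would need to look near the reflected point −q as well.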