Computationally efficient whole-genome regression for quantitative and binary traits
- Source :
- Nature Genetics. 53:1097-1103
- Publication Year :
- 2021
- Publisher :
- Springer Science and Business Media LLC, 2021.
Abstract
- Genome-wide association analysis of cohorts with thousands of phenotypes is computationally expensive, particularly when accounting for sample relatedness or population structure. Here we present a novel machine-learning method called REGENIE for fitting a whole-genome regression model for quantitative and binary phenotypes that is substantially faster than alternatives in multi-trait analyses while maintaining statistical efficiency. The method naturally accommodates parallel analysis of multiple phenotypes and requires only local segments of the genotype matrix to be loaded in memory, in contrast to existing alternatives, which must load genome-wide matrices into memory. This results in substantial savings in compute time and memory usage. We introduce a fast, approximate Firth logistic regression test for unbalanced case–control phenotypes. The method is ideally suited to take advantage of distributed computing frameworks. We demonstrate the accuracy and computational benefits of this approach using the UK Biobank dataset with up to 407,746 individuals. REGENIE is a whole-genome regression method based on ridge regression that enables highly parallelized analysis of quantitative and binary traits in biobank-scale data with reduced computational requirements.
- Subjects :
- Genotype
Binary number
Sample (statistics)
Biology
computer.software_genre
Logistic regression
Machine Learning
03 medical and health sciences
0302 clinical medicine
Software
Genetics
Humans
030304 developmental biology
0303 health sciences
business.industry
Computational Biology
Reproducibility of Results
Contrast (statistics)
Regression analysis
Genomics
Regression
Logistic Models
Phenotype
ComputingMethodologies_PATTERNRECOGNITION
Efficiency
Case-Control Studies
Data mining
business
computer
030217 neurology & neurosurgery
Genome-Wide Association Study
Details
- ISSN :
- 1546-1718 and 1061-4036
- Volume :
- 53
- Database :
- OpenAIRE
- Journal :
- Nature Genetics
- Accession number :
- edsair.doi.dedup.....203a8303d00e057d517ab7516becf642