A comprehensive analysis comparing linear and generalized linear models in detecting adaptive SNPs
- Source :
- Molecular Ecology Resources. 21:733-744
- Publication Year :
- 2021
- Publisher :
- Wiley, 2021.
Abstract
- To understand how organisms adapt to their environment, a gene-environmental association (GEA) analysis is commonly conducted. GEA methods based on mixed models, such as linear latent factor mixed models (LFMM) and LFMM2, have grown in popularity for their robust performance in terms of power and computational speed. However, it is unclear how the assumption of a Gaussian distribution for the response variables influences model performance. In this paper, we develop a generalized linear model (GLM) that allows for non-Gaussian distributions in the genotypic response variables and for the treatment of multiallelic nucleotide polymorphisms. Moreover, this multinomial logistic regression model (MLR) is combined with an admixture-based model or principal components analysis to correct for population structure (MLR-ADM and MLR-PC). Using simulations, we evaluate the type 1 error, false discovery rates (FDR), and power to detect selected SNPs, to guide model choice and best practices. With genomic control, MLR-PC and LFMM2 have similar type 1 error, FDRs, and power when analysing biallelic SNPs, while dramatically outperforming models not accounting for population structure. Differences in performance occur under continuous population structure, where MLR-PC outperforms LFMM/LFMM2, especially when a larger number of clusters or triallelic SNPs are analysed. The Human Genome Diversity Project (HGDP) data set shows that both MLR-PC and LFMM2 control the inflation of P-values. Analysis of the 1000 Genomes Project Phase 3 data set illustrates that MLR-PC and LFMM2 produce consistent results for most significant SNPs, while MLR-PC discovered additional SNPs corresponding to certain genes, suggesting MLR-PC may be a useful alternative for GEA inference.
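The abstract refers to genomic control as the step that keeps P-value inflation in check. As a rough illustration only, the following Python sketch shows the textbook form of that correction: estimate an inflation factor from the median test statistic, then rescale. It assumes 1-degree-of-freedom chi-square statistics and simulated toy data; the function names, the toy inflation of 1.5, and the choice to clamp the factor at 1 are assumptions for illustration, not the authors' procedure, which may use different degrees of freedom for the MLR tests.

```python
import math
import random
from statistics import NormalDist, median

# Median of a chi-square distribution with 1 degree of freedom (about 0.455).
CHI2_1DF_MEDIAN = NormalDist().inv_cdf(0.75) ** 2


def chi2_1df_pvalue(stat):
    """Upper-tail p-value of a 1-df chi-square statistic."""
    return math.erfc(math.sqrt(stat / 2.0))


def genomic_control(stats):
    """Return (inflation factor, rescaled statistics, p-values).

    Assumption: each entry of `stats` is a 1-df chi-square statistic
    from a per-SNP association test.
    """
    lam = median(stats) / CHI2_1DF_MEDIAN
    lam = max(lam, 1.0)  # common convention: never inflate the statistics
    adjusted = [s / lam for s in stats]
    return lam, adjusted, [chi2_1df_pvalue(s) for s in adjusted]


# Hypothetical toy data: null statistics inflated by a factor of 1.5,
# standing in for uncorrected population structure.
rng = random.Random(0)
toy_stats = [1.5 * rng.gauss(0.0, 1.0) ** 2 for _ in range(5000)]
lam, adjusted, pvals = genomic_control(toy_stats)
```

After rescaling, the median of the adjusted statistics matches the null median, so a quantile-quantile plot of `pvals` should sit close to the diagonal for SNPs that are not under selection.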
- Subjects :
- Mixed model
Generalized linear model
Principal Component Analysis
Genotype
Models, Genetic
Gaussian
Genome project
Biology
Polymorphism, Single Nucleotide
Data set
Statistics
Linear Models
Genetics
Ecology, Evolution, Behavior and Systematics
Biotechnology
Type I and type II errors
Local adaptation
Details
- ISSN :
- 1755-0998 and 1755-098X
- Volume :
- 21
- Database :
- OpenAIRE
- Journal :
- Molecular Ecology Resources
- Accession number :
- edsair.doi.dedup.....84230d75f245ac909bf2af404e0f81a9
- Full Text :
- https://doi.org/10.1111/1755-0998.13298