A log-ratio biplot approach for exploring genetic relatedness based on identity by state

Authors :
Universitat Politècnica de Catalunya. Departament d'Estadística i Investigació Operativa
Universitat Politècnica de Catalunya. COSDA-UPC - COmpositional and Spatial Data Analysis
Graffelman, Jan
Galván Femenía, Iván
de Cid, Rafael
Barceló Vidal, Carles
Publication Year :
2019

Abstract

The detection of cryptic relatedness in large population-based cohorts is of great importance in genome research. The usual approach for detecting closely related individuals is to plot allele sharing statistics, based on identity-by-state or identity-by-descent, in a two-dimensional scatterplot. This approach ignores that allele sharing data across individuals in reality has a higher dimensionality, and it also disregards the compositional nature of the underlying counts of shared genotypes. In this paper we develop biplot methodology based on log-ratio principal component analysis that overcomes these restrictions. This leads to entirely new graphics that are essentially useful for exploring relatedness in genetic databases from homogeneous populations. The proposed method can be applied in an iterative manner, acting as a looking glass for more remote relationships that are harder to classify. Datasets from the 1,000 Genomes Project and the Genomes For Life-GCAT Project are used to illustrate the proposed method. The discriminatory power of the log-ratio biplot approach is compared with the classical plots in a simulation study. In a non-inbred homogeneous population the classification rate of the log-ratio principal component approach outperforms the classical graphics across the whole allele frequency spectrum, using only identity by state. In these circumstances, simulations show that with 35,000 independent bi-allelic variants, log-ratio principal component analysis, combined with discriminant analysis, can correctly classify relationships up to and including the fourth degree.

Postprint (published version)
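The log-ratio principal component analysis named in the abstract can be illustrated with a minimal sketch. This is not the authors' implementation: it assumes that each pair of individuals is summarized by counts of shared genotypes in six categories, applies a centred log-ratio (clr) transform, and extracts principal components by singular value decomposition. The function names `clr` and `logratio_pca`, the pseudocount of 0.5, the six-column layout, and the toy counts are all hypothetical choices made for illustration.

```python
import numpy as np

def clr(counts, pseudocount=0.5):
    """Centred log-ratio transform applied to each row of a count matrix.

    The pseudocount is an assumption added here to avoid log(0).
    """
    x = np.asarray(counts, dtype=float) + pseudocount
    logx = np.log(x)
    # Subtract the row mean of the logs (log of the geometric mean).
    return logx - logx.mean(axis=1, keepdims=True)

def logratio_pca(counts, pseudocount=0.5):
    """PCA of clr-transformed data via SVD; returns biplot coordinates."""
    z = clr(counts, pseudocount)
    z = z - z.mean(axis=0, keepdims=True)   # column-centre before the SVD
    u, s, vt = np.linalg.svd(z, full_matrices=False)
    scores = u * s                          # one row per pair of individuals
    loadings = vt.T                         # one row per genotype category
    explained = s**2 / np.sum(s**2)         # share of variance per component
    return scores, loadings, explained

# Made-up counts: 5 pairs of individuals (rows) by 6 categories (columns).
counts = np.array([
    [400, 120,  10, 300, 110,  60],
    [380, 140,  30, 280, 120,  50],
    [420, 100,   0, 320,  95,  65],
    [350, 160,  45, 260, 130,  55],
    [410, 115,   5, 310, 100,  60],
])

scores, loadings, explained = logratio_pca(counts)
```

Plotting the first two columns of `scores` as points and the first two columns of `loadings` as rays would give one form of biplot of the kind the abstract describes.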

Details

Database :
OAIster
Notes :
16 p., application/pdf, English
Publication Type :
Electronic Resource
Accession number :
edsoai.on1110007270
Document Type :
Electronic Resource