
GRIEVOUS: your command-line general for resolving cross-dataset genotype inconsistencies.

Authors :
Talwar JV
Klie A
Pagadala MS
Carter H
Source :
Bioinformatics (Oxford, England) [Bioinformatics] 2024 Aug 02; Vol. 40 (8).
Publication Year :
2024

Abstract

Summary: Harmonizing variant indexing and allele assignments across datasets is crucial for data integrity in cross-dataset studies such as multi-cohort genome-wide association studies, meta-analyses, and the development, validation, and application of polygenic risk scores. Ensuring this indexing and allele consistency is a laborious, time-consuming, and error-prone process requiring a certain degree of computational proficiency. Here, we introduce GRIEVOUS, a command-line tool for cross-dataset variant homogenization. By means of an internal database and a custom indexing methodology, GRIEVOUS identifies, formats, and aligns all biallelic single nucleotide polymorphisms (SNPs) across all summary statistic and genotype files of interest. Upon completion of dataset harmonization, GRIEVOUS can also be used to extract the maximal set of biallelic SNPs common to all datasets.

Availability and Implementation: GRIEVOUS and all supporting documentation and tutorials can be found at https://github.com/jvtalwar/GRIEVOUS. It is freely and publicly available under the MIT license and can be installed via pip.

(© The Author(s) 2024. Published by Oxford University Press.)
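To make the harmonization problem described in the Summary concrete, the following is a minimal Python sketch of generic allele alignment against a reference, followed by extraction of the SNPs shared by all datasets. It is not GRIEVOUS's code, database, or indexing method, none of which the abstract details. The function names (`is_biallelic_snp`, `align`, `common_snps`), the `(chromosome, position)` key, and the decision to discard ambiguous palindromic SNPs are all assumptions made for illustration.

```python
from typing import Dict, Iterable, Optional, Set, Tuple

# Hypothetical types for this sketch only.
Key = Tuple[str, int]        # (chromosome, position)
Alleles = Tuple[str, str]    # (reference allele, alternate allele)

COMPLEMENT = {"A": "T", "T": "A", "C": "G", "G": "C"}


def is_biallelic_snp(a1: str, a2: str) -> bool:
    """True when both alleles are single, distinct nucleotides."""
    return a1 in COMPLEMENT and a2 in COMPLEMENT and a1 != a2


def align(ref: Alleles, alleles: Alleles) -> Optional[str]:
    """Classify how `alleles` relates to `ref`.

    Returns 'same', 'swap' (alleles reversed), 'flip' (opposite strand),
    'flip_swap' (both), or None when the pair cannot be reconciled.
    """
    a1, a2 = alleles
    if not is_biallelic_snp(a1, a2):
        return None
    if (a1, a2) == ref:
        return "same"
    # Assumption: palindromic pairs (A/T, C/G) that do not match exactly
    # are ambiguous between a swap and a strand flip, so they are dropped.
    if COMPLEMENT[a1] == a2:
        return None
    if (a2, a1) == ref:
        return "swap"
    f1, f2 = COMPLEMENT[a1], COMPLEMENT[a2]
    if (f1, f2) == ref:
        return "flip"
    if (f2, f1) == ref:
        return "flip_swap"
    return None


def common_snps(reference: Dict[Key, Alleles],
                datasets: Iterable[Dict[Key, Alleles]]) -> Set[Key]:
    """Keys present in every dataset and alignable to the reference."""
    shared = set(reference)
    for ds in datasets:
        shared &= {k for k, al in ds.items()
                   if k in reference and align(reference[k], al) is not None}
    return shared


reference = {("1", 100): ("A", "G"), ("1", 200): ("C", "T"), ("2", 50): ("A", "T")}
cohort_a = {("1", 100): ("G", "A"), ("1", 200): ("G", "A"), ("2", 50): ("T", "A")}
cohort_b = {("1", 100): ("A", "G"), ("1", 200): ("A", "G")}
shared = common_snps(reference, [cohort_a, cohort_b])
```

In this toy input, the variant at `("2", 50)` is excluded because it is palindromic and mismatched in `cohort_a`, and absent from `cohort_b`. A real workflow would also need to rewrite effect sizes or genotype codings when alleles are swapped, which this sketch omits.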

Details

Language :
English
ISSN :
1367-4811
Volume :
40
Issue :
8
Database :
MEDLINE
Journal :
Bioinformatics (Oxford, England)
Publication Type :
Academic Journal
Accession number :
39078222
Full Text :
https://doi.org/10.1093/bioinformatics/btae489