GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species

Authors :
Liubin Zhang
Yangyang Yuan
Wenjie Peng
Bin Tang
Mulin Jun Li
Hongsheng Gui
Qiang Wang
Miaoxin Li
Source :
Genome Biology, Vol 24, Iss 1, Pp 1-22 (2023)
Publication Year :
2023
Publisher :
BMC, 2023.

Abstract

Whole-genome sequencing projects involving millions of subjects produce enormous numbers of genotypes, which impose a heavy memory burden and long computation times. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses are substantially sped up when built on GBC to access the genotypes of a large population. GBC’s data structure and algorithms are valuable for accelerating large-scale genomic research.

Details

Language :
English
ISSN :
1474-760X
Volume :
24
Issue :
1
Database :
Directory of Open Access Journals
Journal :
Genome Biology
Publication Type :
Academic Journal
Accession number :
edsdoj.4f6f3be3aab4bcba4e0de1f78462441
Document Type :
article
Full Text :
https://doi.org/10.1186/s13059-023-02906-z