1. GBC: a parallel toolkit based on highly addressable byte-encoding blocks for extremely large-scale genotypes of species
- Author
- Liubin Zhang, Yangyang Yuan, Wenjie Peng, Bin Tang, Mulin Jun Li, Hongsheng Gui, Qiang Wang, and Miaoxin Li
- Subjects
Large-scale genotypes, Genotype compression, Highly addressable genotype blocks, Byte-encoding genotypes, Genotype management, Parallelization algorithm, Biology (General), QH301-705.5, Genetics, QH426-470
- Abstract
Whole-genome sequencing projects of millions of subjects contain enormous numbers of genotypes, entailing a huge memory burden and long computation times. Here, we present GBC, a toolkit for rapidly compressing large-scale genotypes into highly addressable byte-encoding blocks under an optimized parallel framework. We demonstrate that GBC is up to 1000 times faster than state-of-the-art methods at accessing and managing compressed large-scale genotypes while maintaining a competitive compression ratio. We also show that conventional analyses are substantially sped up when they are built on GBC to access the genotypes of a large population. GBC's data structure and algorithms are valuable for accelerating large-scale genomic research.
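The idea of "highly addressable byte-encoding blocks" can be illustrated with a minimal Python sketch. It is not GBC's actual file format or API: the code table `GENOTYPE_CODES`, the helpers `build_blocks` and `read_variant`, the block size, and the use of `zlib` as a stand-in compressor are all assumptions made for illustration. Each genotype is stored as one byte, variants are grouped into fixed-size blocks, each block is compressed independently, and an offset index allows a single variant to be read by decompressing only the block that contains it.

```python
import zlib

# Hypothetical one-byte code table for phased diploid genotypes
# (an illustrative assumption, not GBC's real encoding).
GENOTYPE_CODES = {"0|0": 0, "0|1": 1, "1|0": 2, "1|1": 3, ".|.": 4}
CODE_TO_GENOTYPE = {code: gt for gt, code in GENOTYPE_CODES.items()}


def encode_variant(genotypes):
    """Encode one variant's genotypes as one byte per sample."""
    return bytes(GENOTYPE_CODES[g] for g in genotypes)


def build_blocks(rows, variants_per_block):
    """Compress variants block by block.

    Returns the concatenated compressed blocks and a list of byte offsets:
    offsets[i] is where block i starts, and the last entry is the total length.
    """
    payload = bytearray()
    offsets = [0]
    for start in range(0, len(rows), variants_per_block):
        block_rows = rows[start:start + variants_per_block]
        raw = b"".join(encode_variant(r) for r in block_rows)
        payload += zlib.compress(raw)  # zlib is a stand-in compressor
        offsets.append(len(payload))
    return bytes(payload), offsets


def read_variant(payload, offsets, variant_index, n_samples, variants_per_block):
    """Decode one variant by decompressing only the block that holds it."""
    block_id, row_in_block = divmod(variant_index, variants_per_block)
    raw = zlib.decompress(payload[offsets[block_id]:offsets[block_id + 1]])
    row = raw[row_in_block * n_samples:(row_in_block + 1) * n_samples]
    return [CODE_TO_GENOTYPE[b] for b in row]


# Three variants genotyped in three samples (toy data).
rows = [
    ["0|0", "0|1", "1|1"],
    ["0|1", "0|1", "0|0"],
    ["1|1", ".|.", "1|0"],
]
payload, offsets = build_blocks(rows, variants_per_block=2)
print(read_variant(payload, offsets, 2, n_samples=3, variants_per_block=2))
# prints ['1|1', '.|.', '1|0']
```

Because the blocks are compressed independently, they could also be built or decoded in parallel, which is the general property a parallel framework would exploit. The sketch itself runs sequentially.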
- Published
- 2023