1. Multi-level Parallelization of Genotype Imputation on Supercomputers
- Author
-
Weiwen Zhang, Juniarto Samsudin, Yongqing Zhu, Theint Theint Aye, and Long Wang
- Subjects
Computational Mathematics ,Genotype imputation ,Computer science ,Genetics ,Parallel computing ,Molecular Biology ,Biochemistry - Abstract
Background: Genotype imputation as a service is developed to enable researchers to estimate genotypes on haplotyped data without performing whole genome sequencing. However, genotype imputation is computation intensive and thus it remains a challenge to satisfy the high performance requirement of genome wide association study (GWAS). Objective: In this paper, we propose a high performance computing solution for genotype imputation on supercomputers to enhance its execution performance. Method: We design and implement a multi-level parallelization that includes job level, process level and thread level parallelization, enabled by job scheduling management, message passing interface (MPI) and OpenMP, respectively. It involves job distribution, chunk partition and execution, parallelized iteration for imputation and data concatenation. Due to the design of multi-level parallelization, we can exploit the multi-machine/multi-core architecture to improve the performance of genotype imputation. Results: Experiment results show that our proposed method can outperform the Hadoop-based implementation of genotype imputation. Moreover, we conduct the experiments on supercomputers to evaluate the performance of the proposed method. The evaluation shows that it can significantly shorten the execution time, thus improving the performance for genotype imputation. Conclusion: The proposed multi-level parallelization, when deployed as an imputation as a service, will facilitate bioinformatics researchers in Singapore to conduct genotype imputation and enhance the association study.
- Published
- 2020
- Full Text
- View/download PDF