Back to Search Start Over

Parallel algorithm design and optimization of geodynamic numerical simulation application on the Tianhe new-generation high-performance computer.

Authors :
Yang, Jin
Yang, Wangdong
Qi, Ruixuan
Tsai, Qinyun
Lin, Shengle
Dong, Fengkun
Li, Kenli
Li, Keqin
Source :
Journal of Supercomputing. Jan2024, Vol. 80 Issue 1, p331-362. 32p.
Publication Year :
2024

Abstract

CitcomCu is a numerical simulation software for mantle convection in the field of geodynamics, which can simulate thermo-chemical convection in a three-dimensional domain. Due to the increasing demand for high-precision simulations and larger application scales, larger-scale computing systems are needed to solve this problem. However, the parallel efficiency of CitcomCu on large-scale heterogeneous parallel computing systems is difficult to improve, especially it cannot adapt to the current mainstream heterogeneous high-performance computing architecture with CPUs and accelerators. In this paper, we propose an geodynamics numerical simulation parallel computing framework using heterogeneous computing architecture based on the Tianhe new-generation high-performance computer. Firstly, the data partitioning mode of CitcomCu was optimized based on the large-scale heterogeneous computing architecture to reduce the overall communication overhead. Secondly, the iterative solution algorithm of CitcomCu was improved to speed up the solution process. Finally, the NEON instruction set based on SIMD is used for the sparse matrix operations in the solution process to improve parallel efficiency. Based on our parallel computing framework, the optimized CitcomCu was deployed and tested on the Tianhe new-generation high-performance computer. Experimental data showed that the performance of the optimized program was 3.3975 times higher than that of the unoptimized program on a single node. Compared with 50,000 computational cores, the parallel efficiency of the unoptimized program on one million computational cores was 36.75%, while the parallel efficiency of the optimized program was improved by 16.22% and reached 42.71%. In addition, the optimized program can be executed on 40 million computational cores, with a parallel efficiency of 36.54%. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
09208542
Volume :
80
Issue :
1
Database :
Academic Search Index
Journal :
Journal of Supercomputing
Publication Type :
Academic Journal
Accession number :
174659419
Full Text :
https://doi.org/10.1007/s11227-023-05469-9