
Architecture-based and target-oriented algorithm optimization of high-order methods via complete-search tensor contraction.

Authors :
You, Hojun
Kim, Chongam
Source :
Computer Physics Communications. Jul 2021, Vol. 264.
Publication Year :
2021

Abstract

Sophisticated solution algorithms, along with complex data structures, are known to be the main barriers that keep high-order methods from being widely embraced by industry and academia. At the same time, modern computing machines offer many opportunities to improve the performance of solution algorithms through highly tuned computational kernels. To exploit these opportunities, we present an architecture-based and target-oriented algorithm optimization for high-order methods, called complete-search tensor contraction (CsTC). The key idea of CsTC is to recast the tensor operations of a high-order method as an optimization problem whose solution is an optimized way to execute tensor contraction (TC). The general framework of CsTC is introduced and then applied to the discontinuous Galerkin (DG) discretization. An approach based on general matrix multiplication (GEMM) is adopted because it can flexibly handle the intermediate order of TC and reuse state-of-the-art GEMM primitives. By optimizing data structures as well as TC operations, CsTC provides a solution algorithm that performs significantly better than the original non-optimized high-order method. The entire optimization process completes automatically within a few minutes as a pre-processing step. Because the CsTC optimization fully reflects the mesh and solution parameters adopted as well as the computing architecture used, it is completely target-oriented and architecture-based. Various solution parameters and computing architectures are used and compared. All the results indicate that the optimization is essential to extract the best performance from a given computing architecture, and that the performance enhancement becomes substantial as the DG approximation order increases and as more recent processors are employed. Finally, a 3-D viscous flow problem governed by the compressible Navier-Stokes equations is solved. The optimized algorithm yields more than a 10× speed-up over the algorithm with a nested-loop approach when DG-P3 and DG-P5 approximations are used. [ABSTRACT FROM AUTHOR]
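The abstract describes CsTC only at a high level. As a rough illustration of the general idea of treating "how to execute a tensor contraction" as a search problem, the following Python sketch (assuming NumPy) times several candidate ways of evaluating a typical DG contraction, u[e,q,v] = sum over b of phi[q,b] * U[e,b,v] (modal coefficients evaluated at quadrature points), checks each against a reference, and keeps the fastest. The problem sizes, the candidate set, and the helper names `make_candidates` and `complete_search` are hypothetical; this is not the authors' implementation, which searches over GEMM-based contraction orders and data structures in much more detail.

```python
import time

import numpy as np


def make_candidates(phi, coef):
    """Build candidate implementations of u[e,q,v] = sum_b phi[q,b] * coef[e,b,v].

    Every candidate returns an array indexed as (element, quad point, variable)
    so the results can be compared. The basis-major copy below is built once,
    loosely mimicking a data-layout change made at a pre-processing step.
    """
    n_elem, n_basis, n_var = coef.shape
    n_quad = phi.shape[0]
    # Hypothetical alternative layout: (basis, element * variable), contiguous.
    coef_bm = np.ascontiguousarray(coef.transpose(1, 0, 2)).reshape(n_basis, n_elem * n_var)

    def per_element():
        # One small (Q x B) @ (B x V) product per element, looped in Python.
        out = np.empty((n_elem, n_quad, n_var))
        for e in range(n_elem):
            out[e] = phi @ coef[e]
        return out

    def batched_matmul():
        # np.matmul broadcasts the 2-D phi over the leading element axis.
        return np.matmul(phi, coef)

    def einsum():
        return np.einsum("qb,ebv->eqv", phi, coef, optimize=True)

    def single_gemm():
        # One large (Q x B) @ (B x E*V) GEMM on the basis-major layout.
        u = phi @ coef_bm
        # Returned as a transposed view; a real code could keep (Q, E, V) downstream.
        return u.reshape(n_quad, n_elem, n_var).transpose(1, 0, 2)

    return {
        "per_element": per_element,
        "batched_matmul": batched_matmul,
        "einsum": einsum,
        "single_gemm": single_gemm,
    }


def complete_search(candidates, reference, repeats=5):
    """Time every candidate exhaustively and return (fastest name, timings in s)."""
    timings = {}
    for name, fn in candidates.items():
        if not np.allclose(fn(), reference):
            raise ValueError(f"candidate {name!r} disagrees with the reference")
        best = float("inf")
        for _ in range(repeats):
            t0 = time.perf_counter()
            fn()
            best = min(best, time.perf_counter() - t0)
        timings[name] = best
    winner = min(timings, key=timings.get)
    return winner, timings


if __name__ == "__main__":
    # Hypothetical sizes: basis functions, quadrature points, elements, variables.
    n_basis, n_quad, n_elem, n_var = 20, 27, 2000, 5
    rng = np.random.default_rng(0)
    phi = rng.standard_normal((n_quad, n_basis))
    coef = rng.standard_normal((n_elem, n_basis, n_var))

    cands = make_candidates(phi, coef)
    reference = cands["per_element"]()
    winner, timings = complete_search(cands, reference)
    for name, t in sorted(timings.items(), key=lambda kv: kv[1]):
        print(f"{name:15s} {t * 1e3:8.3f} ms")
    print("selected:", winner)
```

Which candidate wins depends on the sizes and on the machine, which is the point of searching at run time rather than fixing one strategy in advance.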

Details

Language :
English
ISSN :
0010-4655
Volume :
264
Database :
Academic Search Index
Journal :
Computer Physics Communications
Publication Type :
Periodical
Accession number :
150769768
Full Text :
https://doi.org/10.1016/j.cpc.2021.107988