An Automated Tool for Analysis and Tuning of GPU-Accelerated Code in HPC Applications.

Authors :
Zhou, Keren
Meng, Xiaozhu
Sai, Ryuichi
Grubisic, Dejan
Mellor-Crummey, John
Source :
IEEE Transactions on Parallel & Distributed Systems. Apr. 2022, Vol. 33, Issue 4, pp. 854-865. 12 pp.
Publication Year :
2022

Abstract

The US Department of Energy’s fastest supercomputers and forthcoming exascale systems employ Graphics Processing Units (GPUs) to increase the computational performance of compute nodes. However, the complexity of GPU architectures makes tailoring sophisticated applications to achieve high performance on GPU-accelerated systems a major challenge. At best, prior performance tools for GPU code only provide coarse-grained tuning advice at the kernel level. In this article, we describe GPA, a performance advisor that suggests potential code optimizations at a hierarchy of levels, including individual lines, loops, and functions. To gather the fine-grained measurements needed to produce such insights, GPA uses instruction sampling and binary instrumentation to monitor execution of GPU code. At the time of this writing, GPU instruction sampling is only available on NVIDIA GPUs. To understand performance losses, GPA uses data flow analysis to approximately attribute measured instruction stalls back to their causes. GPA then analyzes patterns of stalls using information about a program’s structure and the GPU architecture to identify optimization strategies that address inefficiencies observed. GPA then employs detailed performance models to estimate the potential speedup that each optimization might provide. Experiments with benchmarks and applications show that GPA provides useful advice for tuning GPU code. We applied GPA to analyze and tune a collection of codes on NVIDIA V100 and A100 GPUs. GPA suggested optimizations that it estimates will accelerate performance across the set of codes by a geometric mean of 1.21×. Applying these optimizations suggested by GPA accelerated these codes by a geometric mean of 1.19×. [ABSTRACT FROM AUTHOR]
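
The abstract summarizes both the estimated and the achieved speedups as geometric means. As a minimal sketch of how such a summary is computed (not the authors' code), the Python below takes a list of per-code speedup ratios and returns their geometric mean. The function name `geometric_mean` and the sample values are hypothetical and are not taken from the paper.

```python
import math


def geometric_mean(speedups):
    """Return the geometric mean of a sequence of positive speedup ratios.

    The mean is computed in log space, exp(mean(log(s))), which avoids
    overflow or underflow when many ratios are multiplied together.
    """
    values = list(speedups)
    if not values:
        raise ValueError("need at least one speedup value")
    if any(s <= 0 for s in values):
        raise ValueError("speedups must be positive ratios")
    return math.exp(sum(math.log(s) for s in values) / len(values))


if __name__ == "__main__":
    # Hypothetical per-code speedups, for illustration only (not from the paper).
    estimated = [1.10, 1.25, 1.30]
    achieved = [1.08, 1.22, 1.27]
    print(f"estimated geometric mean: {geometric_mean(estimated):.2f}x")
    print(f"achieved geometric mean:  {geometric_mean(achieved):.2f}x")
```

A geometric mean is the usual choice for averaging ratios such as speedups because a 2× gain on one code and a 0.5× slowdown on another cancel to exactly 1×, which an arithmetic mean would not report.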

Details

Language :
English
ISSN :
10459219
Volume :
33
Issue :
4
Database :
Academic Search Index
Journal :
IEEE Transactions on Parallel & Distributed Systems
Publication Type :
Academic Journal
Accession number :
153880611
Full Text :
https://doi.org/10.1109/TPDS.2021.3094169