Many-BSP: an analytical performance model for CUDA kernels.

Authors :
Riahi, Ali
Savadi, Abdorreza
Naghibzadeh, Mahmoud
Source :
Computing. May 2024, Vol. 106 Issue 5, p1519-1555. 37p.
Publication Year :
2024

Abstract

The undocumented behavior of GPUs and the differing characteristics of their generations present a serious challenge in analyzing and optimizing programs on these processors. As a result, performance models have been developed to better analyze and describe their behavior. These models help programmers configure applications and help developers improve the performance of these devices. This paper introduces an analytical model, called Many-BSP, to predict the execution time of a CUDA kernel. The model is highly portable and can easily be used on various devices. Many GPU features and behaviors that affect performance are discussed, including multi-threading, coalesced access to global memory, shared-memory bank conflicts, dual-issue instructions, limitations of functional units, parallelism at the instruction, thread, and warp levels, the instruction pipeline, branch divergence, and intra-block and inter-block overlapping between communication and computation. The model also employs the tree hierarchy and parameters of the Multi-BSP model to estimate the latency of communication with memory. In Many-BSP, the execution time of a kernel is predicted by static analysis of CUDA and PTX code. The model is tested on three devices of different generations and on three real-world benchmarks. The results show that the execution time of a CUDA kernel can be predicted with a maximum error of 12.33%. [ABSTRACT FROM AUTHOR]

Details

Language :
English
ISSN :
0010-485X
Volume :
106
Issue :
5
Database :
Academic Search Index
Journal :
Computing
Publication Type :
Academic Journal
Accession number :
177775087
Full Text :
https://doi.org/10.1007/s00607-023-01255-w