
Evaluating execution time predictions on GPU kernels using an analytical model and machine learning techniques.

Authors :
Amaris, Marcos
Camargo, Raphael
Cordeiro, Daniel
Goldman, Alfredo
Trystram, Denis
Source :
Journal of Parallel & Distributed Computing. Jan 2023, Vol. 171, p. 66-78. 13 p.
Publication Year :
2023

Abstract

Predicting the performance of applications executed on GPUs is a great challenge and is essential for efficient job schedulers. There are different approaches to this problem, namely analytical modeling and machine learning (ML) techniques. Machine learning requires large training sets and reliable features; nevertheless, it can capture the interactions between architecture and software without manual intervention. In this paper, we compared a BSP-based analytical model with three ML techniques for predicting the execution time of kernels running on GPUs. The analytical model is based on the number of computations and memory accesses of the GPU, with additional information on cache usage obtained from profiling. The ML techniques Linear Regression, Support Vector Machines, and Random Forests were evaluated in two scenarios: in the first, the input features were the same parameters used by the analytical model; in the second, features were selected by a feature-extraction process based on correlation analysis and hierarchical clustering. Our experiments were conducted with 20 CUDA kernels, 11 of which belonged to 6 real-world applications from the Rodinia benchmark suite; the others were classical matrix-vector applications commonly used for benchmarking. We collected data on 9 NVIDIA GPUs in different machines. We show that the analytical model performs better when applications scale regularly: a single parameter λ is capable of adjusting its predictions, avoiding a complex analysis of the applications. We also show that the ML techniques achieve high accuracy when feature extraction is applied. Sets of 5 and 10 features were tested in two settings, for unknown GPUs and for unknown kernels. In the ML experiments with feature extraction, we obtained errors of around 1.54% for unknown GPUs and 2.71% for unknown kernels.
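As a rough illustration of the kind of analytical model the abstract describes (the paper's exact equation is not reproduced here), the sketch below assumes a simplified BSP-style cost: per-thread compute and memory-access cycles are summed, scaled by the thread count, and divided by the GPU's aggregate throughput, with the single parameter λ absorbing unmodeled effects such as cache behavior. The function name, its parameters, and the formula itself are illustrative assumptions, not the authors' exact model.

def predict_kernel_time(comp_cycles, mem_cycles, threads, cores, clock_hz, lam):
    # Hypothetical simplified BSP-style estimate (illustration only):
    # total per-thread work (compute + memory-access cycles), scaled by the
    # number of threads, divided by the GPU's aggregate throughput.
    # lam is the single adjustment parameter lambda the abstract mentions.
    return threads * (comp_cycles + mem_cycles) / (cores * clock_hz * lam)

Under this reading, fitting λ on a few measured runs of a kernel lets the same formula extrapolate to other input sizes or GPUs, which is consistent with the abstract's claim that one parameter adjusts the predictions while minimizing per-application analysis.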
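The abstract's second ML scenario (feature extraction via correlation analysis and hierarchical clustering, followed by Linear Regression, Support Vector Machines, and Random Forests) could be sketched in Python with NumPy, SciPy, and scikit-learn as below. The clustering linkage, the choice of one representative feature per cluster, and all hyperparameters are assumptions for illustration, not the paper's exact procedure.

import numpy as np
from scipy.cluster.hierarchy import linkage, fcluster
from scipy.spatial.distance import squareform
from sklearn.linear_model import LinearRegression
from sklearn.svm import SVR
from sklearn.ensemble import RandomForestRegressor

def select_features(X, n_features):
    # Correlation analysis: distance = 1 - |Pearson correlation| between feature columns.
    dist = 1.0 - np.abs(np.corrcoef(X, rowvar=False))
    # Hierarchical clustering of the features; cut the tree into n_features clusters.
    tree = linkage(squareform(dist, checks=False), method="average")
    labels = fcluster(tree, t=n_features, criterion="maxclust")
    # Keep one representative feature per cluster (here simply the first member).
    return [int(np.where(labels == c)[0][0]) for c in np.unique(labels)]

def fit_and_score(X_train, y_train, X_test, y_test, n_features):
    keep = select_features(X_train, n_features)  # e.g. 5 or 10, as in the abstract
    models = {
        "linear regression": LinearRegression(),
        "support vector machine": SVR(kernel="rbf"),
        "random forest": RandomForestRegressor(n_estimators=100, random_state=0),
    }
    scores = {}
    for name, model in models.items():
        model.fit(X_train[:, keep], y_train)
        pred = model.predict(X_test[:, keep])
        # Mean absolute percentage error, the kind of relative error the abstract reports.
        scores[name] = float(np.mean(np.abs((pred - y_test) / y_test)))
    return scores

Splitting the data so that the test set holds measurements from a GPU (or a kernel) that never appears in training would reproduce the abstract's "unknown GPUs" and "unknown kernels" settings.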

Details

Language :
English
ISSN :
0743-7315
Volume :
171
Database :
Academic Search Index
Journal :
Journal of Parallel & Distributed Computing
Publication Type :
Academic Journal
Accession number :
159797367
Full Text :
https://doi.org/10.1016/j.jpdc.2022.09.002