1. Modeling and predicting performance of high performance computing applications on hardware accelerators
- Author
- Mitesh R. Meswani, Laura Carrington, Scott B. Baden, Allan Snavely, Stephen W. Poole, and Didem Unat
- Subjects
TOP500, Speedup, Computer science, Graphics processing unit, Workload, Parallel computing, Supercomputer, Porting, Theoretical Computer Science, Hardware and Architecture, Performance prediction, Central processing unit, Field-programmable gate array, Software, Computer hardware
- Abstract
Computers with hardware accelerators, also referred to as hybrid-core systems, speed up applications by offloading certain compute operations that can run faster on accelerators. Thus, it is not surprising that many of the TOP500 supercomputers use accelerators. However, in addition to procurement cost, significant programming and porting effort is required to realize the potential benefit of such accelerators. Hence, before building such a system it is prudent to answer the question: "What is the projected performance benefit from accelerators for the workloads of interest?" We address this question with a performance-modeling framework that predicts realizable application performance on accelerators rapidly and accurately, without the considerable effort of porting and tuning. The framework first automatically identifies compute patterns commonly found in scientific applications, which we term idioms, that may benefit from accelerator technology. Next, it models the predicted speedup of those idioms if they were ported to and run on hardware accelerators. As a proof of concept, we characterize two kinds of accelerators: 1) the FPGA accelerators on a Convey HC-1 system and 2) an NVIDIA Fermi GPU accelerator. We model the performance of the gather/scatter and stream idioms. Our predictions show that, where these idioms occur in two full-scale HPC applications, Milc and HYCOM, gather/scatter speeds up by as much as 15X and stream by as much as 14X, whereas the overall compute time of Milc improves by 3.4% and that of HYCOM by 20%.
- Published
- 2012
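The abstract reports large per-idiom speedups (up to 15X) alongside much smaller whole-application gains (3.4% and 20%). One common way to relate the two is Amdahl's law, which ties the overall gain to the fraction of runtime spent in the accelerated code. The sketch below is an illustration under that assumption, not the paper's modeling framework: the function names are hypothetical, and the runtime fractions in the usage lines are made-up inputs rather than measurements from Milc or HYCOM.

```python
def overall_speedup(fraction: float, idiom_speedup: float) -> float:
    """Amdahl's law: whole-application speedup when `fraction` of the
    original runtime is accelerated by a factor of `idiom_speedup`."""
    if not 0.0 <= fraction <= 1.0:
        raise ValueError("fraction must lie in [0, 1]")
    if idiom_speedup <= 0.0:
        raise ValueError("idiom_speedup must be positive")
    return 1.0 / ((1.0 - fraction) + fraction / idiom_speedup)


def time_reduction_percent(fraction: float, idiom_speedup: float) -> float:
    """Percentage reduction in total runtime implied by Amdahl's law."""
    return 100.0 * (1.0 - 1.0 / overall_speedup(fraction, idiom_speedup))


if __name__ == "__main__":
    # Hypothetical fractions of runtime spent in an accelerated idiom.
    # These are illustrative inputs, not values taken from the paper.
    for fraction in (0.05, 0.25, 0.50):
        reduction = time_reduction_percent(fraction, idiom_speedup=15.0)
        print(f"fraction={fraction:.2f}  runtime reduction={reduction:.1f}%")
```

Under this simple model, a 15X idiom speedup yields only a single-digit percentage improvement when the idiom accounts for a small share of total runtime, which is consistent in spirit with the gap between idiom-level and application-level numbers in the abstract.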