1. Predicting and reining in application-level slowdown on spatial multitasking GPUs.
- Author
Wei, Mengze; Zhao, Wenyi; Chen, Quan; Dai, Hao; Leng, Jingwen; Li, Chao; Zheng, Wenli; Guo, Minyi
- Subjects
*COMPUTER multitasking, *FORECASTING, *KERNEL functions, *PREDICTION models, *MODEL railroads, *PRIOR learning
- Abstract
Predicting performance degradation of a GPU application at co-location on a spatial multitasking GPU without prior application knowledge is essential in public Clouds. Prior work mainly targets CPU co-location, and is inaccurate and/or inefficient for predicting performance of applications at co-location on spatial multitasking GPUs. Our investigation shows that hardware event statistics caused by co-located applications strongly correlate with their slowdowns. Based on this observation, we present Themis with a kernel slowdown model (Themis-KSM), which performs precise and efficient online application slowdown prediction without prior application knowledge. The kernel slowdown model is trained offline. When new applications co-run, Themis-KSM collects event statistics and predicts their slowdowns simultaneously. We also propose a two-stage slowdown prediction mechanism (Themis-TSP) for real-system GPUs that requires no hardware modification. Our evaluation shows that Themis has negligible runtime overhead, and that Themis-KSM and Themis-TSP predict application-level slowdown with prediction errors below 9.5% and 12.8%, respectively. Based on Themis, we also implement an SM allocation engine to rein in application slowdown at co-location. Case studies show that the engine successfully enforces fair sharing and QoS.
• Themis predicts and reins in application-level slowdown on spatial multitasking GPUs.
• The kernel slowdown model (KSM) collects hardware event statistics by per-SM counters and predicts slowdown with a pre-trained neural network.
• The two-stage slowdown prediction model (TSP) predicts slowdown on real-system GPUs, by using a two-stage piecewise function to approximate kernels' SM-IPC curves.
• The SM allocation engine reins in application slowdown to fulfill user requirements such as fairness or QoS.
[ABSTRACT FROM AUTHOR]
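The abstract does not give the exact form of Themis-TSP's two-stage piecewise function or of the SM allocation engine, so the following is only a minimal sketch under stated assumptions, not the authors' method. It assumes (1) a kernel's SM-IPC curve rises linearly with the number of SMs up to a hypothetical "knee" and is flat afterwards, (2) slowdown is the IPC when running alone on all SMs divided by the IPC on the SMs it receives at co-location, and (3) "fair sharing" means choosing the SM split that minimizes the larger of two co-located applications' predicted slowdowns. The names `TwoStageCurve`, `slowdown`, and `fair_split` are hypothetical. The neural-network predictor used by Themis-KSM is not sketched.

```python
from dataclasses import dataclass


@dataclass
class TwoStageCurve:
    """Hypothetical two-stage SM-IPC curve (assumed form, not from the paper).

    Stage 1: IPC grows linearly with the number of SMs.
    Stage 2: at and beyond `knee` SMs, IPC saturates at slope * knee.
    """
    slope: float  # IPC gained per additional SM in the linear stage (assumed > 0)
    knee: int     # SM count at which the kernel stops scaling (assumed >= 1)

    def ipc(self, sms: int) -> float:
        # Linear below the knee, flat at and above it.
        return self.slope * min(sms, self.knee)


def slowdown(curve: TwoStageCurve, shared_sms: int, total_sms: int) -> float:
    """Assumed definition: IPC alone on all SMs / IPC on the allotted SMs."""
    if shared_sms <= 0:
        return float("inf")
    return curve.ipc(total_sms) / curve.ipc(shared_sms)


def fair_split(a: TwoStageCurve, b: TwoStageCurve, total_sms: int) -> tuple[int, int]:
    """Brute-force the SM split that minimizes the worse of the two slowdowns."""
    best = None
    for sms_a in range(1, total_sms):
        sms_b = total_sms - sms_a
        worst = max(slowdown(a, sms_a, total_sms), slowdown(b, sms_b, total_sms))
        # Strict '<' keeps the first split that reaches the minimum.
        if best is None or worst < best[0]:
            best = (worst, sms_a, sms_b)
    return best[1], best[2]


if __name__ == "__main__":
    # Illustrative, made-up curves on a hypothetical 80-SM GPU.
    scalable = TwoStageCurve(slope=1.0, knee=80)   # keeps scaling to all SMs
    saturating = TwoStageCurve(slope=2.0, knee=20)  # saturates at 20 SMs
    print(fair_split(scalable, saturating, 80))
```

With these made-up curves the sketch gives the scalable application 64 SMs and the saturating one 16, so both see a predicted slowdown of 1.25. Exhaustive search is used only because two applications and a few dozen SMs make it trivially cheap; a real allocator would need to handle more tenants and QoS targets.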
- Published
2020