Interference-aware execution framework with Co-scheML on GPU clusters.
- Source :
- Cluster Computing. Oct 2023, Vol. 26 Issue 5, p2577-2589. 13p.
- Publication Year :
- 2023
Abstract
- Recently, improving overall resource utilization through efficient scheduling of applications on graphics processing unit (GPU) clusters has been a concern. Traditional cluster-orchestration platforms allocate GPUs exclusively to applications, which limits resource utilization. Co-execution of GPU applications has been suggested as a way to make better use of limited resources. However, co-executing GPU applications without considering their diverse characteristics can lead to unpredictable performance owing to interference resulting from contention and unbalanced usage of resources among applications. This paper proposes an interference-aware execution framework with Co-scheML for various GPU applications such as high-performance computing (HPC), deep learning (DL) training, and DL inference. The resource-usage characteristics of GPU applications are analyzed and profiled to identify the varying degrees of interference among them. As interference prediction is challenging owing to the complexity of GPU systems, an interference model is generated by applying defined GPU metrics to machine learning (ML) models. The Co-scheML scheduler deploys applications so as to minimize interference, using the interference predicted by the constructed model. Experimental results demonstrated that, compared to baseline schedulers, the framework improved resource utilization by 24%, improved the average job completion time (JCT) by 23%, and shortened the makespan by 22% on average. [ABSTRACT FROM AUTHOR]
- Subjects :
- *HIGH performance computing
- *DEEP learning
- *MACHINE learning
- *PRODUCTION scheduling
Details
- Language :
- English
- ISSN :
- 13867857
- Volume :
- 26
- Issue :
- 5
- Database :
- Academic Search Index
- Journal :
- Cluster Computing
- Publication Type :
- Academic Journal
- Accession number :
- 170716710
- Full Text :
- https://doi.org/10.1007/s10586-021-03299-z