
Round-Based Mechanism and Job Packing with Model-Similarity-Based Policy for Scheduling DL Training in GPU Cluster.

Authors :
Thanapol, Panissara
Lavangnananda, Kittichai
Leprévost, Franck
Glad, Arnaud
Schleich, Julien
Bouvry, Pascal
Source :
Applied Sciences (2076-3417); Mar2024, Vol. 14 Issue 6, p2349, 18p
Publication Year :
2024

Abstract

Graphics Processing Units (GPUs) are employed for their parallel processing capabilities, which are essential for training deep learning (DL) models on large datasets within a reasonable time. However, training performance varies across GPU architectures in a way that depends on the DL model. Furthermore, factors such as the number of GPUs used for distributed training and the batch size significantly affect training efficiency. Addressing this variability in training performance and accounting for these influential factors are critical for optimising resource usage. This paper presents a scheduling policy for DL training tasks in a heterogeneous GPU cluster. It builds upon a model-similarity-based scheduling policy by implementing a round-based mechanism and job packing. The round-based mechanism allows the scheduler to adjust its scheduling decisions periodically, whereas job packing optimises GPU utilisation by fitting additional jobs onto a GPU that is training a small model. Results show that implementing the round-based mechanism reduces the makespan by approximately 29% compared with scheduling without it. Additionally, integrating job packing further decreases the makespan by 5%. [ABSTRACT FROM AUTHOR]
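The abstract names two mechanisms without spelling out an algorithm. The following is a minimal, hypothetical Python sketch of both: a round-based mechanism, in which the scheduler rebuilds its whole placement at fixed intervals, and job packing, in which an extra job may share a GPU that trains a small model. The throughput table, round length, packing slowdown, small-model memory threshold, and the names Job, GPU, pick_gpu and simulate are illustrative assumptions, not the authors' implementation. The fixed throughput table merely stands in for the paper's model-similarity-based policy.

```python
from dataclasses import dataclass, replace

# Hypothetical iterations/second for each (model, GPU type) pair. In the paper,
# GPU preferences come from a model-similarity-based policy; here they are fixed.
THROUGHPUT = {
    ("resnet50", "V100"): 10.0, ("resnet50", "K80"): 3.0,
    ("lstm", "V100"): 6.0, ("lstm", "K80"): 4.0,
    ("mobilenet", "V100"): 12.0, ("mobilenet", "K80"): 8.0,
}
ROUND_LENGTH = 60.0     # seconds per scheduling round (assumed)
PACKING_SLOWDOWN = 0.8  # assumed per-job speed factor while two jobs share a GPU
SMALL_MODEL_MEM = 4     # GB; only a GPU running a model this small accepts a packed job (assumed)


@dataclass
class Job:
    name: str
    model: str
    mem_gb: int
    remaining: float  # training iterations still to run


@dataclass
class GPU:
    gpu_type: str
    mem_gb: int


def pick_gpu(job, gpus, placement, packing):
    """Choose a GPU index for `job` in the current round, or None to make it wait."""
    def rate(i):
        return THROUGHPUT[(job.model, gpus[i].gpu_type)]

    # 1) Prefer an idle GPU, taking the type on which this model runs fastest.
    idle = [i for i in range(len(gpus)) if not placement[i]]
    if idle:
        return max(idle, key=rate)
    if not packing:
        return None
    # 2) Job packing: share a GPU that trains exactly one small model, if memory fits.
    shareable = [
        i for i in range(len(gpus))
        if len(placement[i]) == 1
        and placement[i][0].mem_gb <= SMALL_MODEL_MEM
        and placement[i][0].mem_gb + job.mem_gb <= gpus[i].mem_gb
    ]
    return max(shareable, key=rate) if shareable else None


def simulate(jobs, gpus, packing=True):
    """Run scheduling rounds until every job finishes; return the makespan in seconds."""
    active = [replace(j) for j in jobs]  # copy so the caller's jobs are not mutated
    clock, makespan = 0.0, 0.0
    while active:
        # Round-based mechanism: rebuild the whole placement at every round boundary.
        placement = {i: [] for i in range(len(gpus))}
        for job in sorted(active, key=lambda j: -j.remaining):  # most work first
            i = pick_gpu(job, gpus, placement, packing)
            if i is not None:
                placement[i].append(job)
        for i, resident in placement.items():
            # Simplification: a shared GPU stays slowed for the whole round.
            factor = PACKING_SLOWDOWN if len(resident) > 1 else 1.0
            for job in resident:
                rate = THROUGHPUT[(job.model, gpus[i].gpu_type)] * factor
                done = min(job.remaining, rate * ROUND_LENGTH)
                if done == job.remaining:  # job completes inside this round
                    makespan = max(makespan, clock + job.remaining / rate)
                job.remaining -= done
        active = [j for j in active if j.remaining > 0]
        clock += ROUND_LENGTH
    return makespan


if __name__ == "__main__":
    gpus = [GPU("V100", 16), GPU("K80", 12)]
    jobs = [Job("a", "resnet50", 8, 1200), Job("b", "mobilenet", 3, 960), Job("c", "lstm", 4, 480)]
    print(simulate(jobs, gpus, packing=True))   # 144.0
    print(simulate(jobs, gpus, packing=False))  # 200.0
```

In this toy example, packing lets the small LSTM job run alongside MobileNet on the K80 instead of waiting, and re-planning at the third round moves MobileNet onto the faster V100 once ResNet-50 finishes. The numbers only illustrate the mechanisms and say nothing about the 29% and 5% reductions reported in the paper.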

Details

Language :
English
ISSN :
2076-3417
Volume :
14
Issue :
6
Database :
Complementary Index
Journal :
Applied Sciences (2076-3417)
Publication Type :
Academic Journal
Accession number :
176271311
Full Text :
https://doi.org/10.3390/app14062349