Implementation of a global GPU management plugin for Slurm

Authors :
Xue Wu
Xiang Long
Source :
2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT).
Publication Year :
2017
Publisher :
IEEE, 2017.

Abstract

Slurm is a widely used resource management software for Linux clusters. It has several CPU selection plugins with different allocation strategies suited to different scenarios. However, GPU allocation is constrained by the location of the selected CPUs, because a GPU can only be accessed by processes running on its own node. This restriction may force a job to wait for GPUs even when free GPUs exist elsewhere in the cluster. This paper presents a global GPU management plugin for Slurm. Using remote GPU virtualization, the plugin detaches the GPUs from their nodes to form a global GPU pool and decouples the GPU allocation procedure from CPU allocation. GPUs in the pool are available to CUDA jobs on any node in the cluster. Furthermore, we implement two GPU selection strategies, best fit and local first. Experiments show that the global GPU management plugin shortens job waiting time and makes efficient use of the GPUs in the cluster.
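The abstract names the two selection strategies but does not define them. The following is a minimal sketch, assuming one plausible reading of each: "best fit" takes all requested GPUs from the single node whose free count fits the request most tightly (spreading across nodes, largest first, only when no single node suffices), and "local first" drains the job's own node before borrowing remote GPUs from the pool. The names `GpuNode`, `best_fit`, `local_first` and `_take_in_order` are hypothetical illustrations, not the plugin's code or any Slurm API.

```python
from dataclasses import dataclass
from typing import List, Optional, Tuple

# Hypothetical model of one node's contribution to the global GPU pool.
@dataclass
class GpuNode:
    name: str
    free_gpus: int

Allocation = List[Tuple[str, int]]  # (node name, GPUs taken from that node)


def _take_in_order(nodes: List[GpuNode], requested: int) -> Optional[Allocation]:
    """Greedily take GPUs from nodes in the given order; None if the pool is too small."""
    alloc: Allocation = []
    remaining = requested
    for node in nodes:
        if remaining == 0:
            break
        take = min(node.free_gpus, remaining)
        if take > 0:
            alloc.append((node.name, take))
            remaining -= take
    return alloc if remaining == 0 else None


def best_fit(pool: List[GpuNode], requested: int) -> Optional[Allocation]:
    """Assumed best fit: the node with the fewest free GPUs that still covers the request."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    fitting = [n for n in pool if n.free_gpus >= requested]
    if fitting:
        node = min(fitting, key=lambda n: (n.free_gpus, n.name))
        return [(node.name, requested)]
    # No single node fits: span nodes, largest free count first, to touch few nodes.
    ordered = sorted(pool, key=lambda n: (-n.free_gpus, n.name))
    return _take_in_order(ordered, requested)


def local_first(pool: List[GpuNode], requested: int, local_node: str) -> Optional[Allocation]:
    """Assumed local first: use the job's own node, then remote GPUs (largest first)."""
    if requested <= 0:
        raise ValueError("requested must be positive")
    local = [n for n in pool if n.name == local_node]
    remote = sorted((n for n in pool if n.name != local_node),
                    key=lambda n: (-n.free_gpus, n.name))
    return _take_in_order(local + remote, requested)


if __name__ == "__main__":
    pool = [GpuNode("node1", 1), GpuNode("node2", 4), GpuNode("node3", 2)]
    print(best_fit(pool, 2))              # tightest single-node fit
    print(local_first(pool, 2, "node1"))  # local GPU plus one remote GPU
```

With the example pool, `best_fit(pool, 2)` chooses `node3` (exactly two free GPUs), leaving `node2` intact for larger requests, while `local_first(pool, 2, "node1")` uses the single local GPU and borrows one from `node2`. Under a conventional node-bound allocator, a job whose CPUs were placed on `node1` could not start with two GPUs; the global pool is what makes the second allocation possible.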

Details

Database :
OpenAIRE
Journal :
2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)
Accession number :
edsair.doi...........90f8ce039e63a18eda5cb0cc2cb25ad1