Implementation of a global GPU management plugin for Slurm
- Source :
- 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT).
- Publication Year :
- 2017
- Publisher :
- IEEE, 2017.
Abstract
- Slurm is a widely used resource manager for Linux clusters. It provides several CPU selection plugins whose allocation strategies suit different scenarios. GPU allocation, however, is constrained by the location of the selected CPUs, because a GPU can only be accessed by processes running on the same node. This restriction can leave jobs waiting for GPUs even when free GPUs exist elsewhere in the cluster. This paper presents a global GPU management plugin for Slurm. Using remote GPU virtualization, the plugin detaches GPUs from their nodes to form a global GPU pool and decouples GPU allocation from CPU allocation. GPUs in the pool are available to CUDA jobs on any node in the cluster. Furthermore, we implement two GPU selection strategies, best fit and local first. Experiments show that the global GPU management plugin shortens job waiting time and makes efficient use of the GPUs in the cluster.
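The abstract names the two selection strategies but does not define them, so the following Python sketch is only an illustration under stated assumptions, not the plugin's actual code. It assumes that "best fit" picks the node whose free GPU count exceeds the request by the smallest margin, and that "local first" prefers GPUs on the node where the job's CPUs were allocated before falling back to a remote node in the pool. The names `Node`, `best_fit`, and `local_first` are hypothetical, and the sketch also assumes that a request is served from a single node.

```python
from dataclasses import dataclass


@dataclass
class Node:
    """One entry in the (assumed) global GPU pool."""
    name: str
    free_gpus: int


def best_fit(pool, requested):
    """Assumed best fit: the node with the fewest leftover GPUs after the request."""
    candidates = [n for n in pool if n.free_gpus >= requested]
    if not candidates:
        return None  # no single node can serve the request; the job must wait
    return min(candidates, key=lambda n: n.free_gpus - requested)


def local_first(pool, requested, cpu_node):
    """Assumed local first: use the CPU node's own GPUs if possible, else a remote node."""
    for n in pool:
        if n.name == cpu_node and n.free_gpus >= requested:
            return n
    # Fall back to a remote GPU, which the global pool makes reachable.
    return best_fit(pool, requested)


# Hypothetical three-node cluster: node1 has CPUs free but no free GPUs.
pool = [Node("node1", 0), Node("node2", 4), Node("node3", 2)]

print(best_fit(pool, 2).name)              # tightest fit among all nodes
print(local_first(pool, 2, "node2").name)  # local GPUs are sufficient
print(local_first(pool, 2, "node1").name)  # falls back to a remote node
```

In this sketch a job whose CPUs land on `node1` still obtains GPUs from another node, which corresponds to the case the abstract describes where a job would otherwise wait even though free GPUs exist in the cluster.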
- Subjects :
- Computer science
Node (networking)
0102 computer and information sciences
02 engineering and technology
Parallel computing
GPU cluster
Software_PROGRAMMINGTECHNIQUES
Virtualization
computer.software_genre
01 natural sciences
020202 computer hardware & architecture
CUDA
010201 computation theory & mathematics
Computer cluster
0202 electrical engineering, electronic engineering, information engineering
Operating system
Resource management
Plug-in
Central processing unit
computer
ComputingMethodologies_COMPUTERGRAPHICS
Details
- Database :
- OpenAIRE
- Journal :
- 2017 3rd International Conference on Computational Intelligence & Communication Technology (CICT)
- Accession number :
- edsair.doi...........90f8ce039e63a18eda5cb0cc2cb25ad1