Comprehensive techniques of multi-GPU memory optimization for deep learning acceleration.
- Source :
- Cluster Computing. Sep 2020, Vol. 23, Issue 3, p2193-2204. 12p.
- Publication Year :
- 2020
Abstract
- This paper presents a comprehensive suite of techniques for optimized memory management in multi-GPU systems to accelerate deep learning application execution. We employ hybrid utilization of GPU and CPU memories in a multi-GPU environment by effectively addressing contention on the shared interconnect (e.g., PCIe, NVLink). In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to GPU) that achieves the highest processing throughput while sustaining a large mini-batch size. We implemented our optimization techniques on TensorFlow and performed extensive experiments in various multi-GPU environments, including traditional PCIe and the latest high-bandwidth interconnect, NVLink. Evaluation results show that our proposed scheme improves computing performance by reducing the I/O bottleneck and effectively increases the mini-batch size without sacrificing overall training throughput. [ABSTRACT FROM AUTHOR]
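The core idea of the prefetching technique described above is to stage the next batch from host (CPU) memory into a bounded buffer while the current batch is being consumed, so transfer latency overlaps with compute. The sketch below is purely illustrative and uses only the Python standard library; it is not the paper's TensorFlow implementation, and the `prefetch` helper, buffer depth, and toy batches are invented for the example. The bounded queue plays the role of the GPU-side staging area.

```python
import threading
import queue

def prefetch(batches, depth=2):
    """Stage batches into a bounded buffer from a background thread,
    overlapping producer-side loading with downstream consumption.
    Illustrative analogue of CPU-to-GPU prefetching; 'depth' caps how
    far ahead the producer may run (the staging-buffer size)."""
    buf = queue.Queue(maxsize=depth)
    stop = object()  # sentinel marking end of the batch stream

    def producer():
        for b in batches:
            buf.put(b)  # blocks when the staging buffer is full
        buf.put(stop)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        b = buf.get()
        if b is stop:
            return
        yield b

# Usage: five toy batches flow through the prefetch pipeline.
batches = [[i, i + 1] for i in range(0, 10, 2)]
print([sum(b) for b in prefetch(batches)])  # [1, 5, 9, 13, 17]
```

In a real training loop the consumer would launch GPU kernels on each yielded batch, and the bounded depth keeps host memory staging from running arbitrarily far ahead of the device.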
Details
- Language :
- English
- ISSN :
- 1386-7857
- Volume :
- 23
- Issue :
- 3
- Database :
- Academic Search Index
- Journal :
- Cluster Computing
- Publication Type :
- Academic Journal
- Accession number :
- 145948961
- Full Text :
- https://doi.org/10.1007/s10586-019-02974-6