Comprehensive techniques of multi-GPU memory optimization for deep learning acceleration.

Authors :
Kim, Youngrang
Lee, Jaehwan
Kim, Jik-Soo
Jei, Hyunseung
Roh, Hongchan
Source :
Cluster Computing. Sep 2020, Vol. 23 Issue 3, p2193-2204. 12p.
Publication Year :
2020

Abstract

This paper presents a comprehensive suite of techniques for optimized memory management in multi-GPU systems to accelerate deep learning application execution. We employ hybrid utilization of GPU and CPU memories in a multi-GPU environment while effectively addressing contention on the shared interconnect (e.g., PCIe, NVLink). In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to GPU) that maximizes processing throughput while sustaining a large mini-batch size. We implemented our optimization techniques on TensorFlow and performed extensive experiments in various multi-GPU environments, including traditional PCIe and the latest high-bandwidth interconnect, NVLink. Evaluation results show that our proposed scheme improves computing performance by reducing the I/O bottleneck and effectively increases the mini-batch size without sacrificing overall training throughput.
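The CPU-to-GPU prefetching idea described in the abstract can be illustrated with stock TensorFlow primitives. The sketch below is not the authors' implementation; it is a minimal approximation using tf.data's prefetch and prefetch_to_device to stage mini-batches in CPU memory and copy them to GPU memory asynchronously, so interconnect transfers overlap with computation. The names make_pipeline, BATCH_SIZE, and GPU_BUFFER are illustrative assumptions.

import tensorflow as tf

BATCH_SIZE = 256   # hypothetical mini-batch size
GPU_BUFFER = 2     # hypothetical number of batches staged ahead in GPU memory

def make_pipeline(images, labels):
    """Build an input pipeline that overlaps host-to-GPU copies with compute."""
    ds = tf.data.Dataset.from_tensor_slices((images, labels))
    ds = ds.shuffle(10_000).batch(BATCH_SIZE)
    # Keep a buffer of prepared batches in CPU memory so the GPU
    # never waits on input preprocessing.
    ds = ds.prefetch(tf.data.AUTOTUNE)
    # Asynchronously copy upcoming batches into GPU memory; the
    # PCIe/NVLink transfer then overlaps with the current training step.
    # prefetch_to_device must be the final transformation in the pipeline.
    ds = ds.apply(
        tf.data.experimental.prefetch_to_device("/gpu:0",
                                                buffer_size=GPU_BUFFER))
    return ds

# Hypothetical usage: batches arrive already resident in GPU memory.
# for x_batch, y_batch in make_pipeline(train_images, train_labels):
#     train_step(x_batch, y_batch)

The paper's contribution goes further (hybrid GPU/CPU memory placement and contention-aware scheduling across multiple GPUs), which these built-in primitives alone do not capture.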

Details

Language :
English
ISSN :
1386-7857
Volume :
23
Issue :
3
Database :
Academic Search Index
Journal :
Cluster Computing
Publication Type :
Academic Journal
Accession number :
145948961
Full Text :
https://doi.org/10.1007/s10586-019-02974-6