Comprehensive techniques of multi-GPU memory optimization for deep learning acceleration.
- Source :
- Cluster Computing. Sep 2020, Vol. 23, Issue 3, p2193-2204. 12p.
- Publication Year :
- 2020
Abstract
- This paper presents a comprehensive suite of techniques for optimized memory management in multi-GPU systems to accelerate deep learning application execution. We employ hybrid utilization of GPU and CPU memories in a multi-GPU environment by effectively addressing contention on the shared interconnect (e.g., PCIe, NVLink). In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to GPU) that achieves the highest processing throughput while sustaining a large mini-batch size. We implemented our optimization techniques on TensorFlow and performed extensive experiments in various multi-GPU environments, including traditional PCIe and the latest high-bandwidth interconnect, NVLink. Evaluation results show that our proposed scheme improves computing performance by reducing the I/O bottleneck and effectively increases the mini-batch size without sacrificing overall training throughput. [ABSTRACT FROM AUTHOR]
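The core idea of the prefetching technique described above is to stage the next batch from host (CPU) memory into a bounded buffer while the current batch is being consumed, so transfer latency overlaps with compute. The sketch below is purely illustrative and uses only the Python standard library; it is not the paper's TensorFlow implementation, and the `prefetch` helper, buffer depth, and toy batches are invented for the example. The bounded queue plays the role of the GPU-side staging area.

```python
import threading
import queue

def prefetch(batches, depth=2):
    """Stage batches into a bounded buffer from a background thread,
    overlapping producer-side loading with downstream consumption.
    Illustrative analogue of CPU-to-GPU prefetching; 'depth' caps how
    far ahead the producer may run (the staging-buffer size)."""
    buf = queue.Queue(maxsize=depth)
    stop = object()  # sentinel marking end of the batch stream

    def producer():
        for b in batches:
            buf.put(b)  # blocks when the staging buffer is full
        buf.put(stop)

    threading.Thread(target=producer, daemon=True).start()
    while True:
        b = buf.get()
        if b is stop:
            return
        yield b

# Usage: five toy batches flow through the prefetch pipeline.
batches = [[i, i + 1] for i in range(0, 10, 2)]
print([sum(b) for b in prefetch(batches)])  # [1, 5, 9, 13, 17]
```

In a real training loop the consumer would launch GPU kernels on each yielded batch, and the bounded depth keeps host memory staging from running arbitrarily far ahead of the device.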
Details
- Language :
- English
- ISSN :
- 1386-7857
- Volume :
- 23
- Issue :
- 3
- Database :
- Academic Search Index
- Journal :
- Cluster Computing
- Publication Type :
- Academic Journal
- Accession number :
- 145948961
- Full Text :
- https://doi.org/10.1007/s10586-019-02974-6