Efficient Multi-GPU Memory Management for Deep Learning Acceleration

2018 IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS*W), 2018

Cited by 11 | Views 17
Abstract
In this paper, we propose a new optimized memory management scheme that improves overall GPU memory utilization in multi-GPU systems for deep learning acceleration. We extend Nvidia's vDNN concept (a hybrid use of GPU and CPU memories) to a multi-GPU environment by effectively addressing PCIe-bus contention. In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to GPU) that achieves the highest processing throughput while sustaining a large mini-batch size. For evaluation, we implemented our memory usage optimization scheme on TensorFlow, the well-known machine learning library from Google, and performed extensive experiments on a multi-GPU testbed. Our evaluation results show that the proposed scheme can increase the mini-batch size by up to 60% and improve the training throughput by up to 46.6% in a multi-GPU system.
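The abstract describes a vDNN-style strategy: feature maps are offloaded to CPU memory during the forward pass and prefetched back to the GPU ahead of when the backward pass needs them, so the PCIe transfer overlaps with computation. The minimal sketch below illustrates that one-step-ahead prefetch ordering; the function name and the schedule representation are assumptions for illustration, not the paper's implementation.

```python
def backward_prefetch_schedule(num_layers):
    """Pair each backward-pass layer with the layer whose offloaded
    feature map should be prefetched from CPU memory while it runs.

    Backprop visits layers from last to first; while layer i is being
    processed, layer i-1's feature map is fetched over PCIe so it is
    already resident on the GPU when its turn comes.
    """
    schedule = []
    for i in range(num_layers - 1, -1, -1):
        prefetch = i - 1 if i > 0 else None  # first layer: nothing left to prefetch
        schedule.append((i, prefetch))
    return schedule

# For a 4-layer network: while layer 3's gradients are computed,
# layer 2's activations are in flight, and so on.
print(backward_prefetch_schedule(4))
```

In a real system each prefetch would be issued as an asynchronous host-to-device copy on a separate stream; the scheduling logic above only fixes the order in which those copies are launched.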
Keywords
Convolutional neural network, GPGPU memory, Multi-GPU