Efficient Multi-GPU Memory Management for Deep Learning Acceleration

2018 IEEE 3rd International Workshops on Foundations and Applications of Self* Systems (FAS*W), 2018

Cited by 11 | Views 17
Abstract
In this paper, we propose a new optimized memory management scheme that improves overall GPU memory utilization in multi-GPU systems for deep learning acceleration. We extend Nvidia's vDNN concept (a hybrid use of GPU and CPU memories) to a multi-GPU environment by effectively addressing PCIe-bus contention. In addition, we designed and implemented an intelligent prefetching algorithm (from CPU memory to GPU) that achieves the highest processing throughput while sustaining a large mini-batch size. For evaluation, we implemented our memory usage optimization scheme on TensorFlow, the well-known machine learning library from Google, and performed extensive experiments on a multi-GPU testbed. Our evaluation results show that the proposed scheme can increase the mini-batch size by up to 60% and improve the training throughput by up to 46.6% in a multi-GPU system.
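The abstract describes a vDNN-style strategy: feature maps are offloaded to CPU memory during the forward pass and prefetched back to the GPU ahead of when the backward pass needs them, so the PCIe transfer overlaps with computation. The minimal sketch below illustrates that one-step-ahead prefetch ordering; the function name and the schedule representation are assumptions for illustration, not the paper's implementation.

```python
def backward_prefetch_schedule(num_layers):
    """Pair each backward-pass layer with the layer whose offloaded
    feature map should be prefetched from CPU memory while it runs.

    Backprop visits layers from last to first; while layer i is being
    processed, layer i-1's feature map is fetched over PCIe so it is
    already resident on the GPU when its turn comes.
    """
    schedule = []
    for i in range(num_layers - 1, -1, -1):
        prefetch = i - 1 if i > 0 else None  # first layer: nothing left to prefetch
        schedule.append((i, prefetch))
    return schedule

# For a 4-layer network: while layer 3's gradients are computed,
# layer 2's activations are in flight, and so on.
print(backward_prefetch_schedule(4))
```

In a real system each prefetch would be issued as an asynchronous host-to-device copy on a separate stream; the scheduling logic above only fixes the order in which those copies are launched.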
Keywords
Convolutional neural network, GPGPU memory, Multi-GPU