Evaluation of Volta-Based DGX-1 System Using DNN Workloads

In Proc. Boston Area ARChitecture (BARC) Workshop, 2019

Abstract
Multi-GPU systems are widely used to train deep neural networks (DNNs), since GPUs can significantly reduce training time. Data parallelism is a popular choice for training DNNs on a multi-GPU system. The GPUs in a multi-GPU system repeatedly perform Forward Propagation (FP), Backward Propagation (BP), and Weight Update (WU) to train a DNN. During the WU stage, the GPUs communicate with each other. To reduce communication time, NVIDIA has introduced different data transfer mechanisms and libraries, which high-level frameworks have adopted for DNN training. We evaluate two of the most popular communication methods, peer-to-peer (P2P) data transfer and NCCL library-based communication, for training DNNs on a DGX-1 multi-GPU system. We profile and analyze the training of five DNNs using 1, 2, 4, and 8 GPUs. Our analyses provide insights into the software- and hardware-level factors that limit DNN training on a multi-GPU system.
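To make the FP/BP/WU loop described in the abstract concrete, the following is a minimal illustrative sketch of one data-parallel training step using PyTorch's NCCL backend. This is not the paper's benchmark code; the model, batch shapes, and launch command are placeholder assumptions, and gradient averaging is done with an explicit NCCL all-reduce to make the WU-stage communication visible.

```python
# Minimal data-parallel training step over NCCL (illustrative sketch only).
# Assumes single-node launch, e.g.: torchrun --nproc_per_node=8 train.py
import torch
import torch.distributed as dist

def train_step(model, loss_fn, inputs, targets, optimizer):
    optimizer.zero_grad()
    loss = loss_fn(model(inputs), targets)   # Forward Propagation (FP)
    loss.backward()                          # Backward Propagation (BP)
    # Weight Update (WU): average gradients across GPUs via NCCL all-reduce
    world_size = dist.get_world_size()
    for p in model.parameters():
        if p.grad is not None:
            dist.all_reduce(p.grad, op=dist.ReduceOp.SUM)
            p.grad /= world_size
    optimizer.step()
    return loss.item()

if __name__ == "__main__":
    dist.init_process_group(backend="nccl")  # NCCL handles GPU-GPU communication
    rank = dist.get_rank()
    torch.cuda.set_device(rank)
    device = f"cuda:{rank}"
    model = torch.nn.Linear(1024, 10).to(device)          # placeholder model
    optimizer = torch.optim.SGD(model.parameters(), lr=0.01)
    loss_fn = torch.nn.CrossEntropyLoss()
    inputs = torch.randn(32, 1024, device=device)          # placeholder batch
    targets = torch.randint(0, 10, (32,), device=device)
    train_step(model, loss_fn, inputs, targets, optimizer)
    dist.destroy_process_group()
```

In practice, frameworks fuse the per-parameter all-reduces into larger buckets and overlap them with BP, which is one of the software-level factors the paper's profiling examines.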