An Efficient Technique for Large Mini-batch Challenge of DNNs Training on Large Scale Cluster

HPDC '20: The 29th International Symposium on High-Performance Parallel and Distributed Computing, Stockholm, Sweden, June 2020

Abstract
Distributed deep learning with large mini-batches is a key strategy for training deep neural networks as fast as possible, but it poses a significant challenge: it is difficult to achieve high scaling efficiency on large clusters without compromising accuracy. The core difficulty is that enlarging the mini-batch reduces the number of model-update iterations over the whole training run, so a technique is needed that lets the validation accuracy converge within a small number of iterations. In this paper, we introduce a novel technique, Final Polishing. It adjusts the means and variances in batch normalization and mitigates the difference in normalization statistics between the validation dataset and the augmented training dataset. By applying this technique, we achieved a top-1 validation accuracy of 75.08% with a mini-batch size of 81,920 on 2,048 GPUs, completing the training of ResNet-50 in 74.7 seconds. In addition, targeting a top-1 validation accuracy of 75.9% or more, we performed further parameter tuning: adjusting the number of GPUs and the DNN hyperparameters together with Final Polishing, we achieved a top-1 validation accuracy of 75.97% with a mini-batch size of 86,016 on 3,072 GPUs, completing the training of ResNet-50 in 62.1 seconds.
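The abstract describes Final Polishing only at a high level, as an adjustment of batch-normalization means and variances to reduce the mismatch between validation inputs and augmented training inputs. The sketch below shows one plausible realization under that description, assuming a PyTorch ResNet-50: after the main large-mini-batch training finishes, the BatchNorm running statistics are re-estimated by forward passes over training images preprocessed with a validation-style pipeline. The function name polish_bn_statistics, the dataset path, the momentum value, and the transform choices are illustrative assumptions, not details taken from the paper.

```python
import torch
import torchvision.transforms as T
from torchvision import datasets, models
from torch.utils.data import DataLoader

# Validation-style preprocessing: no heavy training augmentation, so the
# re-estimated BN statistics match what the model sees at evaluation time.
val_style_transform = T.Compose([
    T.Resize(256),
    T.CenterCrop(224),
    T.ToTensor(),
    T.Normalize(mean=[0.485, 0.456, 0.406], std=[0.229, 0.224, 0.225]),
])

def polish_bn_statistics(model, loader, device, momentum=0.1):
    """Re-estimate BatchNorm running mean/var after training.

    The model is put in train mode so BN layers update their running
    statistics, but no gradients are computed and no weights change.
    """
    model.train()
    for module in model.modules():
        if isinstance(module, torch.nn.modules.batchnorm._BatchNorm):
            module.momentum = momentum  # how quickly stats adapt per batch
    with torch.no_grad():
        for images, _ in loader:
            model(images.to(device, non_blocking=True))
    model.eval()
    return model

if __name__ == "__main__":
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    # Assumed: weights already come from the finished large-mini-batch run.
    model = models.resnet50().to(device)
    # "train/" is a placeholder ImageNet directory, not a path from the paper.
    dataset = datasets.ImageFolder("train/", transform=val_style_transform)
    loader = DataLoader(dataset, batch_size=256, shuffle=True, num_workers=8)
    polish_bn_statistics(model, loader, device)
```

In this reading, the "polishing" pass touches only the BN running statistics, which is cheap relative to full training and so would not add materially to the reported wall-clock times; the paper itself should be consulted for the exact procedure and schedule.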