Why Does Decentralized Training Outperform Synchronous Training In The Large Batch Setting

user-5f8411ab4c775e9685ff56d3(2021)

引用 0|浏览24
暂无评分
摘要
Distributed Deep Learning (DDL) is essential for large-scale Deep Learning (DL) training. Using a sufficiently large batch size is critical to achieving DDL runtime speedup. In a large batch setting, the learning rate must be increased to compensate for the reduced number of parameter updates. However, a large batch size may converge to sharp minima with poor generalization, and a large learning rate may harm convergence. Synchronous Stochastic Gradient Descent (SSGD) is the de facto DDL optimization method. Recently, Decentralized Parallel SGD (DPSGD) has been proven to achieve a similar convergence rate as SGD and to guarantee linear speedup for non-convex optimization problems. While there was anecdotal evidence that DPSGD outperforms SSGD in the large-batch setting, no systematic study has been conducted to explain why this is the case. Based on a detailed analysis of the DPSGD learning dynamics, we find that DPSGD introduces additional landscape-dependent noise, which has two benefits in the large-batch setting: 1) it automatically adjusts the learning rate to improve convergence; 2) it enhances weight space search by escaping local traps (e.g., saddle points) to find flat minima with better generalization. We conduct extensive studies over 12 state-of-the-art DL models/tasks and demonstrate that DPSGD consistently outperforms SSGD in the large batch setting; and DPSGD converges in cases where SSGD diverges for large learning rates. Our findings are consistent across different application domains, Computer Vision and Automatic Speech Recognition, and different neural network models, Convolutional Neural Networks and Long Short-Term Memory Recurrent Neural Networks.
更多
查看译文
关键词
Artificial neural network,Deep learning,Recurrent neural network,Stochastic gradient descent,Speedup,Convolutional neural network,Rate of convergence,Optimization problem,Mathematical optimization,Computer science,Artificial intelligence
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要