A Novel Stochastic Gradient Descent Algorithm Based On Grouping Over Heterogeneous Cluster Systems For Distributed Deep Learning

2019 19th IEEE/ACM International Symposium on Cluster, Cloud and Grid Computing (CCGRID), 2019

Abstract
On heterogeneous cluster systems, the convergence of neural network models is significantly hampered by differences in machine performance. In this paper, we propose a novel distributed Stochastic Gradient Descent (SGD) algorithm named Grouping-SGD for distributed deep learning, which converges faster than Sync-SGD, Async-SGD, and Stale-SGD. In Grouping-SGD, machines are partitioned into multiple groups so that machines in the same group have similar performance. Machines in the same group update the model synchronously, while different groups update the model asynchronously. To further improve the performance of Grouping-SGD, the parameter servers are arranged from fast to slow and are responsible for updating the model parameters from the lower layers to the higher layers, respectively. The experimental results indicate that Grouping-SGD achieves 1.2 to 3.7 times speedups over Sync-SGD, Async-SGD, and Stale-SGD on popular image classification benchmarks: MNIST, Cifar10, Cifar100, and ImageNet.
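The central step of Grouping-SGD described above is partitioning workers by performance so that synchronous averaging inside a group is not stalled by slow machines. The following is a minimal Python sketch of one way such a partition could be computed; it is not the authors' implementation, and the throughput figures, the number of groups, and the function name are illustrative assumptions only.

```python
# Hypothetical sketch of performance-aware worker grouping (not the paper's code).
# Workers are ranked by a measured throughput estimate (e.g., images/sec) and
# split into contiguous groups, so each group contains machines of similar speed.

def partition_into_groups(throughputs, num_groups):
    """Return a list of groups, each a list of worker indices with similar throughput."""
    ranked = sorted(range(len(throughputs)),
                    key=lambda w: throughputs[w], reverse=True)
    size = (len(ranked) + num_groups - 1) // num_groups  # ceiling division
    return [ranked[i:i + size] for i in range(0, len(ranked), size)]

if __name__ == "__main__":
    # Assumed throughputs for 8 heterogeneous workers (images/sec).
    throughputs = [410, 395, 120, 130, 405, 118, 125, 400]
    groups = partition_into_groups(throughputs, num_groups=2)
    print(groups)  # fast machines land in one group, slow machines in the other
```

Under this scheme, gradients would be aggregated synchronously within each printed group, while the groups themselves would push updates to the parameter servers asynchronously, matching the hybrid update rule summarized in the abstract.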
Keywords
Deep Learning, Distributed SGD Algorithms, Parameter Servers, Heterogeneous Cluster Systems