FLSGD: free local SGD with parallel synchronization

The Journal of Supercomputing (2022)

Abstract
Synchronous parameter algorithms with data parallelism have been successfully used to accelerate the distributed training of deep neural networks (DNNs). However, a prevalent shortcoming of synchronous methods is the computation waste resulting from mutual waiting among computational workers with different performance and from the communication delays at each synchronization. To alleviate this drawback, we propose a novel method, free local stochastic gradient descent (FLSGD) with parallel synchronization, to eliminate the waiting and communication overhead. Specifically, the distributed DNN training process is first modeled as a pipeline consisting of three components: dataset partition, local SGD, and parameter updating. Then, an adaptive batch-size and dataset-partition method based on the computational performance of each node is employed to eliminate the waiting time by keeping the distributed DNN training load balanced. The local SGD and the parameter updating, including gradient synchronization, are parallelized to eliminate the communication cost through a one-step gradient delay, and the resulting staleness is remedied by an appropriate approximation. To the best of our knowledge, this is the first work that addresses both load balancing and communication overhead in distributed training. Extensive experiments are conducted with four state-of-the-art DNN models on two image classification datasets (i.e., CIFAR10 and CIFAR100) to demonstrate that FLSGD outperforms synchronous methods.
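The two mechanisms described in the abstract can be illustrated with a short sketch. The code below is a minimal, hypothetical illustration (not the authors' implementation): proportional_partition splits the global batch and dataset shards in proportion to measured worker throughput, and delayed_gradient_sgd applies the gradient computed one step earlier so that gradient communication could overlap with the next local computation. All function and variable names (node_speeds, total_batch, delayed_grad, ...) are assumptions.

```python
# Minimal sketch of the two ideas from the abstract, under the assumptions above.
import numpy as np

def proportional_partition(node_speeds, total_batch, dataset_size):
    """Split the global batch and the dataset across workers by relative throughput,
    so that faster nodes receive proportionally larger batches and shards."""
    speeds = np.asarray(node_speeds, dtype=float)
    shares = speeds / speeds.sum()
    batch_sizes = np.maximum(1, np.round(shares * total_batch)).astype(int)
    shard_sizes = np.round(shares * dataset_size).astype(int)
    return batch_sizes, shard_sizes

def delayed_gradient_sgd(grad_fn, w, steps, lr=0.1):
    """SGD where the gradient applied at step t was computed at step t-1
    (one-step staleness), mimicking overlap of communication and computation."""
    delayed_grad = None
    for _ in range(steps):
        fresh_grad = grad_fn(w)        # gradient at the current parameters
        if delayed_grad is not None:
            w = w - lr * delayed_grad  # apply the previous step's (stale) gradient
        delayed_grad = fresh_grad      # this gradient will be applied next step
    return w

if __name__ == "__main__":
    # Example: three workers with relative speeds 1 : 2 : 4 and a global batch of 256.
    bs, shards = proportional_partition([1.0, 2.0, 4.0], total_batch=256, dataset_size=50_000)
    print("per-worker batch sizes:", bs, "shard sizes:", shards)

    # Toy quadratic objective f(w) = 0.5 * ||w||^2, whose gradient is w.
    w_final = delayed_gradient_sgd(lambda w: w, w=np.array([4.0, -2.0]), steps=50)
    print("w after delayed-gradient SGD:", w_final)
```

In a real distributed setting, the delayed update would allow an asynchronous all-reduce of step t's gradients to run concurrently with step t+1's local forward and backward passes; here the overlap is only simulated, and the staleness correction mentioned in the abstract is omitted.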
Keywords
Distributed deep learning, Parallel computing, Local SGD, Stale SGD, Parameter communication