Accelerating distributed machine learning with model compression and graph partition.

J. Parallel Distributed Comput. (2023)

Abstract
The rapid growth of data and parameter sizes of machine learning models makes it necessary to improve the efficiency of distributed training. Communication cost is usually observed to be the bottleneck of distributed training systems. In this paper, we focus on the parameter server framework, a widely deployed distributed learning framework. Frequent parameter pulls, pushes, and synchronization among multiple machines lead to a huge communication volume. We aim to reduce the communication cost of the parameter server framework. Compressing the training model and optimizing the data and parameter allocation are two existing approaches to reducing communication costs. We jointly consider these two approaches and propose to optimize the data and parameter allocation after compression. Unlike in previous allocation schemes, the data sparsity property may no longer hold after compression, which brings additional opportunities and challenges to the allocation problem. We consider the allocation problem for both linear and deep neural network (DNN) models, and propose fixed and dynamic partition algorithms accordingly. Experiments on real-world datasets show that our joint compression and partition scheme efficiently reduces communication overhead for both linear and DNN models. © 2023 Published by Elsevier Inc.
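To make the pull/push communication pattern concrete, the following is a minimal single-process sketch of a parameter-server-style training loop in which each worker pushes a top-k sparsified gradient instead of a dense one. All names here (ParameterServer, top_k_compress, worker_step) are illustrative assumptions; the paper's actual compression scheme and its graph-partition-based data/parameter allocation are not reproduced here.

import numpy as np

def top_k_compress(grad, k):
    """Keep only the k largest-magnitude entries; return (indices, values)."""
    idx = np.argsort(np.abs(grad))[-k:]
    return idx, grad[idx]

class ParameterServer:
    def __init__(self, dim):
        self.params = np.zeros(dim)

    def pull(self):
        # Workers pull the current global parameters.
        return self.params.copy()

    def push(self, idx, vals, lr=0.1):
        # Apply a sparse gradient update pushed by a worker.
        self.params[idx] -= lr * vals

def worker_step(local_data, params):
    """Gradient of a least-squares loss on this worker's local data shard."""
    X, y = local_data
    return X.T @ (X @ params - y) / len(y)

rng = np.random.default_rng(0)
dim, n_workers, k = 1000, 4, 50
server = ParameterServer(dim)
shards = [(rng.normal(size=(200, dim)), rng.normal(size=200)) for _ in range(n_workers)]

dense_cost = sparse_cost = 0
for _ in range(10):                       # training rounds
    for shard in shards:
        w = server.pull()                 # pull: dim floats per worker
        g = worker_step(shard, w)
        idx, vals = top_k_compress(g, k)  # push: only k (index, value) pairs
        server.push(idx, vals)
        dense_cost += dim
        sparse_cost += 2 * k              # indices + values communicated

print(f"pushed volume per run: dense={dense_cost}, top-k={sparse_cost}")

The printed counters only illustrate why compressing pushed updates shrinks communication volume; how data and parameters are then partitioned across machines, which is the paper's focus, is a separate optimization layered on top of such a loop.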
Keywords
Data sparsity, Distributed machine learning, Graph partition, Parameter server framework