Fela: Incorporating Flexible Parallelism and Elastic Tuning to Accelerate Large-Scale DML

2020 IEEE 36th International Conference on Data Engineering (ICDE 2020)

Abstract
Distributed machine learning (DML) has become common practice in industry because of the explosive volume of training data and the growing complexity of training models. Traditional DML follows data parallelism but incurs significant communication cost due to the huge amount of parameter transmission. The recently emerging model-parallel solutions can reduce the communication workload but lead to load imbalance and serious straggler problems. More importantly, the existing solutions, whether data-parallel or model-parallel, ignore the flexible-parallelism nature of most DML tasks and thus fail to fully exploit GPU computation power. To address these drawbacks, we propose Fela, which incorporates both flexible parallelism and an elastic tuning mechanism to accelerate DML. To fully leverage GPU power and reduce communication cost, Fela adopts hybrid parallelism and uses flexible parallel degrees to train different parts of the model. Meanwhile, Fela employs a token-based scheduling policy to elastically tune the workload among workers, thereby mitigating the straggler effect and achieving better load balance. Our comparative experiments show that Fela significantly improves training throughput and outperforms the three main baselines (i.e., data-parallel, model-parallel, and hybrid-parallel) by up to 3.23x, 12.22x, and 1.85x, respectively.
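
The abstract only summarizes the token-based scheduling idea, so the following is a minimal illustrative sketch rather than Fela's actual policy: per-iteration work is split into tokens and (re)assigned to workers in proportion to their measured speed, so a straggler receives fewer tokens. All names here (Token, Worker, TokenScheduler, speed) are hypothetical assumptions for illustration.

# Hypothetical sketch of a token-based elastic scheduling policy (not the
# authors' implementation): tokens are assigned proportionally to measured
# worker speed so that slower workers (stragglers) receive less work.

from dataclasses import dataclass
from typing import Dict, List


@dataclass(frozen=True)
class Token:
    work_id: int          # index of one schedulable slice of training work


@dataclass
class Worker:
    name: str
    speed: float          # measured relative throughput (tokens per unit time)


class TokenScheduler:
    """Assign tokens to workers in proportion to their measured speed."""

    def __init__(self, workers: List[Worker]):
        self.workers = workers

    def assign(self, tokens: List[Token]) -> Dict[str, List[Token]]:
        total_speed = sum(w.speed for w in self.workers)
        plan: Dict[str, List[Token]] = {w.name: [] for w in self.workers}

        # Ideal (fractional) share per worker; round down, then give the
        # leftover tokens to workers with the largest fractional remainder.
        quotas = {w.name: len(tokens) * w.speed / total_speed for w in self.workers}
        counts = {name: int(q) for name, q in quotas.items()}
        leftover = len(tokens) - sum(counts.values())
        extras = sorted(quotas.items(), key=lambda kv: kv[1] - int(kv[1]), reverse=True)
        for name, _ in extras[:leftover]:
            counts[name] += 1

        it = iter(tokens)
        for w in self.workers:
            plan[w.name] = [next(it) for _ in range(counts[w.name])]
        return plan


if __name__ == "__main__":
    # "w2" is a straggler with half the throughput, so it gets about half the tokens.
    workers = [Worker("w0", 1.0), Worker("w1", 1.0), Worker("w2", 0.5)]
    plan = TokenScheduler(workers).assign([Token(i) for i in range(25)])
    print({name: len(ts) for name, ts in plan.items()})   # -> {'w0': 10, 'w1': 10, 'w2': 5}

Recomputing this assignment whenever measured speeds change is one simple way to realize the elastic tuning described in the abstract; the paper's actual mechanism may differ.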
Keywords
Distributed machine learning, flexible parallelism, token-based scheduling, straggler effect