MadPipe: Memory Aware Dynamic Programming Algorithm for Pipelined Model Parallelism

2022 IEEE 36th International Parallel and Distributed Processing Symposium Workshops (IPDPSW 2022)

Citations: 0 | Views: 9
Abstract
The training phase of Deep Neural Networks (DNNs) is very computationally intensive and is nowadays often performed on parallel computing platforms, ranging from a few GPUs to several thousand GPUs. The strategy of choice for parallelizing training is the so-called data parallel approach, based on training different inputs (typically images) in parallel and aggregating network weights with collective communications (AllReduce operations). The scalability of this approach is limited both by the memory available on each node and by the networking capacity for collective operations. Recently, model parallel approaches have been proposed (PipeDream, GPipe), in which the DNN weights are distributed and inputs are processed in a pipelined/streaming manner over the computational nodes. In this paper, we formalize in detail the optimization problem associated with the placement of DNN layers onto computation resources when using pipelined model parallelism, and we derive a dynamic programming based heuristic, MadPipe. We show through extensive simulations based on realistic networks that MadPipe significantly improves the performance of the pipelined model parallel approach compared to PipeDream.
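To illustrate the kind of problem the paper formalizes, the following is a minimal sketch (not the MadPipe algorithm itself, whose details are in the paper) of a dynamic program that partitions a chain of DNN layers into contiguous pipeline stages, one stage per GPU, minimizing the bottleneck stage time subject to a per-GPU memory cap. The per-layer times, memory footprints, and the function name are hypothetical and for illustration only.

```python
def best_pipeline(times, mems, n_gpus, mem_cap):
    """Partition layers 0..n-1 into `n_gpus` contiguous stages.

    times[j] / mems[j]: hypothetical compute time and memory of layer j.
    Returns (bottleneck_time, list of (start, end) stage intervals).
    """
    n = len(times)
    INF = float("inf")
    # dp[k][i]: minimal bottleneck time for placing layers 0..i-1 on k stages
    dp = [[INF] * (n + 1) for _ in range(n_gpus + 1)]
    cut = [[-1] * (n + 1) for _ in range(n_gpus + 1)]
    dp[0][0] = 0.0
    for k in range(1, n_gpus + 1):
        for i in range(1, n + 1):
            stage_t, stage_m = 0.0, 0.0
            # the last stage holds layers j..i-1; extend it leftwards
            for j in range(i - 1, -1, -1):
                stage_t += times[j]
                stage_m += mems[j]
                if stage_m > mem_cap:      # memory-aware pruning
                    break
                cand = max(dp[k - 1][j], stage_t)
                if cand < dp[k][i]:
                    dp[k][i], cut[k][i] = cand, j
    # reconstruct the stage boundaries (assumes a feasible placement exists)
    stages, i = [], n
    for k in range(n_gpus, 0, -1):
        j = cut[k][i]
        stages.append((j, i))
        i = j
    return dp[n_gpus][n], stages[::-1]
```

This toy version only balances compute under a memory constraint; the actual formulation in the paper additionally accounts for communication costs and the memory behavior of pipelined training, which is what distinguishes MadPipe from simpler partitioners.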
Keywords
deep neural networks,parallel computing platforms,thousand GPUs,data parallel approach,parallel training,network weights,collective communications,AllReduce operation,networking capacities,collective operations,DNN weights,computational nodes,computation resources,MadPipe,realistic networks,pipelined parallel model approach,memory aware dynamic programming algorithm,DNNs