A Deep Learning Pipeline Parallel Optimization Method

2023 IEEE/ACM 23rd International Symposium on Cluster, Cloud and Internet Computing (CCGrid), 2023

Abstract
In recent years, with the continuous development of artificial intelligence, deep learning algorithms have become increasingly complex and the scale of model training keeps growing. The artificial intelligence platform in our computing network operating system project also involves large-scale model training. However, as data sets and models grow, traditional single-card training becomes very slow and the training accuracy converges poorly, which no longer meets practical computational needs. This has motivated the development of well-known pipeline parallel frameworks such as GPipe and PipeDream. In this paper, we propose an efficient pipeline parallel training optimization method in which multiple computing nodes process micro-batches of data in parallel in a pipelined manner. Our work makes two main contributions. First, we design a weight buffer strategy that limits the number of weight versions generated while preserving model accuracy, together with a tensor compression mechanism that improves the transmission rate. Second, we propose a prefix sum partition algorithm that achieves balanced pipeline partitioning and saves memory on computing resources. Compared with several popular pipeline parallel frameworks, the proposed method achieves about twice the training acceleration and saves about 30%-40% of memory usage.
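
To make the partitioning idea concrete, below is a minimal Python sketch of a prefix-sum based balanced partition. The abstract does not give the algorithm's details, so this is only an illustration under the assumption that per-layer compute costs come from profiling; the function name `balanced_partition`, the greedy cut-point selection, and the cost values are hypothetical, not taken from the paper.

```python
from itertools import accumulate

def balanced_partition(layer_costs, num_stages):
    """Split per-layer costs into contiguous pipeline stages whose
    cumulative costs are as even as possible, using prefix sums."""
    assert 0 < num_stages <= len(layer_costs)
    prefix = [0] + list(accumulate(layer_costs))   # prefix[i] = total cost of layers [0, i)
    total = prefix[-1]
    boundaries = [0]
    for s in range(1, num_stages):
        target = total * s / num_stages            # ideal cumulative cost after stage s
        # choose the cut whose prefix sum is closest to the ideal target
        cut = min(range(boundaries[-1] + 1, len(layer_costs)),
                  key=lambda i: abs(prefix[i] - target))
        boundaries.append(cut)
    boundaries.append(len(layer_costs))
    return [list(range(boundaries[s], boundaries[s + 1])) for s in range(num_stages)]

costs = [4, 1, 3, 2, 6, 2, 1, 5]      # hypothetical per-layer profiling costs
print(balanced_partition(costs, 3))   # [[0, 1, 2], [3, 4], [5, 6, 7]] — each stage sums to 8
```

Similarly, the weight buffer strategy can be pictured as a bounded store of weight snapshots, so that stale versions are evicted rather than accumulating as in unbounded PipeDream-style weight stashing. Again a sketch under assumptions: the class name `WeightBuffer` and its methods are illustrative.

```python
from collections import deque

class WeightBuffer:
    """Keep at most `max_versions` weight snapshots, bounding the memory
    cost of maintaining multiple weight versions in the pipeline."""
    def __init__(self, max_versions):
        self.versions = deque(maxlen=max_versions)

    def push(self, step, weights):
        self.versions.append((step, weights))     # oldest snapshot is dropped automatically

    def latest_before(self, step):
        # weights to apply for a micro-batch injected at `step`
        for s, w in reversed(self.versions):
            if s <= step:
                return w
        raise KeyError(f"no buffered weights at or before step {step}")
```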
Keywords
deep learning, pipeline parallel, weight buffer, balancing partition