Advances of Pipeline Model Parallelism for Deep Learning Training: An Overview

Lei Guan, Dong-Sheng Li, Ji-Ye Liang, Wen-Jian Wang, Ke-Shi Ge, Xi-Cheng Lu

Journal of Computer Science and Technology (2024)

Abstract

Deep learning has become the cornerstone of artificial intelligence and plays an increasingly important role in production and daily life. However, as the problems being solved grow more complex, deep learning models have become increasingly intricate, leading to a proliferation of large language models with very large parameter counts. Pipeline model parallelism (PMP) has emerged as one of the mainstream approaches to the challenge of training such “big models”. This paper presents a comprehensive review of PMP. It covers the basic concepts and main challenges of PMP, compares synchronous and asynchronous pipeline schedules for PMP approaches, and discusses the main techniques for achieving load balance in both intra-node and inter-node training. It also presents the main techniques for optimizing computation, storage, and communication, and discusses potential research directions.

Keywords

deep learning, pipeline schedule, load balance, multi-GPU system, pipeline model parallelism (PMP)
