PipeFB: An Optimized Pipeline Parallelism Scheme to Reduce the Peak Memory Usage.

ICA3PP (2022)

Abstract
Neural network models are growing deeper and wider to achieve higher accuracy and robustness. However, the limited physical memory capacity of existing hardware devices restricts the scale of the neural networks that can be trained, and their limited computing capacity results in excessively long training times. Distributed parallelism schemes based on multi-accelerator machines have therefore become an effective way to train large-scale neural networks. Pipeline parallelism is one such distributed parallelism scheme and offers large advantages in training speed, but it also significantly increases peak memory usage and communication overhead because it must store multiple versions of the activations. Our previous work proposed a data transfer mechanism and applied it to PipeDream (a mature pipeline parallelism scheme); the mechanism offloads activations in the pipeline to other memory devices, such as CPU memory. It greatly reduces PipeDream's peak memory usage, but the additional communication it introduces costs PipeDream a large amount of training speed. This paper proposes an optimized pipeline parallelism scheme, PipeFB, designed for the data transfer mechanism. Unlike traditional pipeline parallelism schemes, PipeFB places the forward propagation and the backward propagation of the neural network on different computing nodes. We implement PipeFB and apply the data transfer mechanism to it. Experimental results show that our design has the same peak memory usage as PipeDream with the data transfer mechanism, while its training speed is 1.48 to 2.27 times faster.
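
The abstract does not include code; as a rough, single-device illustration of the activation-offloading idea behind the data transfer mechanism (not the PipeFB or PipeDream implementation), the sketch below uses PyTorch's torch.autograd.graph.saved_tensors_hooks to park activations saved for the backward pass in CPU memory and copy them back only when backward needs them. The model, tensor sizes, and device handling are assumptions made for illustration.

# Minimal single-device sketch of activation offloading (illustrative only;
# not the paper's pipeline-parallel design). Activations saved for the
# backward pass are parked in CPU memory and copied back on demand.
import torch
import torch.nn as nn

device = "cuda" if torch.cuda.is_available() else "cpu"

def pack_to_cpu(t: torch.Tensor) -> torch.Tensor:
    # Called when autograd saves an activation: move it to host memory.
    return t.to("cpu")

def unpack_from_cpu(t: torch.Tensor) -> torch.Tensor:
    # Called when the backward pass needs the activation: bring it back.
    return t.to(device)

model = nn.Sequential(
    nn.Linear(1024, 4096), nn.ReLU(), nn.Linear(4096, 1024)
).to(device)
x = torch.randn(32, 1024, device=device)

# Every activation saved inside this context lives in CPU memory between
# the forward and backward passes, trading accelerator memory for
# host-device transfer traffic.
with torch.autograd.graph.saved_tensors_hooks(pack_to_cpu, unpack_from_cpu):
    loss = model(x).sum()
loss.backward()

In a pipeline setting this trade-off is what the paper targets: offloading cuts the peak memory held per stage, while the extra host-device transfers are the communication cost that PipeFB's forward/backward placement is designed to hide.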
Keywords
optimized pipeline parallelism scheme, memory