PETRA: Parallel End-to-end Training with Reversible Architectures
arXiv (2024)
Abstract
Reversible architectures have been shown to perform on par with their non-reversible counterparts, and have been applied in deep learning for memory savings and generative modeling. In this work, we show how reversible architectures can solve challenges in parallelizing deep model training. We introduce PETRA, a novel alternative to backpropagation for parallelizing gradient computations. PETRA facilitates effective model parallelism by enabling stages (i.e., sets of layers) to compute independently on different devices, while only needing to communicate activations and gradients with one another. Because PETRA decouples the forward and backward passes and keeps a single updated version of the parameters, it also removes the need for weight stashing. We develop a custom autograd-like training framework for PETRA and demonstrate its effectiveness on CIFAR-10, ImageNet32, and ImageNet, achieving accuracies competitive with backpropagation using ResNet-18, ResNet-34, and ResNet-50 models.
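
To illustrate the reversibility property the abstract relies on, below is a minimal sketch (not the authors' code) of a RevNet-style coupling block in PyTorch. Because the inputs can be reconstructed exactly from the outputs via `inverse`, a stage could in principle discard its activations after the forward pass and rebuild them when gradients arrive, which is what makes decoupled forward/backward computation without activation storage or weight stashing plausible. The sub-networks `f` and `g` and all shapes here are illustrative assumptions.

```python
import torch
import torch.nn as nn


class ReversibleBlock(nn.Module):
    """RevNet-style coupling: y1 = x1 + F(x2), y2 = x2 + G(y1).

    The mapping (x1, x2) -> (y1, y2) is invertible in closed form,
    so activations need not be stored for the backward pass.
    """

    def __init__(self, f: nn.Module, g: nn.Module):
        super().__init__()
        self.f, self.g = f, g

    def forward(self, x1, x2):
        y1 = x1 + self.f(x2)
        y2 = x2 + self.g(y1)
        return y1, y2

    def inverse(self, y1, y2):
        # Exact reconstruction of the inputs from the outputs.
        x2 = y2 - self.g(y1)
        x1 = y1 - self.f(x2)
        return x1, x2


if __name__ == "__main__":
    # Hypothetical sub-networks; any shape-preserving modules would do.
    f = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
    g = nn.Sequential(nn.Linear(8, 8), nn.ReLU(), nn.Linear(8, 8))
    block = ReversibleBlock(f, g)

    x1, x2 = torch.randn(4, 8), torch.randn(4, 8)
    y1, y2 = block(x1, x2)
    r1, r2 = block.inverse(y1, y2)
    print(torch.allclose(x1, r1, atol=1e-5), torch.allclose(x2, r2, atol=1e-5))
```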