Dr^2Net: Dynamic Reversible Dual-Residual Networks for Memory-Efficient Finetuning
CVPR 2024
Abstract
Large pretrained models are increasingly crucial in modern computer vision
tasks. These models are typically adapted to downstream tasks by end-to-end
finetuning, which is highly memory-intensive for tasks with high-resolution
data, e.g., video understanding, small object detection, and point cloud
analysis. In this paper, we propose Dynamic Reversible Dual-Residual Networks,
or Dr^2Net, a novel family of network architectures that acts as a surrogate
network to finetune a pretrained model with substantially reduced memory
consumption. Dr^2Net contains two types of residual connections, one
maintaining the residual structure in the pretrained models, and the other
making the network reversible. Owing to this reversibility, intermediate
activations can be cleared from memory during training and reconstructed
from the outputs when needed. We apply a separate coefficient to each of the
two types of residual connections, and introduce a dynamic training strategy
that seamlessly
transitions the pretrained model to a reversible network with much higher
numerical precision. We evaluate Dr^2Net on a range of pretrained models and
tasks, and show that it reaches performance comparable to conventional
finetuning with significantly less memory usage.
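
To make the idea concrete, below is a minimal PyTorch sketch of a two-stream reversible block carrying the two kinds of residual connections the abstract describes: an inner term that preserves the pretrained residual structure x + f(x), and a cross-stream shortcut that makes the pair of equations invertible. The coupling follows the standard RevNet pattern; the module names `f` and `g` and the exact placement of the coefficients `alpha` and `beta` are illustrative assumptions, not the paper's exact equations.

```python
import torch
import torch.nn as nn


class DualResidualBlock(nn.Module):
    """Illustrative reversible block with two types of residual connections.

    f and g stand in for sub-blocks of a pretrained residual network.
    beta scales the pretrained residual branch; alpha scales the
    cross-stream connection that makes the block invertible.
    """

    def __init__(self, f: nn.Module, g: nn.Module, alpha: float, beta: float):
        super().__init__()
        self.f, self.g = f, g
        self.alpha, self.beta = alpha, beta

    def forward(self, x1: torch.Tensor, x2: torch.Tensor):
        # Each inner term (x + beta * module(x)) keeps the residual
        # structure of the pretrained block; the alpha * x_i shortcut
        # makes the coupled pair invertible whenever alpha != 0.
        y1 = self.alpha * x1 + (x2 + self.beta * self.f(x2))
        y2 = self.alpha * x2 + (y1 + self.beta * self.g(y1))
        return y1, y2

    @torch.no_grad()
    def inverse(self, y1: torch.Tensor, y2: torch.Tensor):
        # Reconstruct the inputs from the outputs; in memory-efficient
        # training this replaces storing the intermediate activations.
        x2 = (y2 - y1 - self.beta * self.g(y1)) / self.alpha
        x1 = (y1 - x2 - self.beta * self.f(x2)) / self.alpha
        return x1, x2


# Quick check: the inverse reconstructs the inputs up to float error.
blk = DualResidualBlock(nn.Linear(8, 8), nn.Linear(8, 8), alpha=0.5, beta=1.0)
x1, x2 = torch.randn(2, 8), torch.randn(2, 8)
y1, y2 = blk(x1, x2)
r1, r2 = blk.inverse(y1, y2)
print(torch.allclose(r1, x1, atol=1e-5), torch.allclose(r2, x2, atol=1e-5))
```

Note the division by `alpha` in the inverse: the smaller `alpha` is (i.e., the closer the block stays to the pretrained mapping x + f(x)), the more floating-point error the reconstruction amplifies. This is presumably what the dynamic training strategy addresses, by gradually moving the coefficients toward a configuration where the inversion is numerically well-conditioned.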