Efficient Backpropagation with Variance-Controlled Adaptive Sampling
ICLR 2024 (2024)
Abstract
Sampling-based algorithms, which eliminate "unimportant" computations
during forward and/or back propagation (BP), offer potential solutions to
accelerate neural network training. However, since sampling introduces
approximations to training, such algorithms may not consistently maintain
accuracy across various tasks. In this work, we introduce a variance-controlled
adaptive sampling (VCAS) method designed to accelerate BP. VCAS computes an
unbiased stochastic gradient with fine-grained layerwise importance sampling in
the data dimension for activation gradient calculation and leverage score
sampling in the token dimension for weight gradient calculation. To preserve
accuracy, we control the additional variance by learning the sample ratio
jointly with model parameters during training. We assessed VCAS on multiple
fine-tuning and pre-training tasks in both vision and natural language domains.
On all the tasks, VCAS can preserve the original training loss trajectory and
validation accuracy with an up to 73.87% FLOPs reduction of BP and 49.58%
FLOPs reduction of the whole training process. The implementation is available
at https://github.com/thu-ml/VCAS .
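The key property claimed in the abstract, an *unbiased* stochastic gradient obtained by importance sampling, can be illustrated with a minimal sketch. This is not the VCAS implementation; the function name, shapes, and the norm-proportional sampling rule are illustrative assumptions. Rows of a gradient matrix are kept with probability proportional to their norm and rescaled by the inverse keep-probability, so the sparsified matrix equals the original in expectation:

```python
import numpy as np

def importance_sample_rows(G, keep_ratio=0.5, rng=None):
    """Illustrative sketch: keep rows of G with probability proportional
    to their L2 norm, rescaling kept rows by 1/p so that the expectation
    of the result equals G (unbiasedness)."""
    rng = rng or np.random.default_rng()
    norms = np.linalg.norm(G, axis=1)
    # Target keeping about keep_ratio * n rows; cap probabilities at 1.
    p = np.clip(norms / norms.sum() * (keep_ratio * len(G)), 0.0, 1.0)
    keep = rng.random(len(G)) < p
    out = np.zeros_like(G)
    out[keep] = G[keep] / p[keep, None]  # inverse-probability rescaling
    return out

rng = np.random.default_rng(0)
G = rng.standard_normal((8, 4))
# Averaging many sampled estimates should recover G (unbiasedness check).
est = np.mean([importance_sample_rows(G, rng=rng) for _ in range(5000)], axis=0)
print(np.max(np.abs(est - G)))
```

Rows with small norms are dropped often, saving computation, while the 1/p rescaling keeps the estimator unbiased; VCAS additionally controls the variance this introduces by adapting the sample ratio during training.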
Keywords
efficient training algorithms, stochastic gradient descent, importance sampling, variance reduction