Efficient Backpropagation with Variance-Controlled Adaptive Sampling

ICLR 2024

Abstract
Sampling-based algorithms, which eliminate "unimportant" computations during forward and/or back propagation (BP), offer potential solutions to accelerate neural network training. However, since sampling introduces approximations to training, such algorithms may not consistently maintain accuracy across various tasks. In this work, we introduce a variance-controlled adaptive sampling (VCAS) method designed to accelerate BP. VCAS computes an unbiased stochastic gradient with fine-grained layerwise importance sampling in the data dimension for activation gradient calculation and leverage score sampling in the token dimension for weight gradient calculation. To preserve accuracy, we control the additional variance by learning the sample ratio jointly with model parameters during training. We assessed VCAS on multiple fine-tuning and pre-training tasks in both vision and natural language domains. On all the tasks, VCAS can preserve the original training loss trajectory and validation accuracy with an up to 73.87% FLOPs reduction of BP and 49.58% FLOPs reduction of the whole training process. The implementation is available at https://github.com/thu-ml/VCAS.
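The abstract pairs two unbiased gradient estimators: importance sampling over the data dimension for activation gradients, and leverage score sampling over the token dimension for the weight gradient. Below is a minimal NumPy sketch of that general recipe, assuming norm-proportional sampling scores and 1/(k·p_i) rescaling to keep both estimators unbiased; the function names and the toy check are illustrative assumptions, not the authors' implementation (see the linked repository for that).

```python
import numpy as np

rng = np.random.default_rng(0)

def importance_sample_rows(G, keep_ratio):
    """Bernoulli importance sampling over rows (the data dimension) of an
    activation-gradient matrix G. Row i survives with probability p_i
    proportional to its L2 norm; survivors are rescaled by 1/p_i so that
    E[G_hat] = G (unbiased). Hypothetical helper, for illustration only."""
    n = G.shape[0]
    k = max(1, int(keep_ratio * n))
    norms = np.linalg.norm(G, axis=1)
    p = np.minimum(1.0, k * norms / max(norms.sum(), 1e-12))  # expected kept ~ k
    mask = rng.random(n) < p
    G_hat = np.zeros_like(G)
    G_hat[mask] = G[mask] / p[mask, None]  # unbiased rescaling
    return G_hat

def sample_weight_grad(X, dY, keep_ratio):
    """Estimate dW = X.T @ dY by sampling k tokens (rows of X and dY) with
    probability proportional to ||x_i|| * ||dy_i|| -- a cheap norm-based
    surrogate for leverage scores -- then rescaling by 1/(k * p_i) so that
    E[dW_hat] = X.T @ dY. Hypothetical helper, for illustration only."""
    n = X.shape[0]
    k = max(1, int(keep_ratio * n))
    scores = np.linalg.norm(X, axis=1) * np.linalg.norm(dY, axis=1)
    p = scores / max(scores.sum(), 1e-12)
    idx = rng.choice(n, size=k, replace=True, p=p)
    scale = 1.0 / (k * p[idx])  # importance weights keep the estimate unbiased
    return (X[idx] * scale[:, None]).T @ dY[idx]

# Toy check: averaging many sampled estimates should approach the exact product.
X = rng.normal(size=(512, 64))
dY = rng.normal(size=(512, 32))
exact = X.T @ dY
est = np.mean([sample_weight_grad(X, dY, 0.25) for _ in range(200)], axis=0)
print("relative error:", np.linalg.norm(est - exact) / np.linalg.norm(exact))

G_hat = importance_sample_rows(dY, 0.25)  # e.g. subsample activation gradients
print("rows kept:", int((np.abs(G_hat).sum(axis=1) > 0).sum()), "of", dY.shape[0])
```

Unbiasedness is what lets the expected loss trajectory match exact training; the remaining knob is the keep ratio, which, per the abstract, VCAS adapts jointly with the model parameters to control the added variance.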
Keywords
efficient training algorithms, stochastic gradient descent, importance sampling, variance reduction