DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

ICLR 2023

Abstract
A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint - Dropping Intermediate Tensors (DropIT). DropIT drops min-k elements of the intermediate tensors and approximates gradients from the sparsified tensors in the backward pass. Theoretically, DropIT reduces noise on estimated gradients and therefore has a higher rate of convergence than vanilla-SGD. Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher testing accuracy for Visual Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection). Our code and models are available at https://github.com/chenjoya/dropit
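The sketch below illustrates the idea described in the abstract for a single linear layer: the forward pass runs on the dense activation as usual, but only the largest-magnitude (top-k) elements of the input activation are cached for the backward pass, and the weight gradient is approximated from the resulting sparsified tensor. This is a minimal illustration, not the authors' implementation (see the linked repository); the class name `DropITLinearFn` and the `keep_ratio` argument are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


class DropITLinearFn(torch.autograd.Function):
    """Minimal DropIT-style sketch for a linear layer: cache only the
    top-k elements of the input activation; dropped (min-k) elements
    are treated as zero when estimating the weight gradient."""

    @staticmethod
    def forward(ctx, x, weight, bias, keep_ratio):
        # Forward pass uses the full, dense activation as usual.
        y = F.linear(x, weight, bias)

        # Keep only the top-k elements of x (by magnitude) for backward.
        flat = x.reshape(-1)
        k = max(1, int(keep_ratio * flat.numel()))
        _, idx = flat.abs().topk(k)
        ctx.save_for_backward(flat[idx], idx, weight)
        ctx.x_shape = x.shape
        return y

    @staticmethod
    def backward(ctx, grad_y):
        vals, idx, weight = ctx.saved_tensors
        # Reconstruct the sparsified activation (min-k elements dropped).
        x_sparse = torch.zeros(
            ctx.x_shape.numel(), dtype=vals.dtype, device=vals.device
        )
        x_sparse[idx] = vals
        x_sparse = x_sparse.reshape(ctx.x_shape)

        # grad_x needs only the weight, so it is exact;
        # grad_weight is approximated from the sparsified activation.
        grad_x = grad_y @ weight
        grad_w = grad_y.transpose(-2, -1) @ x_sparse
        grad_b = grad_y.sum(dim=0)
        return grad_x, grad_w, grad_b, None


# Usage: cache only ~10% of activation elements (drop ~90%).
x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
b = torch.zeros(64, requires_grad=True)
out = DropITLinearFn.apply(x, w, b, 0.1)
out.sum().backward()
```

In this sketch, memory savings come from storing `k` values plus their indices instead of the full activation; the gradient with respect to the input remains exact because it depends only on the weight.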
Keywords
dropping intermediate tensors, dropping activations, activation compressed training, top-k, vision transformer, cnn