DropIT: Dropping Intermediate Tensors for Memory-Efficient DNN Training

ICLR 2023

Abstract
A standard hardware bottleneck when training deep neural networks is GPU memory. The bulk of memory is occupied by caching intermediate tensors for gradient computation in the backward pass. We propose a novel method to reduce this footprint - Dropping Intermediate Tensors (DropIT). DropIT drops min-k elements of the intermediate tensors and approximates gradients from the sparsified tensors in the backward pass. Theoretically, DropIT reduces noise on estimated gradients and therefore has a higher rate of convergence than vanilla-SGD. Experiments show that we can drop up to 90% of the intermediate tensor elements in fully-connected and convolutional layers while achieving higher testing accuracy for Visual Transformers and Convolutional Neural Networks on various tasks (e.g., classification, object detection). Our code and models are available at https://github.com/chenjoya/dropit
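The sketch below illustrates the idea described in the abstract for a single linear layer: the forward pass runs on the dense activation as usual, but only the largest-magnitude (top-k) elements of the input activation are cached for the backward pass, and the weight gradient is approximated from the resulting sparsified tensor. This is a minimal illustration, not the authors' implementation (see the linked repository); the class name `DropITLinearFn` and the `keep_ratio` argument are assumptions made for this example.

```python
import torch
import torch.nn.functional as F


class DropITLinearFn(torch.autograd.Function):
    """Minimal DropIT-style sketch for a linear layer: cache only the
    top-k elements of the input activation; dropped (min-k) elements
    are treated as zero when estimating the weight gradient."""

    @staticmethod
    def forward(ctx, x, weight, bias, keep_ratio):
        # Forward pass uses the full, dense activation as usual.
        y = F.linear(x, weight, bias)

        # Keep only the top-k elements of x (by magnitude) for backward.
        flat = x.reshape(-1)
        k = max(1, int(keep_ratio * flat.numel()))
        _, idx = flat.abs().topk(k)
        ctx.save_for_backward(flat[idx], idx, weight)
        ctx.x_shape = x.shape
        return y

    @staticmethod
    def backward(ctx, grad_y):
        vals, idx, weight = ctx.saved_tensors
        # Reconstruct the sparsified activation (min-k elements dropped).
        x_sparse = torch.zeros(
            ctx.x_shape.numel(), dtype=vals.dtype, device=vals.device
        )
        x_sparse[idx] = vals
        x_sparse = x_sparse.reshape(ctx.x_shape)

        # grad_x needs only the weight, so it is exact;
        # grad_weight is approximated from the sparsified activation.
        grad_x = grad_y @ weight
        grad_w = grad_y.transpose(-2, -1) @ x_sparse
        grad_b = grad_y.sum(dim=0)
        return grad_x, grad_w, grad_b, None


# Usage: cache only ~10% of activation elements (drop ~90%).
x = torch.randn(32, 128, requires_grad=True)
w = torch.randn(64, 128, requires_grad=True)
b = torch.zeros(64, requires_grad=True)
out = DropITLinearFn.apply(x, w, b, 0.1)
out.sum().backward()
```

In this sketch, memory savings come from storing `k` values plus their indices instead of the full activation; the gradient with respect to the input remains exact because it depends only on the weight.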
Keywords
dropping intermediate tensors, dropping activations, activation compressed training, top-k, vision transformer, cnn