Contraction of Dynamically Masked Deep Neural Networks for Efficient Video Processing

IEEE Transactions on Circuits and Systems for Video Technology (2022)

Abstract
Sequential data such as video are characterized by spatio-temporal redundancies. To date, few deep learning algorithms exploit these redundancies to decrease the often massive cost of inference. This work leverages correlations in video data to reduce the size and run-time cost of deep neural networks. Drawing upon the simplicity of the typically used ReLU activation function, we replace this function with dynamically updating masks. The resulting network is a simple chain of matrix multiplications and bias additions, which can be contracted into a single weight matrix and bias vector. Inference then reduces to an affine transformation of the input sample with these contracted parameters. We show that the method is akin to approximating the neural network with a first-order Taylor expansion around a dynamically updating reference point. For triggering these updates, one static and three data-driven mechanisms are analyzed. We evaluate the proposed algorithm on a range of tasks, including pose estimation on surveillance data, road detection on KITTI driving scenes, object detection on ImageNet videos, as well as denoising MNIST digits, and obtain compression rates of up to $3.6\times$.
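To illustrate the contraction idea described in the abstract, the sketch below shows how a plain fully connected ReLU network collapses into a single affine map once the ReLU is replaced by binary masks recorded at a reference input. This is a minimal NumPy sketch under simplifying assumptions: the function names (`relu_masks`, `contract`) are hypothetical, only dense layers are handled, and the paper's mask-update triggering mechanisms are not implemented.

```python
import numpy as np

def relu_masks(weights, biases, x_ref):
    """Record which ReLU units are active (pre-activation > 0)
    when the network is evaluated at the reference input x_ref."""
    masks, h = [], x_ref
    for W, b in zip(weights[:-1], biases[:-1]):
        z = W @ h + b
        masks.append((z > 0).astype(z.dtype))
        h = np.maximum(z, 0)
    return masks

def contract(weights, biases, masks):
    """Fold the masked layers into one affine map (W_eff, b_eff).

    With ReLU replaced by fixed diagonal masks D_l, the network becomes
    x -> W_L D_{L-1}(... D_1 (W_1 x + b_1) ...) + b_L, which collapses
    into W_eff @ x + b_eff.
    """
    W_eff = np.diag(masks[0]) @ weights[0]
    b_eff = masks[0] * biases[0]
    for l in range(1, len(weights) - 1):
        W_eff = np.diag(masks[l]) @ weights[l] @ W_eff
        b_eff = masks[l] * (weights[l] @ b_eff + biases[l])
    # Final layer has no activation, so it is applied unmasked.
    W_eff = weights[-1] @ W_eff
    b_eff = weights[-1] @ b_eff + biases[-1]
    return W_eff, b_eff
```

In this sketch, subsequent video frames `x` would be processed as `W_eff @ x + b_eff` until one of the triggering mechanisms decides to recompute the masks at a new reference frame; fixing the activation pattern at the reference input is exactly the first-order (piecewise-linear) approximation the abstract refers to.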
Keywords
Deep neural networks, network compression, Taylor approximation, masking