
Making Models Shallow Again: Jointly Learning to Reduce Non-Linearity and Depth for Latency-Efficient Private Inference

CoRR (2023)

Abstract
The large number of ReLU and MAC operations in deep neural networks makes them ill-suited for latency- and compute-efficient private inference. In this paper, we present a model optimization method that allows a model to learn to be shallow. In particular, we leverage the ReLU sensitivity of a convolutional block to remove a ReLU layer and merge its preceding and succeeding convolution layers into a shallower block. Unlike existing ReLU reduction methods, our joint reduction method yields models with improved reduction of both ReLUs and linear operations, by up to 1.73x and 1.47x respectively, evaluated with ResNet18 on CIFAR-100 without any significant accuracy drop.
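The merging step the abstract relies on can be illustrated with a short sketch: once the ReLU between two convolutions is removed, the block is a single linear map, so the two kernels can be fused offline into one convolution of kernel size k1 + k2 - 1. The sketch below (PyTorch, with hypothetical helper name `fuse_convs`, stride 1, no padding, no bias, assumptions not taken from the paper) is a minimal illustration of that fusion, not the authors' implementation.

```python
import torch
import torch.nn.functional as F

def fuse_convs(w1: torch.Tensor, w2: torch.Tensor) -> torch.Tensor:
    """Merge conv2(conv1(x)) -- with the intermediate ReLU removed -- into one kernel.
    w1: (C_mid, C_in, k1, k1), w2: (C_out, C_mid, k2, k2)
    returns: (C_out, C_in, k1 + k2 - 1, k1 + k2 - 1)
    Assumes stride 1, no padding, no bias.
    """
    k2 = w2.shape[-1]
    # Composing two cross-correlations equals one cross-correlation whose kernel is
    # the channel-contracted convolution of the two kernels; flipping w2 spatially
    # turns F.conv2d's cross-correlation into a true convolution over the weights.
    fused = F.conv2d(w1.permute(1, 0, 2, 3),   # treat C_in as the batch dimension
                     w2.flip([2, 3]),
                     padding=k2 - 1)
    return fused.permute(1, 0, 2, 3)

# Sanity check: the fused 5x5 conv reproduces the two-layer (3x3, 3x3) output.
w1 = torch.randn(8, 3, 3, 3)   # conv1: 3 -> 8 channels
w2 = torch.randn(4, 8, 3, 3)   # conv2: 8 -> 4 channels
x = torch.randn(1, 3, 32, 32)
two_layer = F.conv2d(F.conv2d(x, w1), w2)      # ReLU already removed
one_layer = F.conv2d(x, fuse_convs(w1, w2))    # single shallower block
print(torch.allclose(two_layer, one_layer, rtol=1e-4, atol=1e-4))  # True
```

In the paper's setting, which blocks undergo this fusion would be selected by the learned ReLU sensitivity; the sketch only shows why the depth reduction is exact once a ReLU is dropped.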
Keywords
models shallow, inference, learning, non-linearity, latency-efficient