
StreamliNet: Cost-aware Layer-wise Neural Network Linearization for Fast and Accurate Private Inference

Information Sciences (2024)

Abstract
Private inference (PI) allows a client and a server to perform cryptographically secure deep neural network inference without disclosing their sensitive data to each other. Despite this strong security guarantee, existing models are ill-suited for PI: they make heavy use of non-linear operations such as ReLUs, which are computationally expensive over ciphertexts and therefore dominate PI latency. Previous approaches to ReLU optimization either ignore the intrinsic importance of individual ReLUs or suffer significant accuracy loss. In this work, we propose StreamliNet, an importance-driven, gradient-based framework that reduces PI latency while retaining inference accuracy. Specifically, we first present a novel notion of ReLU negativity as a proxy for ReLU importance, used within a multivariate metric to precisely identify layer-wise ReLU budgets. StreamliNet then automates the selection of performance-insensitive ReLUs for linearization and learns a non-linearity-sparse model in which each layer retains ReLUs in appropriate counts and locations. Moreover, to reduce the activation map discrepancy, we develop a cost-aware post-activation consistency constraint that prioritizes the linearization of low-cost ReLUs while further mitigating model performance degradation. Extensive experiments on various models and datasets demonstrate that StreamliNet outperforms state-of-the-art methods such as SNL (ICML 22) and SENet (ICLR 23) on CIFAR-100, achieving 3.09% higher accuracy at the same ReLU budget or requiring 2× fewer ReLUs at the same accuracy.
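The abstract does not define how ReLU negativity is computed or how budgets and ReLU locations are learned, so the following is only a minimal sketch under loudly stated assumptions. It assumes negativity is the fraction of negative pre-activations in a layer (the inputs on which a ReLU actually changes its input), that a global ReLU budget is split across layers in proportion to negativity, and that "linearizing" a ReLU means replacing it with the identity. The names `relu_negativity`, `allocate_budgets`, `topk_mask`, and `masked_relu` are hypothetical, and the hard top-k selection stands in for StreamliNet's gradient-based mask learning and cost-aware constraint, which the abstract does not detail.

```python
from typing import List, Sequence


def relu_negativity(pre_acts: Sequence[float]) -> float:
    """Share of pre-activation values below zero, i.e. where a ReLU would
    change its input. Hypothetical proxy, not the paper's exact definition."""
    if not pre_acts:
        raise ValueError("need at least one pre-activation value")
    return sum(1 for v in pre_acts if v < 0) / len(pre_acts)


def allocate_budgets(negativities: Sequence[float], total_budget: int,
                     layer_sizes: Sequence[int]) -> List[int]:
    """Split a global ReLU budget across layers in proportion to negativity,
    capped at each layer's ReLU count (assumed allocation rule)."""
    total_neg = sum(negativities)
    if total_neg == 0:
        # No layer looks more important than another: fall back to a uniform split.
        weights = [1.0 / len(negativities)] * len(negativities)
    else:
        weights = [n / total_neg for n in negativities]
    # int() truncates toward zero, which floors these non-negative shares.
    return [min(int(w * total_budget), size)
            for w, size in zip(weights, layer_sizes)]


def topk_mask(scores: Sequence[float], k: int) -> List[int]:
    """Keep ReLUs (mask = 1) at the k highest-scoring positions and mark the
    rest for linearization (mask = 0). Scores are assumed importance values."""
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    keep = set(ranked[:k])
    return [1 if i in keep else 0 for i in range(len(scores))]


def masked_relu(xs: Sequence[float], mask: Sequence[int]) -> List[float]:
    """Apply ReLU where mask is 1; pass the value through unchanged
    (linearized, identity) where mask is 0."""
    return [max(x, 0.0) if m else x for x, m in zip(xs, mask)]
```

In use, one would compute `relu_negativity` for each layer's pre-activations on a calibration batch, call `allocate_budgets` to get a per-layer ReLU count, and then keep only that many ReLUs per layer with `topk_mask` before evaluating with `masked_relu`. Every ReLU set to identity is one fewer expensive non-linear operation over ciphertexts, which is why the per-layer count matters for PI latency.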
Keywords
Machine learning as a service, Private inference, Network linearization, ReLU optimization, Gradient descent