
RNA-ViT: Reduced-Dimension Approximate Normalized Attention Vision Transformers for Latency Efficient Private Inference

2023 IEEE/ACM International Conference on Computer-Aided Design (ICCAD 2023)

Abstract
The concern over data and model privacy in machine learning inference as a service (MLaaS) has led to the development of private inference (PI) techniques. However, existing PI frameworks, especially those designed for large models such as vision transformers (ViTs), suffer from high computational and communication overheads caused by expensive multi-party computation (MPC) protocols. The encrypted attention module, which involves the softmax operation, contributes significantly to this overhead. In this work, we present a family of models, dubbed RNA-ViT, that leverages a novel attention module called reduced-dimension approximate normalized attention and a latency-efficient GeLU-alternative layer. In particular, RNA-ViT uses two novel techniques to improve PI efficiency in ViTs: a reduced-dimension normalized attention (RNA) architecture and a high-order polynomial (HOP) softmax approximation for latency-efficient normalization. We also propose a novel metric, the accuracy-to-latency ratio (A2L), to evaluate modules in terms of their accuracy and PI latency. Based on this metric, we perform an analysis to identify a nonlinearity module with improved PI efficiency. Our extensive experiments show that RNA-ViT achieves, on average, 3.53×, 3.54×, and 1.66× lower PI latency with average accuracy improvements of 0.93%, 2.04%, and 2.73% compared to the state-of-the-art scheme MPCViT [1] on CIFAR-10, CIFAR-100, and Tiny-ImageNet, respectively.
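The abstract states that RNA-ViT replaces the exponentiation inside softmax with a high-order polynomial (HOP), since MPC protocols evaluate polynomials far more cheaply than transcendental functions. The exact polynomial and degree are not given in the abstract, so the sketch below assumes a truncated Taylor expansion of exp() purely for illustration; it is not the paper's actual HOP construction.

```python
import math
import numpy as np

def hop_softmax(scores: np.ndarray, degree: int = 6) -> np.ndarray:
    """Softmax with exp() replaced by a polynomial (illustrative sketch).

    Assumption: a degree-`degree` Taylor polynomial of exp() around 0
    stands in for the paper's unspecified HOP approximation.
    """
    # Shift so inputs are <= 0; keeps the polynomial in a well-behaved range.
    x = scores - scores.max(axis=-1, keepdims=True)
    # Polynomial stand-in for exp(x): sum_{k=0}^{degree} x^k / k!
    p = sum(x**k / math.factorial(k) for k in range(degree + 1))
    # Truncation can dip slightly negative for large |x|; clamp to keep
    # the normalized weights a valid probability distribution.
    p = np.clip(p, 0.0, None)
    return p / p.sum(axis=-1, keepdims=True)
```

For attention scores of moderate magnitude, this polynomial normalization tracks the exact softmax closely while requiring only additions and multiplications, which is what makes it attractive under MPC.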
Key words
Deep learning, Computer vision, Vision transformer, Private inference, Multi-party computation