Q-gym: An Equality Saturation Framework for DNN Inference Exploiting Weight Repetition

PACT 2022

Abstract
High computation cost is one of the key bottlenecks to adopting deep neural networks (DNNs) on different hardware. When client data are sensitive, privacy-preserving DNN evaluation methods, such as homomorphic encryption (HE), incur even higher computation cost. Prior work exploits weight repetition in quantized neural networks to reduce the cost of convolutions through memoization or arithmetic factorization. However, such methods fail to fully exploit the exponential search space of factorizations and computation reuse. We propose Q-gym, a DNN framework consisting of two components. First, a compiler that leverages equality saturation to generate computation expressions for convolutional layers with a significant reduction in the number of operations. Second, we integrate these computation expressions with various parallelization methods to accelerate DNN inference on different hardware; we also employ the efficient expressions to accelerate DNN inference under HE. Extensive experiments show that Q-gym achieves 19.1% / 68.9% greater operation reductions than SumMerge and the original DNNs, respectively. The computation expressions from Q-gym also yield 2.56x / 1.78x inference speedups on CPU / GPU over OneDNN and PyTorch GPU on average. For DNN evaluation under HE, Q-gym reduces homomorphic operations by 2.47x / 1.30x relative to CryptoNet and FastCryptoNet, with only 0.06% accuracy loss due to quantization.
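As background for the weight-repetition idea the abstract builds on: a quantized dot product y = Σ w_i·x_i can be refactored so that inputs sharing the same weight value are summed first and multiplied once, i.e. y = Σ_q q·(Σ_{i: w_i=q} x_i), trading many multiplies for one per unique weight value. The minimal NumPy sketch below illustrates this; the function names and the operation accounting are illustrative assumptions, not Q-gym's actual compiler:

```python
import numpy as np

def naive_dot(w, x):
    """Standard dot product: one multiply and one add per element."""
    y = 0.0
    mults = adds = 0
    for wi, xi in zip(w, x):
        y += wi * xi
        mults += 1
        adds += 1
    return y, mults, adds

def factored_dot(w, x):
    """Weight-repetition factorization: sum the inputs that share a
    quantized weight value, then multiply by that value once, i.e.
    y = sum_q q * (sum_{i: w_i == q} x_i)."""
    y = 0.0
    mults = adds = 0
    for q in np.unique(w):
        group = x[w == q]        # inputs sharing weight value q
        partial = group.sum()    # len(group) - 1 adds, reused for all of them
        adds += len(group) - 1
        y += q * partial         # one multiply per unique weight value
        mults += 1
        adds += 1
    return y, mults, adds

rng = np.random.default_rng(0)
w = rng.choice([-1.0, -0.5, 0.5, 1.0], size=64)  # 2-bit quantized weights
x = rng.standard_normal(64)

y0, m0, a0 = naive_dot(w, x)
y1, m1, a1 = factored_dot(w, x)
assert np.isclose(y0, y1)
print(f"naive:    {m0} mults, {a0} adds")   # 64 mults
print(f"factored: {m1} mults, {a1} adds")   # 4 mults (one per unique weight)
```

With 2-bit weights the multiply count drops from the vector length to the number of distinct weight values; the exponential search space the abstract mentions comes from choosing how to factor and share these partial sums across many outputs.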
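The compiler component rests on equality saturation: applying rewrite rules non-destructively to build up a space of provably equivalent expressions, then extracting the cheapest one under a cost model. A real implementation would use an e-graph (as in the egg library) to share subterms compactly; the toy sketch below substitutes a plain set of equivalent terms to convey the explore-then-extract workflow. All names, rules, and the cost model are illustrative assumptions, not Q-gym's implementation:

```python
# Expressions as nested tuples: ('+', a, b), ('*', a, b), or a leaf string.

def rewrites(e):
    """Yield expressions equal to e under a few algebraic rules."""
    if not isinstance(e, tuple):
        return
    op, a, b = e
    yield (op, b, a)  # commutativity
    # Factoring: q*s1 + q*s2  ->  q*(s1 + s2)  (reverse distributivity)
    if op == '+' and isinstance(a, tuple) and isinstance(b, tuple):
        if a[0] == '*' and b[0] == '*' and a[1] == b[1]:
            yield ('*', a[1], ('+', a[2], b[2]))
    # Recurse: rewrite either child in place.
    for i, child in ((1, a), (2, b)):
        for c2 in rewrites(child):
            new = list(e)
            new[i] = c2
            yield tuple(new)

def saturate(e, max_size=2000):
    """Grow the set of expressions equal to e until a fixpoint (or cap)."""
    seen = {e}
    frontier = [e]
    while frontier and len(seen) < max_size:
        nxt = []
        for cur in frontier:
            for r in rewrites(cur):
                if r not in seen:
                    seen.add(r)
                    nxt.append(r)
        frontier = nxt
    return seen

def cost(e):
    """Toy cost model: multiplies are expensive, adds are cheap."""
    if not isinstance(e, tuple):
        return 0
    return (3 if e[0] == '*' else 1) + cost(e[1]) + cost(e[2])

# w0*x0 + w0*x1 with a repeated weight w0: extraction finds a factored
# form equivalent to w0*(x0 + x1), replacing two multiplies with one.
expr = ('+', ('*', 'w0', 'x0'), ('*', 'w0', 'x1'))
best = min(saturate(expr), key=cost)
print(best)  # e.g. ('*', 'w0', ('+', 'x0', 'x1'))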