Quantization for Bayesian Deep Learning: Low-Precision Characterization and Robustness

Jun-Liang Lin, Ranganath Krishnan, Keyur Ruganathbhai Ranipa, Mahesh Subedar, Vrushabh Sanghavi, Meena Arunachalam, Omesh Tickoo, Ravishankar Iyer, Mahmut Taylan Kandemir

2023 IEEE International Symposium on Workload Characterization (IISWC 2023)

Abstract
Bayesian Deep Learning is an emerging field for building robust and trustworthy AI systems due to its ability to estimate reliable uncertainty in neural networks. The need to model distributions over parameters and to perform multiple Monte Carlo forward passes in Bayesian neural networks leads to larger model sizes and a significant increase in inference latency compared to deterministic models, which poses challenges for practical deployment. Quantization is a technique that can reduce model size and speed up inference through low-precision computation. In this work, we propose and evaluate a quantization framework and workflow for Bayesian deep learning workloads, which leverages 8-bit integer (INT8) operations to accelerate inference on the 4th Gen Intel Xeon Scalable processor (formerly codenamed Sapphire Rapids). We demonstrate that our quantization workflow achieves a 6.9x inference throughput speedup on the ImageNet benchmark without sacrificing model accuracy or the quality of uncertainty estimates. Furthermore, we evaluate the effects of quantization on Bayesian neural networks with respect to generalizability, robustness against data drift, and uncertainty estimation on large-scale datasets, including a real-world safety-critical application. Our code has been integrated into an open-source project and is available on GitHub at the following URL: https://github.com/IntelLabs/bayesian-torch.
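As a rough illustration of the inference pattern the abstract describes (INT8 quantization combined with multiple Monte Carlo forward passes), the sketch below uses PyTorch's post-training dynamic quantization and a placeholder network. It is not the authors' Bayesian-Torch workflow; the model, the number of Monte Carlo samples, and the uncertainty measure are assumptions made purely for illustration.

```python
# Minimal sketch (not the paper's exact workflow): INT8 dynamic quantization of a
# placeholder model, followed by Monte Carlo forward passes to obtain a mean
# prediction and a predictive-uncertainty proxy.
import torch
import torch.nn as nn
import torch.nn.functional as F

class SmallNet(nn.Module):
    """Stand-in deterministic backbone; in practice this would be a Bayesian
    model, e.g. one obtained via a deterministic-to-Bayesian conversion."""
    def __init__(self, num_classes: int = 10):
        super().__init__()
        self.fc1 = nn.Linear(32, 64)
        self.fc2 = nn.Linear(64, num_classes)

    def forward(self, x):
        return self.fc2(F.relu(self.fc1(x)))

model = SmallNet().eval()

# Post-training dynamic quantization of the Linear layers to INT8.
# (A static INT8 flow with calibration, as targeted on Sapphire Rapids, would
# instead use torch.ao.quantization's prepare/convert path.)
qmodel = torch.quantization.quantize_dynamic(model, {nn.Linear}, dtype=torch.qint8)

# Monte Carlo inference: repeat the forward pass and average the softmax
# outputs; the entropy of the averaged distribution serves as an uncertainty
# proxy. For a truly Bayesian model each pass would sample different weights;
# here the loop only illustrates the inference pattern.
x = torch.randn(8, 32)
num_mc_samples = 20
with torch.no_grad():
    probs = torch.stack([F.softmax(qmodel(x), dim=-1) for _ in range(num_mc_samples)])

mean_probs = probs.mean(dim=0)  # predictive mean over MC samples
predictive_entropy = -(mean_probs * mean_probs.clamp_min(1e-12).log()).sum(dim=-1)
print(mean_probs.shape, predictive_entropy.shape)
```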
Keywords
quantization, Bayesian deep learning, deep learning, low-precision