Post-training 4-bit quantization of convolution networks for rapid-deployment
arXiv: Computer Vision and Pattern Recognition(2018)
摘要
Neural network quantization has significant benefits for deployment on dedicated accelerators. We introduce the first practical 4-bit post training quantization approach: it does not involve training the quantized model ("fine-tuning"), nor it requires the availability of the full dataset. Yet, it maintains accuracy that is just a few percents less the state-of-the-art baseline across a wide range of convolutional models. This is unlike traditional approaches that fail entirely in these settings. To achieve this, we convert a full precision pre-trained network to a limited precision network by minimizing the quantization error at the tensor level. We analyze the trade-off between quantization noise and clipping distortion in low precision networks. This enables us to derive approximate analytical expressions for the mean-square-error degradation due to clipping. By optimizing these expressions, we show marked improvements over standard quantization schemes that normally avoid clipping.
更多查看译文
AI 理解论文
溯源树
样例
![](https://originalfileserver.aminer.cn/sys/aminer/pubs/mrt_preview.jpeg)
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要