CAQ: Context-Aware Quantization via Reinforcement Learning

2021 International Joint Conference on Neural Networks (IJCNN), 2021

Abstract
Model quantization is a crucial step for porting Deep Neural Networks (DNNs) to embedded devices with limited computation and storage resources. Traditional methods usually obtain the scaling factor and quantize the weights based on information from a single layer. However, our analysis indicates that these scaling-factor selection methods overlook the differences and dependencies among layers, leading to large truncation or zeroing errors, which are the main cause of performance degradation. To this end, we propose a Context-Aware Quantization (CAQ) scheme, which formalizes model quantization as a global optimization problem and leverages reinforcement learning to search for the optimal scaling factors over the entire model. Further, we adopt shift-based scaling factors to narrow the search space and improve search efficiency; this also reduces computational complexity during the inference phase and provides a simpler, more robust activation calibration solution. We extensively test our scheme on a wide range of neural networks, including ResNet 50/101/152, InceptionV3, and MobileNetV2 on ImageNet; the entire search process takes only about 1 hour on a single GeForce RTX 2080 Ti. Compared with existing methods, our scheme achieves better performance, maintaining a post-quantization accuracy loss below 0.25% while reducing the memory footprint by 5%-8% and multiply-accumulate (MAC) operations by 2%-4%. Besides, we further show that CAQ can be applied to other tasks, such as object detection and segmentation.
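
To make the shift-based scaling factor concrete, here is a minimal Python sketch (not the authors' implementation) of symmetric per-tensor weight quantization in which the scaling factor is constrained to a power of two, so dequantization reduces to a bit shift in fixed-point arithmetic; all names here are illustrative assumptions.

import numpy as np

def quantize_shift_based(weights, num_bits=8):
    # Symmetric uniform quantization with a shift-based (power-of-two)
    # scaling factor: scale = 2**s, so rescaling is a bit shift in
    # fixed-point arithmetic instead of a floating-point multiply.
    qmax = 2 ** (num_bits - 1) - 1                    # e.g. 127 for int8
    max_abs = max(float(np.max(np.abs(weights))), 1e-12)
    # Choose the shift s so that 2**s is the smallest power of two whose
    # representable range [-qmax * 2**s, qmax * 2**s] covers the weights.
    s = int(np.ceil(np.log2(max_abs / qmax)))
    q = np.clip(np.round(weights / 2.0 ** s), -qmax - 1, qmax).astype(np.int8)
    return q, s

# Usage: quantize a random weight tensor and check the reconstruction error.
w = (np.random.randn(256, 256) * 0.05).astype(np.float32)
q, s = quantize_shift_based(w)
w_hat = q.astype(np.float32) * 2.0 ** s
print("shift:", s, "max abs error:", np.max(np.abs(w - w_hat)))

Restricting the scale to powers of two shrinks the per-layer search to a small set of integer shift exponents, which is what makes a whole-model search tractable.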
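
The abstract frames scaling-factor selection as a global, whole-model search. The sketch below substitutes a simple greedy coordinate search for the paper's reinforcement-learning agent (whose details are not given in the abstract) and uses total reconstruction error as a proxy reward; the function names and the reward are assumptions, meant only to show how each layer's shift is tuned in the context of all the others rather than in isolation.

import numpy as np

def proxy_reward(shifts, weights):
    # Stand-in reward: negative total quantization error across all layers.
    # The paper's RL search would instead score the fully quantized model
    # end to end (an assumption; the abstract does not specify the reward).
    err = 0.0
    for w, s in zip(weights, shifts):
        q = np.clip(np.round(w / 2.0 ** s), -128, 127)
        err += float(np.mean((w - q * 2.0 ** s) ** 2))
    return -err

def search_shifts(weights, init, radius=2, rounds=3):
    # Greedy coordinate search: re-tune one layer's shift exponent while
    # holding every other layer fixed, so each choice sees the whole model.
    shifts, best = list(init), proxy_reward(init, weights)
    for _ in range(rounds):
        for i in range(len(shifts)):
            for cand in range(shifts[i] - radius, shifts[i] + radius + 1):
                trial = shifts[:i] + [cand] + shifts[i + 1:]
                r = proxy_reward(trial, weights)
                if r > best:
                    best, shifts = r, trial
    return shifts, best

# Usage: search shifts for four toy layers, starting from per-layer estimates.
weights = [np.random.randn(64, 64) * 0.1 for _ in range(4)]
init = [int(np.ceil(np.log2(np.abs(w).max() / 127))) for w in weights]
print(search_shifts(weights, init))

Note that this toy reward decomposes across layers, whereas end-to-end accuracy couples them through truncation and zeroing errors propagating between layers, which is precisely why the paper argues for a global search rather than per-layer selection.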
Keywords
Quantization, Context-Aware, Reinforcement Learning