PSE: mixed quantization framework of neural networks for efficient deployment

Yingqing Yang, Guanzhong Tian, Mingyuan Liu, Yihao Chen, Jun Chen, Yong Liu, Yu Pan, Longhua Ma

Journal of Real-Time Image Processing (2023)

Abstract
Quantization is a promising approach for deploying deep neural networks on resource-limited devices. However, existing methods struggle to achieve both computational acceleration and parameter compression while maintaining high accuracy. To achieve this goal, we propose PSE, a mixed quantization framework that combines product quantization (PQ), scalar quantization (SQ), and error correction. Specifically, we first employ PQ to obtain the floating-point codebook and index matrix of the weight matrix. Then, we use SQ to quantize the codebook into integers and reconstruct an integer weight matrix. Finally, we propose an error correction algorithm to update the quantized codebook and minimize the quantization error. We extensively evaluate our proposed method on various backbones, including VGG-16, ResNet-18/50, MobileNetV2, ShuffleNetV2, EfficientNet-B3/B7, and DenseNet-201, on the CIFAR-10 and ILSVRC-2012 benchmarks. The experiments demonstrate that PSE reduces computation complexity and model size with acceptable accuracy loss. For example, ResNet-18 achieves a 1.8× acceleration ratio and a 30.4× compression ratio with less than 1.54% accuracy loss.
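The first two stages of the pipeline described above (PQ on the weight matrix, then SQ on the resulting codebook) can be sketched roughly as follows. This is a minimal NumPy illustration, not the authors' implementation: plain Lloyd/k-means iterations stand in for PQ codebook training, an affine 8-bit scheme stands in for SQ, and the error-correction stage is omitted. All function names and parameters are hypothetical.

```python
import numpy as np

def product_quantize(W, num_subvectors=4, codebook_size=16, iters=10, seed=0):
    """PQ sketch: split each row of W into sub-vectors and run k-means per group."""
    rng = np.random.default_rng(seed)
    rows, cols = W.shape
    d = cols // num_subvectors
    codebooks, indices = [], []
    for s in range(num_subvectors):
        sub = W[:, s * d:(s + 1) * d]                      # (rows, d) sub-vectors
        cb = sub[rng.choice(rows, codebook_size, replace=False)]  # init from data
        for _ in range(iters):                             # plain Lloyd iterations
            dist = ((sub[:, None, :] - cb[None]) ** 2).sum(-1)
            idx = dist.argmin(1)                           # nearest-codeword index
            for k in range(codebook_size):
                if (idx == k).any():                       # skip empty clusters
                    cb[k] = sub[idx == k].mean(0)
        codebooks.append(cb)
        indices.append(idx)
    return codebooks, indices                              # float codebooks + index matrix

def scalar_quantize(cb, bits=8):
    """SQ sketch: affine quantization of a float codebook to integers."""
    qmax = 2 ** bits - 1
    scale = float(cb.max() - cb.min()) / qmax
    if scale == 0.0:
        scale = 1.0
    zero = float(cb.min())
    q = np.round((cb - zero) / scale).astype(np.int32)
    return q, scale, zero

def reconstruct(codebooks_q, scales, zeros, indices):
    """Rebuild an approximate weight matrix from the quantized codebooks."""
    parts = [s * cq[idx] + z
             for cq, s, z, idx in zip(codebooks_q, scales, zeros, indices)]
    return np.concatenate(parts, axis=1)

# Toy weight matrix: quantize, then measure reconstruction error.
W = np.random.default_rng(1).standard_normal((64, 32)).astype(np.float32)
cbs, idxs = product_quantize(W)
qs = [scalar_quantize(cb) for cb in cbs]
W_hat = reconstruct([q for q, _, _ in qs],
                    [s for _, s, _ in qs],
                    [z for _, _, z in qs], idxs)
print(W_hat.shape, float(np.mean((W - W_hat) ** 2)))
```

In the full method, only the integer codebooks and the index matrix would be stored (hence the compression ratio), and the error-correction stage would further update the quantized codebook to shrink the reconstruction error measured here.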
Keywords
Neural networks,Quantization,Compression,Acceleration,Data-free