Cursor-based Adaptive Quantization for Deep Convolutional Neural Network

2021 International Joint Conference on Neural Networks (IJCNN)

Abstract
Recent years have witnessed wide application of deep convolutional neural networks (DCNNs) in many scenarios. However, their large computational cost and memory consumption remain barriers to deployment in computation-constrained applications. Model quantization is a common method to reduce the storage and computation burden by decreasing the bit width. In this work, we propose a novel cursor-based adaptive quantization method using differentiable architecture search (DAS). The multi-bit quantization mechanism is formulated as a DAS process with a continuous cursor that represents the quantization bit width. The cursor-based DAS adaptively searches for the desired quantization bit width for each layer, and the search can be solved via an alternating approximate optimization process. We further devise a new loss function that collaboratively optimizes the accuracy and the parameter size of the model during the search. In the quantization step, following a new strategy, the two integers closest to the cursor are both used to quantize the DCNN, which reduces quantization noise and avoids the local convergence problem. Comprehensive experiments on benchmark datasets show that our cursor-based adaptive quantization approach efficiently obtains a smaller model with comparable or even better classification accuracy.
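To make the two-integer quantization step concrete, the sketch below illustrates one plausible reading of the strategy: each layer keeps a continuous cursor, and the weights are quantized with both the floor and the ceiling of the cursor, with the two results mixed by the cursor's fractional position. The function names (uniform_quantize, cursor_quantize) and the linear mixing rule are assumptions for illustration, not the paper's exact formulation.

    import numpy as np

    def uniform_quantize(w, bits):
        """Symmetric per-tensor uniform quantization to `bits` bits (bits >= 2)."""
        qmax = 2 ** (bits - 1) - 1                # e.g. 3 bits -> integer levels in [-3, 3]
        scale = np.max(np.abs(w)) / qmax + 1e-12  # map the largest weight onto qmax
        return np.clip(np.round(w / scale), -qmax, qmax) * scale

    def cursor_quantize(w, cursor):
        """Quantize w with the two integers closest to the continuous cursor,
        mixing the two results by the cursor's fractional position (an assumed
        combination rule; the paper's exact strategy may differ)."""
        lo, hi = int(np.floor(cursor)), int(np.ceil(cursor))
        if lo == hi:                              # cursor happens to be an integer
            return uniform_quantize(w, lo)
        frac = cursor - lo
        return (1.0 - frac) * uniform_quantize(w, lo) + frac * uniform_quantize(w, hi)

    # Example: one layer's weights quantized at a searched cursor of 3.3 bits
    w = np.random.randn(64, 64).astype(np.float32)
    wq = cursor_quantize(w, cursor=3.3)
    print("mean quantization error:", np.abs(w - wq).mean())

Because the cursor enters the quantized output continuously, a gradient-based search over per-layer bit widths of the kind the abstract describes becomes possible; at integer cursors the scheme reduces to ordinary fixed-bit quantization.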
Keywords
Model Compression, Quantization, Deep Neural Network