Improving Extreme Low-Bit Quantization With Soft Threshold

IEEE Transactions on Circuits and Systems for Video Technology (2023)

Abstract
Deep neural networks that run at low precision at inference time gain acceleration and compression advantages over their high-precision counterparts, but they must overcome accuracy degradation as the bit-width decreases. This work focuses on quantization below 4 bits, where the accuracy degradation is significant. We start with ternarization, a balance between efficiency and accuracy that quantizes both weights and activations into ternary values. We find that the hard threshold $\Delta$ introduced in previous ternary networks to determine quantization intervals, together with the suboptimal solution for $\Delta$, limits the performance of the ternary model. To alleviate this, we present Soft Threshold Ternary Networks (STTN), which enable the model to determine ternarized values automatically instead of depending on a hard threshold. Building on this, we generalize the soft-threshold idea from ternarization to arbitrary bit-widths, which we name Soft Threshold Quantized Networks (STQN). We observe that previous quantization methods rely on the rounding-to-nearest function, which constrains the quantization solution space and leads to significant accuracy degradation, especially in low-bit ($\leq 3$ bits) quantization. Instead of relying on the traditional rounding-to-nearest function, STQN determines its quantization intervals adaptively. Accuracy experiments on image classification, object detection, and instance segmentation, as well as efficiency experiments on a field-programmable gate array (FPGA), demonstrate that the proposed framework achieves a favorable trade-off between accuracy and efficiency. Code is available at https://github.com/WeixiangXu/STTN.
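To make the two baselines named in the abstract concrete, the following is a minimal sketch of hard-threshold ternarization and of round-to-nearest uniform quantization. It illustrates the earlier approaches that the abstract criticizes, not STTN or STQN themselves. The function names, the `delta_factor=0.7` heuristic (taken from Ternary Weight Networks), and the symmetric max-based scale are assumptions for illustration; the abstract does not specify them.

```python
def ternarize_hard(weights, delta_factor=0.7):
    """Hard-threshold ternarization in the style of earlier ternary networks.

    Assumption: Delta = delta_factor * mean(|w|), the heuristic used by
    Ternary Weight Networks. The abstract does not say how Delta is chosen.
    Returns (codes in {-1, 0, +1}, scale alpha, threshold delta).
    """
    mean_abs = sum(abs(w) for w in weights) / len(weights)
    delta = delta_factor * mean_abs
    # Weights inside [-delta, delta] are forced to 0; the rest keep their sign.
    codes = [0 if abs(w) <= delta else (1 if w > 0 else -1) for w in weights]
    # alpha is the mean magnitude of the weights that survived the threshold.
    kept = [abs(w) for w, c in zip(weights, codes) if c != 0]
    alpha = sum(kept) / len(kept) if kept else 0.0
    return codes, alpha, delta


def quantize_round_to_nearest(weights, bits):
    """Symmetric uniform quantizer based on round-to-nearest (bits >= 2).

    Assumption: the scale maps the largest |w| to the largest integer level.
    Python's round() resolves exact .5 ties to the nearest even integer.
    Returns (integer codes in [-qmax, qmax], scale).
    """
    qmax = 2 ** (bits - 1) - 1
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    codes = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return codes, scale


# Hypothetical weights, chosen only to show where the two rules disagree.
example_weights = [0.9, -0.05, 0.3, -0.7, 0.1]
hard_codes, alpha, delta = ternarize_hard(example_weights)
rtn_codes, scale = quantize_round_to_nearest(example_weights, bits=2)
```

On these example weights the two rules disagree about the value 0.3: the hard threshold maps it to +1, while 2-bit round-to-nearest maps it to 0. In both cases the interval boundary is fixed by a hand-chosen rule rather than learned, which is the limitation the abstract attributes to prior work.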
Key words
Convolutional neural network, network compression, low-bit quantization, ternary quantization