DyBit: Dynamic Bit-Precision Numbers for Efficient Quantized Neural Network Inference

arXiv (2023)

Abstract
To accelerate the inference of deep neural networks (DNNs), quantization with low-bitwidth numbers is actively researched. A prominent challenge is to quantize DNN models to low-bitwidth numbers without significant accuracy degradation, especially at very low bitwidths (< 8 bits). This work proposes an adaptive data representation with variable-length encoding called DyBit. DyBit can dynamically adjust the precision and range of its separate bit-fields to fit the distribution of DNN weights/activations. We also propose a hardware-aware quantization framework with a mixed-precision accelerator to trade off inference accuracy against speedup. Experimental results demonstrate that the inference accuracy via DyBit is 1.997% higher than the state-of-the-art at 4-bit quantization, and the proposed framework can achieve up to 8.1x speedup compared with the original model.
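The core idea in the abstract, adapting how a fixed bit budget is split between range and precision to fit the weight/activation distribution, can be illustrated with a small sketch. The code below is a hypothetical simplification: it searches per-tensor over fixed-point fraction/range splits and keeps the split with the lowest quantization error. The names (quantize_fixed_point, dybit_quantize) and the encoding are assumptions for illustration, not the paper's actual variable-length DyBit format or hardware mapping.

```python
# Minimal sketch (assumed encoding, not the paper's DyBit format): for a
# fixed total bitwidth, try every split between fractional (precision) bits
# and integer (range) bits, and keep the split that best fits this tensor.
import numpy as np

def quantize_fixed_point(x, total_bits, frac_bits):
    """Symmetric signed fixed-point quantization with `frac_bits` fractional bits.

    One bit is the sign; the remaining (total_bits - 1 - frac_bits) bits
    cover integer range.
    """
    scale = 2.0 ** frac_bits
    qmax = 2 ** (total_bits - 1) - 1          # e.g. 7 for 4-bit signed
    q = np.clip(np.round(x * scale), -qmax - 1, qmax)
    return q / scale

def dybit_quantize(x, total_bits=4):
    """Pick the precision/range split that minimizes MSE on this tensor."""
    best_err, best_frac, best_xq = np.inf, None, None
    for frac_bits in range(total_bits):       # candidate bit-field splits
        xq = quantize_fixed_point(x, total_bits, frac_bits)
        err = np.mean((x - xq) ** 2)
        if err < best_err:
            best_err, best_frac, best_xq = err, frac_bits, xq
    return best_xq, best_frac

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    weights = rng.normal(0.0, 0.5, size=4096)  # bell-shaped, like DNN weights
    _, frac = dybit_quantize(weights, total_bits=4)
    print(f"chosen fractional bits: {frac}")
```

In the paper's scheme the bit-fields are variable-length per value rather than fixed per tensor; this sketch only conveys the distribution-driven adaptation of precision versus range.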
Keywords
Deep Neural Networks, Quantization, Accelerator, FPGAs, Machine Learning