A hardware-friendly logarithmic quantization method for CNNs and FPGA implementation

Journal of Real-Time Image Processing (2024)

Abstract
Convolutional Neural Networks (CNNs) are widely used in many fields due to their high accuracy and efficiency. On embedded devices, CNN performance is mainly limited by computing capability, memory bandwidth, and flexibility. The high energy efficiency, computing capability, and reconfigurability of FPGAs make them a good platform for CNN hardware acceleration. However, the growing complexity of CNNs increases memory requirements, while FPGA on-chip storage is limited. We therefore use an improved logarithmic quantization to compress the model; this approach significantly reduces bit widths while maintaining high accuracy, making it an effective compression method. In this work, a hardware-friendly quantization scheme is proposed in which the weights are quantized with the improved logarithmic scheme and the activations with a fixed-point-to-logarithmic scheme. The results show that the quantized model incurs negligible Top-1/Top-5 accuracy loss without any retraining. In addition, we implement an acceleration engine built around a heterogeneous Generalized Matrix Multiplication (GEMM) core on the Zynq XC7Z020. In the GEMM core, the multipliers are replaced by logic shifters and adders, which makes efficient use of LUT resources. Deploying the optimal quantization model on the Zynq XC7Z020, the throughput reaches 69.7 GOPs with a power consumption of 6.008 W, and the resource efficiency is 8.713 GOPs/DSP or 5.564 GOPs/kLUT.
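As a rough illustration of the two ideas in the abstract, the sketch below quantizes a small weight vector to signed powers of two and evaluates a dot product with shifts and adds instead of multiplies. This is a minimal Python sketch under assumed choices (plain power-of-two rounding, 4-bit weight exponents, 8 fractional bits for the activations, and hypothetical helper names log_quantize_weights and shift_add_dot); it does not reproduce the paper's improved quantizer, its fixed-point-to-logarithmic activation scheme, or the FPGA implementation.

```python
import numpy as np

def log_quantize_weights(w, bits=4):
    """Quantize weights to signed powers of two (a generic logarithmic
    quantization; the paper's 'improved' variant is not reproduced here)."""
    sign = np.sign(w)
    # Round log2 of each magnitude to the nearest integer exponent.
    exp = np.round(np.log2(np.abs(w) + 1e-12)).astype(int)
    # Clip to the exponent range representable with the given bit width
    # (one bit is reserved for the sign; weights assumed normalized, |w| <= 1).
    levels = 2 ** (bits - 1)
    exp = np.clip(exp, -(levels - 1), 0)
    return sign, exp

def shift_add_dot(x_fixed, sign, exp):
    """Dot product in which every multiply is an arithmetic shift, mirroring
    the shifter/adder GEMM idea. x_fixed holds fixed-point integer activations."""
    acc = 0
    for xi, si, ei in zip(x_fixed, sign, exp):
        shifted = xi >> (-ei) if ei < 0 else xi << ei  # xi * 2**ei via shifting
        acc += int(si) * shifted
    return acc

# Example: compare the shift-add result with a floating-point dot product.
w = np.array([0.45, -0.12, 0.03])
x = np.array([0.90, 0.50, -0.25])
frac_bits = 8
x_fixed = np.round(x * (1 << frac_bits)).astype(int)
sign, exp = log_quantize_weights(w, bits=4)
print(shift_add_dot(x_fixed, sign, exp) / (1 << frac_bits))  # ~0.379
print(np.dot(w, x))                                          # 0.3375
```

Because every weight is stored only as a sign and a small exponent, a hardware MAC built on this representation needs only shifters and adders rather than multipliers, which is the property the abstract attributes to the GEMM core's efficient LUT utilization.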
Keywords
Convolutional neural networks, CNN quantization, Hardware accelerator, FPGA