A Deep Learning Inference Accelerator Based on Model Compression on FPGA.

FPGA (2019)

Abstract
Convolutional neural networks (CNNs) have demonstrated state-of-the-art accuracy in image classification and object detection, owing to the growth of available data and of hardware computation capacity. However, this achievement depends heavily on the DSP-based floating-point computing capability of the device, which increases its power dissipation and cost. To address this problem, we made the first attempt to implement a CNN computing accelerator based on shift operations on an FPGA. In this accelerator, an efficient Incremental Network Quantization (INQ) method is applied to compress the CNN model from full precision to 4-bit integers, each representing either zero or a power of two. The multiply-and-accumulate (MAC) operations of the convolutional and fully-connected layers are then converted to shift-and-accumulate (SAC) operations, which can be implemented directly with the logic elements of the FPGA. Consequently, the parallelism of the CNN inference process can be further expanded. For the SqueezeNet model, the single-image processing latency is 0.673 ms on an Intel Arria 10 FPGA (Inspur F10A board), slightly better than on an NVIDIA Tesla P4, and the compute capacity of the FPGA increases by at least 1.77x.
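
To make the MAC-to-SAC conversion concrete, the following is a minimal C sketch, not the paper's implementation: each 4-bit INQ-style weight decodes to zero or a signed power of two, so every multiplication in a dot product becomes a shift and an add. The sac_weight_t layout and the sac_dot function are illustrative assumptions; exponents are kept non-negative for simplicity, whereas a fixed-point datapath would also cover fractional powers of two with right shifts.

    #include <stdint.h>
    #include <stddef.h>
    #include <stdio.h>

    /* Hypothetical decoding of a 4-bit quantized weight (field names are
     * illustrative, not the paper's encoding): the value is either zero
     * or a signed power of two, +/- 2^exp.                               */
    typedef struct {
        uint8_t is_zero; /* 1 -> weight is exactly zero                   */
        uint8_t sign;    /* 1 -> weight is negative                       */
        uint8_t exp;     /* shift amount k, weight magnitude is 2^k       */
    } sac_weight_t;

    /* Shift-and-accumulate (SAC) dot product: each multiply of a MAC loop
     * is replaced by a shift, which on an FPGA maps to plain logic
     * elements instead of DSP blocks.                                     */
    int32_t sac_dot(const sac_weight_t *w, const int16_t *x, size_t n)
    {
        int32_t acc = 0;
        for (size_t i = 0; i < n; ++i) {
            if (w[i].is_zero)
                continue;                             /* zero weight: skip */
            /* x * 2^exp as a left shift (done on the unsigned
             * representation to avoid undefined behaviour for negative
             * activations in C).                                          */
            int32_t shifted = (int32_t)((uint32_t)(int32_t)x[i] << w[i].exp);
            acc += w[i].sign ? -shifted : shifted;
        }
        return acc;
    }

    int main(void)
    {
        /* weights 2, -4, 0 on activations 3, 5, 7: 2*3 - 4*5 + 0 = -14    */
        sac_weight_t w[3] = { {0, 0, 1}, {0, 1, 2}, {1, 0, 0} };
        int16_t x[3] = { 3, 5, 7 };
        printf("%d\n", sac_dot(w, x, 3)); /* prints -14 */
        return 0;
    }

Because the inner loop needs no multiplier, many such SAC units can be instantiated in parallel on the FPGA fabric, which is what the abstract refers to as expanding the parallelism of the inference process.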