APPQ-CNN: An Adaptive CNNs Inference Accelerator for Synergistically Exploiting Pruning and Quantization Based on FPGA

IEEE Transactions on Sustainable Computing (2024)

Abstract
Convolutional neural networks (CNNs) are widely used in intelligent edge computing applications such as computer vision and image processing. However, as CNN models grow deeper, their parameter counts and computational cost increase, making them increasingly difficult to accelerate in edge computing applications. To adapt to the tradeoff between inference speed and accuracy in smart applications, this paper proposes APPQ-CNN, an FPGA-based adaptive CNN inference accelerator that synergistically exploits filter pruning, fixed-point parameter quantization, and parallelism across multiple computing units. First, we devise a hybrid pruning algorithm that measures each filter's impact using the L1-norm and APoZ, together with a configurable fixed-point quantized computing architecture that replaces the floating-point architecture. Then, we design a cascaded, pipelined CNN kernel architecture with a configurable number of computation units. Finally, we conduct extensive performance exploration and comparison experiments on various real and synthetic datasets. With negligible accuracy loss, APPQ-CNN achieves speedups of 2.15x and 1.91x over the state-of-the-art FPGA-based accelerators PipeCNN and OctCNN, respectively. Furthermore, APPQ-CNN exposes settable parameters for the fixed-point quantization bit-width, the filter pruning rate, and the number of computation units, so that it can be tuned to the performance requirements of practical edge computing applications.
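The abstract names the ingredients of the pruning and quantization steps but not their exact formulas, so the following is only a minimal sketch under stated assumptions. It scores each filter by combining its L1-norm with its APoZ (Average Percentage of Zeros, the fraction of zero post-ReLU activations), drops the lowest-scoring fraction, and rounds values to a signed fixed-point format. The weighting factor `alpha`, the max-normalization of the L1-norm, the saturating rounding, and all function names are hypothetical choices made for illustration, not the authors' method.

```python
from typing import List, Sequence


def l1_norms(filters: Sequence[Sequence[float]]) -> List[float]:
    """L1-norm (sum of absolute weights) of each flattened filter."""
    return [sum(abs(w) for w in f) for f in filters]


def apoz(activations: Sequence[Sequence[float]]) -> List[float]:
    """Fraction of zero outputs per filter, measured on post-ReLU activations."""
    return [sum(1 for a in acts if a == 0) / len(acts) for acts in activations]


def hybrid_scores(filters, activations, alpha: float = 0.5) -> List[float]:
    """Assumed combination: alpha * normalized L1 + (1 - alpha) * (1 - APoZ).

    A higher score means the filter is treated as more important.
    """
    norms = l1_norms(filters)
    max_norm = max(norms) or 1.0  # avoid dividing by zero if all weights are 0
    zeros = apoz(activations)
    return [alpha * n / max_norm + (1 - alpha) * (1 - z)
            for n, z in zip(norms, zeros)]


def select_filters(scores: Sequence[float], prune_rate: float) -> List[int]:
    """Indices of the filters kept after removing the lowest-scoring fraction."""
    n_prune = int(len(scores) * prune_rate)
    order = sorted(range(len(scores)), key=lambda i: scores[i])
    return sorted(order[n_prune:])


def quantize_fixed_point(x: float, total_bits: int = 8, frac_bits: int = 4) -> int:
    """Signed fixed-point integer code for x, saturating at the representable range."""
    lo = -(1 << (total_bits - 1))
    hi = (1 << (total_bits - 1)) - 1
    return max(lo, min(hi, round(x * (1 << frac_bits))))


def dequantize(code: int, frac_bits: int = 4) -> float:
    """Real value represented by a fixed-point integer code."""
    return code / (1 << frac_bits)


if __name__ == "__main__":
    # Toy data: three filters with two weights each, four activations per filter.
    filters = [[0.5, -0.5], [0.01, 0.02], [1.0, -1.0]]
    activations = [[0.0, 1.0, 2.0, 0.0], [0.0, 0.0, 0.0, 0.1], [1.0, 2.0, 3.0, 0.0]]
    kept = select_filters(hybrid_scores(filters, activations), prune_rate=0.5)
    print("kept filters:", kept)
    print("1.3 as 8-bit fixed point:", dequantize(quantize_fixed_point(1.3)))
```

In this sketch, the pruning rate and the bit-widths are plain function arguments, mirroring the configurable parameters the abstract describes; how the accelerator itself exposes them is not specified in the abstract.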
Keywords
Convolutional neural networks, Fixed-point quantization, FPGA, Inference accelerator, Pipeline, Pruning algorithm