Pflow: An end-to-end heterogeneous acceleration framework for CNN inference on FPGAs

Yi Wan, Xianzhong Xie, Lingjie Yi, Bo Jiang, Junfan Chen, Yi Jiang

Journal of Systems Architecture (2024)

Abstract
Field-Programmable Gate Arrays (FPGAs), renowned for their high performance per watt, are extensively utilized to accelerate Convolutional Neural Networks (CNNs) in edge computing environments, primarily employing dataflow-based and instruction set-based approaches. Compared to the instruction set-based approach, which features fast and versatile circuit design, the dataflow-based approach can significantly enhance performance at the expense of design versatility. Nevertheless, edge computing environments require both high energy efficiency and adaptability to various scenarios. This paper proposes a novel end-to-end heterogeneous acceleration framework for CNN inference on FPGAs, named Pflow. The basic idea is to decouple network deployment from hardware details with a hardware-software co-design approach. First, a dataflow accelerator with an adaptive scheduling strategy is proposed. The adaptive scheduling strategy, along with a scalable design, maximizes hardware utilization in terms of computing resources and bandwidth. Second, we design a novel operator-perception method to automate the processes of network reconstruction and operator fusion. Third, we integrate Pflow into the industrial-grade deep learning framework Paddle-Lite. We evaluate Pflow by implementing several networks on two representative FPGA platforms. Experimental results demonstrate that Pflow achieves energy efficiencies of 46.5 GOPS/W on Xilinx Zynq UltraScale+ MPSoC 3EG and 59.4 GOPS/W on Virtex UltraScale+ XCVU13P. It also reaches a throughput of up to 255.7 GOPS on the former and 3.686 TOPS on the latter.
Key words
Heterogeneous computing, Computation graph reconstruction, Acceleration framework, FPGA