A High-Throughput Full-Dataflow MobileNetv2 Accelerator on Edge FPGA

IEEE TRANSACTIONS ON COMPUTER-AIDED DESIGN OF INTEGRATED CIRCUITS AND SYSTEMS (2023)

Abstract
Accelerators for lightweight neural networks such as MobileNetv2 are in high demand in edge-computing applications with strict throughput requirements. Dataflow architectures are considered a promising way to optimize throughput, since they can significantly reduce intermediate feature-map transfers. However, previous MobileNetv2 accelerators achieved only a partial-dataflow architecture, saving just one-third of the feature-map transfers. To solve this issue, we propose a scheme to realize a full-dataflow MobileNetv2 accelerator on FPGA. The scheme consists of four techniques. First, we improve full-integer quantization for easier deployment on hardware. Second, we propose a tunable activation-weight imbalance transfer to reduce quantization accuracy loss. Third, we present several highly optimized accelerator components whose parallelism can be flexibly adjusted, and we implement residual connections with deeper FIFOs, so that the requirements of the full-dataflow architecture are fully met. Finally, we present a computing-resource allocation strategy that balances the latency of each layer and a memory-resource allocation strategy that uses on-chip memory effectively. Experimental results show that, compared to the state of the art, the accelerator achieves 1910 FPS with a 1.8× speedup when implemented on the Xilinx ZCU102 FPGA. In addition, it reaches 72.98% Top-1 accuracy with 8-bit integer quantization, outperforming all other MobileNetv2 accelerators.
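The abstract does not spell out how the tunable activation-weight imbalance transfer works. The sketch below is a rough illustration only: the function names, the per-channel scale formula, and the tunable exponent `alpha` are assumptions for this example, not the paper's definitions. It shows how a tunable per-channel scale can move dynamic range from activations onto weights before 8-bit integer quantization while leaving the layer output mathematically unchanged.

```python
import numpy as np

def imbalance_transfer(activations, weights, alpha=0.5, eps=1e-8):
    """Shift quantization difficulty between activations and weights.

    For each input channel c, a scale s_c = max|A_c|**alpha / max|W_c|**(1 - alpha)
    divides the activation channel and multiplies the matching weight row,
    so A @ W is unchanged while the dynamic ranges are better balanced.
    alpha tunes how much of the imbalance is transferred to the weights.
    (Illustrative formula; the paper's exact transfer may differ.)
    """
    # activations: (N, C_in), weights: (C_in, C_out)
    a_max = np.abs(activations).max(axis=0) + eps   # per input channel
    w_max = np.abs(weights).max(axis=1) + eps       # per input channel
    s = (a_max ** alpha) / (w_max ** (1.0 - alpha))
    return activations / s, weights * s[:, None]

def quantize_int8(x):
    """Symmetric 8-bit integer quantization with a single scale factor."""
    scale = np.abs(x).max() / 127.0 + 1e-12
    q = np.clip(np.round(x / scale), -128, 127).astype(np.int8)
    return q, scale

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    A = rng.normal(0.0, 5.0, size=(64, 32))   # wide activation range
    W = rng.normal(0.0, 0.1, size=(32, 16))   # narrow weight range
    A_t, W_t = imbalance_transfer(A, W, alpha=0.5)
    # The transfer preserves the exact output (up to float rounding).
    assert np.allclose(A @ W, A_t @ W_t)
    qA, sA = quantize_int8(A_t)
    qW, sW = quantize_int8(W_t)
    # Integer matmul plus rescaling approximates the float result.
    y_int = (qA.astype(np.int32) @ qW.astype(np.int32)) * (sA * sW)
    print("max abs error after int8 quantization:",
          np.abs(y_int - A @ W).max())
```

In this toy setup, balancing the ranges before quantization typically lowers the integer-matmul error compared to quantizing the raw tensors, which is the intuition behind transferring imbalance with a tunable exponent.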
Key words
Convolutional neural network (CNN), FPGA accelerator, MobileNetv2