FPGA-based Accelerators System with Low Latency Autonomous DMA Engine

Tomoya Yokono, Yoshiro Yamabe, Kenji Tanaka, Yuki Arikawa, Teruaki Ishizaki

2022 IEEE 30th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM) (2022)

Abstract
Recent computing systems incorporate specialized computing components such as FPGAs, GPUs, and various ASIC accelerators to enhance efficiency and performance. We proposed a queue structure mechanism for communication between CPUs and FPGAs that offloads tasks onto FPGAs asynchronously. This paper presents an FPGA system with a low-latency autonomous DMA engine to enhance efficiency and performance. We build a system of eight FPGAs, each implementing a customized DMA engine, and evaluate the communication performance of a single FPGA, including the software stack, as well as the communication latency of an FPGA chain. With a single FPGA, our system achieves a DMA read bandwidth of up to 68.5% and a DMA write bandwidth of up to 62.2% of the PCIe Gen3 x16 theoretical performance. An FPGA chain of up to 8 FPGAs with a 4 MB data size has a latency of 3.7 milliseconds, which is under half that of the existing DMA method (7.6 milliseconds).
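To put the reported percentages in absolute terms, the following is a minimal sketch of the arithmetic. It assumes that "theoretical performance" means the raw PCIe Gen3 x16 link rate after 128b/130b encoding (8 GT/s per lane, 16 lanes); the abstract does not state which baseline the authors used, so the resulting figures are illustrative only. The function and variable names are hypothetical.

```python
# Assumption: the baseline is the PCIe Gen3 x16 link rate after 128b/130b
# line coding. The paper may define "theoretical performance" differently.
GT_PER_S = 8e9            # PCIe Gen3 signalling rate per lane (transfers/s)
LANES = 16                # x16 link
ENCODING = 128 / 130      # 128b/130b encoding efficiency


def pcie_gen3_x16_bytes_per_s() -> float:
    """Raw one-direction link bandwidth in bytes per second."""
    return GT_PER_S * LANES * ENCODING / 8


def fraction_to_gb_per_s(fraction: float) -> float:
    """Convert a fraction of the theoretical rate to GB/s (1 GB = 1e9 bytes)."""
    return fraction * pcie_gen3_x16_bytes_per_s() / 1e9


# Fractions and latencies taken from the abstract.
read_gb_per_s = fraction_to_gb_per_s(0.685)
write_gb_per_s = fraction_to_gb_per_s(0.622)
latency_ratio = 3.7 / 7.6  # proposed chain latency / existing DMA method

print(f"DMA read:  {read_gb_per_s:.2f} GB/s")
print(f"DMA write: {write_gb_per_s:.2f} GB/s")
print(f"Latency ratio: {latency_ratio:.3f}")
```

Under this assumption, the read and write figures correspond to roughly 10.8 GB/s and 9.8 GB/s, and the chain latency is about 49% of that of the existing DMA method, consistent with the "under half" claim.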
Keywords
PCIe Gen3 x16 theoretical performance, FPGA chain, low latency autonomous DMA engine, computing systems, ASIC accelerators, queue structure mechanism, CPUs-FPGAs, customized DMA engine, communication performance evaluation, specialized computing components, FPGA-based accelerator system, software stack, communication latency, DMA read bandwidth, DMA write bandwidth, time 3.7 ms, time 7.6 ms, storage capacity 4 Mbit