Increasing FPGA Accelerators Memory Bandwidth with a Burst-Friendly Memory Layout

arxiv(2022)

引用 5|浏览5
暂无评分
摘要
Offloading compute-intensive kernels to hardware accelerators relies on the large degree of parallelism offered by these platforms. However, the effective bandwidth of the memory interface often causes a bottleneck, hindering the accelerator's effective performance. Techniques enabling data reuse, such as tiling, lower the pressure on memory traffic but still often leave the accelerators I/O-bound. A further increase in effective bandwidth is possible by using burst rather than element-wise accesses, provided the data is contiguous in memory. In this paper, we propose a memory allocation technique, and provide a proof-of-concept source-to-source compiler pass, that enables such burst transfers by modifying the data layout in external memory. We assess how this technique pushes up the memory throughput, leaving room for exploiting additional parallelism, for a minimal logic overhead.
更多
查看译文
关键词
Burst access,field-programmable gate array (FPGA),high-level synthesis (HLS),memory allocation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要