Alleviating Transfer Latency in DataFlow Accelerator for DSP Applications

2023 IEEE 41st International Conference on Computer Design (ICCD 2023)

Abstract
Dataflow accelerators show advantages across multiple application domains thanks to their flexible programmability and high efficiency. This efficiency depends heavily on data communication between processing elements (PEs), which is sensitive to PE location, array scale, and workload size. Laying out instructions as a dataflow graph on the PE array exposes more instruction-level parallelism. However, the longer distance between remote PEs and memory banks introduces extra transfer latency, degrading performance for latency-sensitive real-time applications. This paper examines digital signal processing workloads across different data scales and classifies the latency problems related to data transfers and kernel switching. Specifically, we propose a novel forwarding network on chip that alleviates transfer latency and improves multi-destination sharing during dataflow execution. Moreover, we devise a bandwidth-reusing mechanism to speed up kernel switching. Experimental results show that our scalable design achieves up to 2.19x (1.45x on average) speedup while reducing switching overhead by 9.85x, at an area overhead of 10.82% over a conventional dataflow accelerator.
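The abstract's premise is that PEs far from the memory banks pay extra transfer latency. A minimal sketch of that effect, assuming a hypothetical 8x8 PE mesh with memory banks along one edge and illustrative per-hop costs (none of these values come from the paper):

```python
# Illustrative model (not the paper's design): transfer latency grows with
# Manhattan distance between a PE and a memory bank in a 2-D mesh NoC.
# hop_cycles and injection_cycles are assumed values for demonstration.

def transfer_latency(pe, bank, hop_cycles=2, injection_cycles=1):
    """Hop count (Manhattan distance) times per-hop latency, plus injection cost."""
    hops = abs(pe[0] - bank[0]) + abs(pe[1] - bank[1])
    return injection_cycles + hops * hop_cycles

# Assumption: memory banks sit along the left edge of an 8x8 PE array.
banks = [(row, 0) for row in range(8)]

def best_case(pe):
    """Latency to the nearest bank for a given PE coordinate."""
    return min(transfer_latency(pe, b) for b in banks)

near = best_case((0, 1))  # PE adjacent to a bank: 1 hop
far = best_case((7, 7))   # remote corner PE: 7 hops
assert far > near         # remote PEs pay extra transfer latency
```

Under these assumptions the corner PE sees 5x the latency of a bank-adjacent PE, which is the gap the paper's forwarding network targets.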
Key words
dataflow accelerator, DSP, network on chip, transfer latency, data reusability