A Reschedulable Dataflow-SIMD Execution for Increased Utilization in CGRA Cross-Domain Acceleration

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2023)

Abstract
When a coarse-grained reconfigurable array (CGRA) architecture shifts toward cross-domain acceleration, control flow and memory accesses often degrade processing element (PE) utilization and array efficiency by breaking the intact dataflow graph (DFG) into regions with mismatched pipelining rates and access–execution stages. In this article, we propose a reschedulable dataflow and SIMD execution, which decouples a DFG with mismatched dataflow into multiple independent subgraphs. We map only one subgraph at a time, but fully unrolled, and reschedule the different subgraphs serially at runtime. Each subgraph therefore works in its own way without interfering with the others. At the same time, an individual subgraph can execute its dataflow as a stream to improve utilization, while the unrolled instances, composed as SIMD, facilitate request coalescing for efficient memory access. With lightweight hardware modifications, our design can be integrated into a general CGRA architecture. The experimental results show that our proposal improves performance and energy efficiency over a statically scheduled stream-dataflow CGRA (Plasticine) by $1.6\times$ and $1.8\times$, and over a dynamically scheduled one (TIA) by $1.5\times$ and $2.7\times$, and outperforms Plasticine organized as vector-SIMD by $1.2\times$ and $1.4\times$.
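The execution model described in the abstract can be illustrated with a small software analogy. The sketch below is not the authors' implementation or hardware design; it is a toy model under stated assumptions: a subgraph is represented as a plain Python function, "mapping" a subgraph means applying it to every unrolled instance before the next subgraph is scheduled, and request coalescing is approximated by counting the distinct memory lines touched by one group of lanes. The names `run_rescheduled`, `coalesced_requests`, `lanes`, and `line_bytes` are hypothetical.

```python
from typing import Callable, List, Sequence

# Assumption: a decoupled subgraph is modeled as a function on one lane's value.
Subgraph = Callable[[int], int]


def run_rescheduled(subgraphs: Sequence[Subgraph],
                    inputs: Sequence[int],
                    lanes: int) -> List[int]:
    """Apply subgraphs one at a time, each across groups of `lanes` instances."""
    data = list(inputs)
    for subgraph in subgraphs:
        # Only this subgraph is "mapped"; the others wait to be rescheduled.
        results: List[int] = []
        for base in range(0, len(data), lanes):
            # One group of unrolled instances, processed together like SIMD lanes.
            group = data[base:base + lanes]
            results.extend(subgraph(value) for value in group)
        # Buffered intermediate results feed the next subgraph.
        data = results
    return data


def coalesced_requests(addresses: Sequence[int], line_bytes: int) -> int:
    """Count distinct memory lines touched by one group of lane addresses."""
    return len({address // line_bytes for address in addresses})


if __name__ == "__main__":
    # Two hypothetical subgraphs: a multiply stage followed by an add stage.
    stages = [lambda x: x * 2, lambda x: x + 1]
    print(run_rescheduled(stages, [1, 2, 3, 4, 5], lanes=4))
    # Four lanes reading consecutive 4-byte words share one 64-byte line.
    print(coalesced_requests([0, 4, 8, 12], line_bytes=64))
```

In this analogy, each stage runs over all data at its own pace because no other stage occupies the array at the same time, and lanes with adjacent addresses collapse into a single memory request. Timing, PE placement, and reconfiguration cost are deliberately left out.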
Keywords
Access execute decoupling, coarse-grained reconfigurable array (CGRA), dataflow decoupling, subgraph scheduling