DPU-v2: Energy-efficient execution of irregular directed acyclic graphs

2022 55th IEEE/ACM International Symposium on Microarchitecture (MICRO)(2022)

引用 3|浏览29
暂无评分
摘要
A growing number of applications like probabilistic machine learning, sparse linear algebra, robotic navigation, etc., exhibit irregular data flow computation that can be modeled with directed acyclic graphs (DAGs). The irregularity arises from the seemingly random connections of nodes, which makes the DAG structure unsuitable for vectorization on CPU or GPU. Moreover, the nodes usually represent a small number of arithmetic operations that cannot amortize the overhead of launching tasks/kernels for each node, further posing challenges for parallel execution. To enable energy-efficient execution, this work proposes DAG processing unit (DPU) version 2, a specialized processor architecture optimized for irregular DAGs with static connectivity. It consists of a tree-structured datapath for efficient data reuse, a customized banked register file, and interconnects tuned to support irregular register accesses. DPU-v2 is utilized effectively through a targeted compiler that systematically maps operations to the datapath, minimizes register bank conflicts, and avoids pipeline hazards. Finally, a design space exploration identifies the optimal architecture configuration that minimizes the energy-delay product. This hardware-software co-optimization approach results in a speedup of $1.4 \times, 3.5 \times $, and $14 \times $ over a state-of-the-art DAG processor ASIP, a CPU, and a GPU, respectively, while also achieving a lower energy-delay product. In this way, this work takes an important step towards enabling an embedded execution of emerging DAG workloads.
更多
查看译文
关键词
Graphs,irregular computation graphs,parallel processor,hardware-software codesign,DAG processing unit,probabilistic circuits,sparse matrix triangular solve,spatial datapath,interconnection network,bank conflicts,design space exploration
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要