CU.POKer: Placing DNNs on WSE With Optimal Kernel Sizing and Efficient Protocol Optimization

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2022)

Abstract
The tremendous growth in deep learning (DL) applications has created an exponential demand for computing power, which has driven the rise of AI-specific hardware. Targeted toward accelerating computation-intensive DL applications, AI hardware, including but not limited to GPGPUs, TPUs, and ASICs, has been adopted ubiquitously. As a result, domain-specific CAD tools play increasingly important roles and are deeply involved in both the design and compilation stages of modern AI hardware. Recently, the ISPD 2020 contest introduced a special challenge targeting the physical mapping of neural network workloads onto the largest commercial DL accelerator, the CS-1 wafer-scale engine (WSE). In this article, we propose CU.POKer, a high-performance engine fully customized for the WSE's deep neural network workload placement challenge. A provably optimal placeable-kernel-candidate search scheme and a data-flow-aware placement tool are developed to ensure state-of-the-art (SOTA) quality on real industrial benchmarks. Experimental results on the ISPD 2020 contest evaluation suites demonstrate the superiority of the proposed framework over not only the SOTA placer but also conventional heuristics used in general floorplanning.
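The abstract describes a two-stage flow: first search per-layer kernel-size candidates that are placeable on the wafer fabric, then place the chosen kernels with awareness of the dataflow between layers. The following Python sketch illustrates only that overall structure; the grid dimensions, the perimeter-penalty cost model, and all function and class names are illustrative assumptions, not CU.POKer's actual algorithm or cost model.

```python
from dataclasses import dataclass

GRID_W, GRID_H = 633, 633  # assumed fabric dimensions, purely illustrative

@dataclass
class Layer:
    name: str
    work: int          # abstract compute load assigned to the layer

@dataclass
class Kernel:
    layer: str
    w: int
    h: int
    x: int = -1        # placement coordinates, filled in by the placer
    y: int = -1

def candidate_shapes(layer, max_w, max_h, alpha=0.1):
    """Enumerate rectangular kernel shapes under a toy cost model:
    compute time shrinks with area, while a perimeter term stands in
    for the communication overhead of a larger kernel."""
    for w in range(1, max_w + 1):
        for h in range(1, max_h + 1):
            cost = layer.work / (w * h) + alpha * (w + h)
            yield (cost, w, h)

def size_kernels(layers, max_w, max_h):
    """Pick the cheapest placeable shape for each layer (a stand-in
    for the optimal kernel-candidate search the abstract describes)."""
    kernels = []
    for layer in layers:
        _, w, h = min(candidate_shapes(layer, max_w, max_h))
        kernels.append(Kernel(layer.name, w, h))
    return kernels

def place_dataflow(kernels):
    """Place kernels left to right in dataflow order so communicating
    layers end up adjacent, shortening on-wafer routes."""
    x = 0
    for k in kernels:
        if x + k.w > GRID_W:
            raise ValueError("sketch placer ran out of wafer width")
        k.x, k.y = x, 0
        x += k.w
    return kernels

if __name__ == "__main__":
    net = [Layer("conv1", 4096), Layer("conv2", 8192), Layer("fc", 1024)]
    for k in place_dataflow(size_kernels(net, max_w=100, max_h=100)):
        print(f"{k.layer}: {k.w}x{k.h} tiles at ({k.x},{k.y})")
```

In this toy version the perimeter penalty keeps kernels from trivially growing to the full grid; the paper's search instead guarantees optimality under its own kernel performance model, and its placer handles far richer dataflow constraints than the single-row layout shown here.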
Key words
AI chip compilation, deep learning (DL) accelerator, neural network workload placement, wafer-scale engine (WSE)