Design space exploration for layer-parallel execution of convolutional neural networks on CGRAs

SCOPES '20: 23rd International Workshop on Software and Compilers for Embedded Systems St. Goar Germany May, 2020(2020)

引用 4|浏览31
暂无评分
摘要
In this work, we systematically explore the design space of throughput, energy, and hardware costs for layer-parallel mappings of Convolutional Neural Networks (CNNs) onto coarse-grained reconfigurable arrays (CGRAs). We derive an analytical model that computes the required resources (processing elements) and buffer memory and thus hardware cost C to sustain a given throughput T as well as the resulting overall energy consumption E for inference. Further, we propose an efficient design space exploration (DSE) to determine the fronts of Pareto-optimal (T,E,C) solutions. This exploration helps to determine the limits of scalability of the presented tiled CGRA accelerator architectures in terms of throughput, the number of parallel layers that can be simultaneously processed, and memory requirements. Finally, we provide an evaluation of energy savings achievable on our architecture in comparison to implementations that execute sequentially a CNN layer-by-layer. In experiments, it is shown that layer-parallel processing is able to reduce energy consumption E by 3.6X, hardware cost C by 1.2X, and increase the achievable throughput T by 6.2X for MobileNet.
更多
查看译文
关键词
CNN Accelerators, Design Space Exploration, CGRA
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要