Shortcut Mining: Exploiting Cross-Layer Shortcut Reuse in DCNN Accelerators

2019 IEEE International Symposium on High Performance Computer Architecture (HPCA), 2019

Abstract
Off-chip memory traffic has been a major performance bottleneck in deep learning accelerators. While reusing on-chip data is a promising way to reduce off-chip traffic, the opportunity to reuse shortcut connection data in deep networks (e.g., residual networks) has been largely neglected. These shortcut data account for nearly 40% of the total feature map data. In this paper, we propose Shortcut Mining, a novel approach that "mines" the unexploited opportunity of on-chip data reuse. We introduce the abstraction of logical buffers to address the lack of flexibility in existing buffer architectures, and then propose a sequence of procedures which, collectively, can effectively reuse both shortcut and non-shortcut feature maps. The proposed procedures are also able to reuse shortcut data across any number of intermediate layers without using additional buffer resources. Experimental results from prototyping on FPGAs show that the proposed Shortcut Mining achieves 53.3%, 58%, and 43% reductions in off-chip feature map traffic for SqueezeNet, ResNet-34, and ResNet-152, respectively, and a 1.93X increase in throughput compared with a state-of-the-art accelerator.
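To make the idea of shortcut reuse concrete, the following is a minimal toy sketch, not the paper's method or its logical-buffer design. It counts off-chip feature map traffic for a single residual block under the assumption that every layer reads its input from and writes its output to off-chip memory, and that the only optimization is keeping the shortcut on-chip until the element-wise add. The names `ResidualBlock` and `offchip_traffic`, the two-convolution block shape, and the byte sizes are all hypothetical, so the resulting percentage is not expected to match the figures reported above.

```python
from dataclasses import dataclass


@dataclass
class ResidualBlock:
    """Hypothetical residual block: conv1 -> conv2 -> add(shortcut)."""
    input_bytes: int   # block input feature map (also the shortcut data)
    hidden_bytes: int  # output feature map of conv1
    output_bytes: int  # output feature map of conv2 and of the add


def offchip_traffic(block: ResidualBlock, reuse_shortcut: bool) -> int:
    """Count bytes moved to/from off-chip memory for one block (toy model)."""
    traffic = 0
    # conv1: read the block input, write the hidden feature map.
    traffic += block.input_bytes + block.hidden_bytes
    # conv2: read the hidden feature map, write the pre-add output.
    traffic += block.hidden_bytes + block.output_bytes
    # add: read the pre-add output, write the summed output.
    traffic += block.output_bytes + block.output_bytes
    # Without reuse, the shortcut must be fetched from off-chip again.
    if not reuse_shortcut:
        traffic += block.input_bytes
    return traffic


if __name__ == "__main__":
    block = ResidualBlock(input_bytes=100, hidden_bytes=100, output_bytes=100)
    baseline = offchip_traffic(block, reuse_shortcut=False)
    reused = offchip_traffic(block, reuse_shortcut=True)
    print(f"baseline={baseline} reused={reused} "
          f"saving={(baseline - reused) / baseline:.1%}")
```

This toy model only removes the shortcut re-read. The abstract describes reusing non-shortcut feature maps as well, which this sketch does not attempt to model.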
Key words
System-on-chip, Field programmable gate arrays, Tin, Deep learning, Memory management, Data models