CNN-DMA: A Predictable and Scalable Direct Memory Access Engine for Convolutional Neural Network with Sliding-window Filtering.

ACM Great Lakes Symposium on VLSI (2021)

Abstract
Memory bandwidth utilization has become the key performance bottleneck for state-of-the-art variants of neural network kernels. Structures such as depth-wise, point-wise, and atrous convolutions introduce diverse and discontinuous memory access patterns, which hinder efficient activation supply through more frequent cache misses and, consequently, high-penalty DRAM pre-charging. To handle this, GPUs achieve efficient parallelization through sophisticated optimization of CUDA programs to reduce memory footprints, which demands high engineering effort. In this work, we instead propose a programmable direct memory access engine for convolutional neural networks (CNN-DMA) that supports fast supply of activations to independent and scalable computing units. CNN-DMA favours a predictable activation-streaming approach that completely avoids penalties from bus contention, cache misses, and less carefully designed low-level programs. Furthermore, we enhance the baseline DMA with the capability of out-of-order data supply to filter the stream down to unique sliding windows, boosting the performance of the computing infrastructure. Experiments on state-of-the-art neural networks show that CNN-DMA achieves optimal DRAM access efficiency for point-wise convolution layers, while sliding-window filtering reduces the rounds of computation by 30% to 70%.
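The abstract does not detail the filtering mechanism, but a minimal software sketch of the idea, assuming "sliding-window filtering" means deduplicating identical activation windows before dispatching them to compute units, might look like the following (the function name, parameters, and the deduplication-by-value interpretation are hypothetical, not taken from the paper):

```python
import numpy as np

def unique_sliding_windows(activations, k=3, stride=1):
    """Enumerate k x k sliding windows over a 2D activation map and
    deduplicate identical ones -- a software stand-in for the
    hardware filtering step the abstract describes.

    Returns the unique windows plus, for each window position, an
    index into the unique set so results can be scattered back."""
    h, w = activations.shape
    seen = {}       # window bytes -> index into `uniques`
    uniques = []    # list of unique k x k windows
    mapping = []    # per-position index into `uniques`
    for i in range(0, h - k + 1, stride):
        for j in range(0, w - k + 1, stride):
            win = activations[i:i + k, j:j + k]
            key = win.tobytes()
            if key not in seen:
                seen[key] = len(uniques)
                uniques.append(win)
            mapping.append(seen[key])
    return uniques, mapping

# Example: an activation map with large constant regions (common
# after ReLU) yields far fewer unique windows than window positions.
act = np.zeros((32, 32), dtype=np.int8)
act[8:16, 8:16] = 1
uniques, mapping = unique_sliding_windows(act)
print(f"{len(mapping)} window positions, {len(uniques)} unique windows")
```

Under this reading, only the unique windows would need a convolution pass, and the `mapping` array lets results be replicated back to all positions, which is consistent with the 30% to 70% reduction in computation rounds the authors report.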