Optimized FPGA-based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory

2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021

Abstract
Large Convolutional Neural Networks (CNNs) are often pruned and compressed to reduce the number of parameters and the memory requirement. However, the resulting irregularity of the sparse data makes it difficult for FPGA accelerators that contain systolic arrays of Multiply-and-Accumulate (MAC) units, such as Intel's FPGA-based Deep Learning Accelerator (DLA), to reach their maximum potential. …
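The following is a minimal, illustrative sketch of the problem the abstract describes: magnitude pruning of a small dense weight matrix produces rows with uneven numbers of non-zeros, which is awkward for a fixed systolic MAC dataflow. The CSR-style encoding and the threshold value here are generic assumptions for illustration, not the compression scheme used in the paper or in Intel's DLA.

```python
import numpy as np

# Prune a small dense weight matrix by magnitude and store the
# survivors in a CSR-style compressed form (illustrative only).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)   # dense weights

threshold = 0.8                                       # assumed pruning threshold
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)   # magnitude pruning

# CSR-style encoding: non-zero values, their column indices, row pointers.
values, col_idx, row_ptr = [], [], [0]
for row in W_pruned:
    nz = np.nonzero(row)[0]
    values.extend(row[nz])
    col_idx.extend(nz)
    row_ptr.append(len(values))

# Uneven row lengths show the irregularity: a systolic MAC array expects
# a regular dataflow, so these ragged rows leave MAC units underutilized
# unless the accelerator handles sparsity explicitly.
print("nonzeros per row:", np.diff(row_ptr))
```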
Keywords
Deep learning,Tensors,Computational modeling,Memory management,Bandwidth,Tools,Space exploration