Optimized FPGA-based Deep Learning Accelerator for Sparse CNN using High Bandwidth Memory

2021 IEEE 29th Annual International Symposium on Field-Programmable Custom Computing Machines (FCCM), 2021

Abstract
Large Convolutional Neural Networks (CNNs) are often pruned and compressed to reduce the number of parameters and the memory requirement. However, the resulting irregularity of the sparse data makes it difficult for FPGA accelerators that contain systolic arrays of Multiply-and-Accumulate (MAC) units, such as Intel's FPGA-based Deep Learning Accelerator (DLA), to reach their maximum potential. …
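The following is a minimal, illustrative sketch of the problem the abstract describes: magnitude pruning of a small dense weight matrix produces rows with uneven numbers of non-zeros, which is awkward for a fixed systolic MAC dataflow. The CSR-style encoding and the threshold value here are generic assumptions for illustration, not the compression scheme used in the paper or in Intel's DLA.

```python
import numpy as np

# Prune a small dense weight matrix by magnitude and store the
# survivors in a CSR-style compressed form (illustrative only).
rng = np.random.default_rng(0)
W = rng.standard_normal((4, 8)).astype(np.float32)   # dense weights

threshold = 0.8                                       # assumed pruning threshold
W_pruned = np.where(np.abs(W) >= threshold, W, 0.0)   # magnitude pruning

# CSR-style encoding: non-zero values, their column indices, row pointers.
values, col_idx, row_ptr = [], [], [0]
for row in W_pruned:
    nz = np.nonzero(row)[0]
    values.extend(row[nz])
    col_idx.extend(nz)
    row_ptr.append(len(values))

# Uneven row lengths show the irregularity: a systolic MAC array expects
# a regular dataflow, so these ragged rows leave MAC units underutilized
# unless the accelerator handles sparsity explicitly.
print("nonzeros per row:", np.diff(row_ptr))
```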
Keywords
Deep learning,Tensors,Computational modeling,Memory management,Bandwidth,Tools,Space exploration