FPGA-based CNN inference accelerator synthesized from multi-threaded C software

2017 30th IEEE International System-on-Chip Conference (SOCC) (2018)

Abstract
A deep-learning inference accelerator is synthesized from a C-language software program parallelized with Pthreads. The software implementation uses the well-known producer/consumer model with parallel threads interconnected by FIFO queues. The LegUp high-level synthesis (HLS) [1] tool synthesizes threads into parallel FPGA hardware, translating software parallelism into spatial parallelism. A complete system is generated where convolution, pooling and padding are realized in the synthesized accelerator, with remaining tasks executing on an embedded ARM processor. The accelerator incorporates reduced precision, and a novel approach for zero-weight-skipping in convolution. On a mid-sized Intel Arria 10 SoC FPGA, peak performance on VGG-16 is 138 effective GOPS.
Keywords
CNN inference accelerator, deep-learning inference accelerator, C-language software program, Pthreads, parallel threads, FIFO queues, LegUp high-level synthesis, parallel FPGA hardware, embedded ARM processor, Intel Arria 10 SoC FPGA, multi-threaded C software