Bitwise Neural Network Acceleration: Opportunities and Challenges

2019 8th Mediterranean Conference on Embedded Computing (MECO), 2019

Abstract
Real-time inference of deep convolutional neural networks (CNNs) on embedded systems and SoCs would enable many interesting applications. However, these CNNs are computation- and data-intensive, making it difficult to execute them in real time on energy-constrained embedded platforms. Recent research has shown that lightweight CNNs with quantized model weights and activations constrained to a single bit {-1, +1} can still achieve reasonable accuracy compared to the non-quantized 32-bit model. These binary neural networks (BNNs) theoretically allow a drastic reduction of the required energy and run time by shrinking the memory size and the number of memory accesses, and finally the computational effort, by replacing expensive two's complement arithmetic operations with more efficient bitwise versions. To exploit these advantages, we propose a bitwise CNN accelerator (BNNA) mapped onto an FPGA. We implement the Hubara'16 network [1] on the Xilinx Zynq-7020 SoC. Massive parallelism is achieved by performing 4608 binary MACs in parallel, which enables real-time speeds of up to 110 fps while using only 22% of the FPGA LUTs. In comparison to a 32-bit network, a 32x speed-up and a 40x resource reduction are achieved, with the memory bandwidth being the main bottleneck. The provided detailed analysis of the carefully crafted accelerator design exposes the challenges and opportunities in bitwise neural network accelerator design.
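
The efficiency argument in the abstract, replacing two's complement multiply-accumulate with bitwise operations, is commonly realized as an XNOR-popcount formulation. The C sketch below is illustrative only, assuming weights and activations in {-1, +1} are packed as bits (1 -> +1, 0 -> -1); the function name binary_mac64 and the packing convention are assumptions, not details taken from the accelerator described in the paper.

/*
 * Minimal sketch of a binary multiply-accumulate (MAC) via XNOR + popcount,
 * as typically used in BNN accelerators. Illustrative names, not from the paper.
 */
#include <stdint.h>
#include <stdio.h>

/* Dot product of two 64-element {-1,+1} vectors packed into 64-bit words. */
static int binary_mac64(uint64_t w, uint64_t a)
{
    uint64_t agree = ~(w ^ a);                 /* XNOR: bit is 1 where signs match */
    int matches = __builtin_popcountll(agree); /* number of matching positions     */
    return 2 * matches - 64;                   /* matches minus mismatches         */
}

int main(void)
{
    uint64_t w = 0xF0F0F0F0F0F0F0F0ULL;  /* example packed weight word     */
    uint64_t a = 0xFF00FF00FF00FF00ULL;  /* example packed activation word */
    printf("binary MAC result: %d\n", binary_mac64(w, a));
    return 0;
}

In hardware, the same idea maps each 64-wide MAC to an XNOR array followed by a popcount tree, which is far cheaper in LUTs and energy than 64 signed multipliers and an adder tree.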
Keywords
embedded systems, SoCs, light-weight CNNs, quantized model weights, binary neural networks, memory size, bitwise CNN accelerator, Hubara'16 network, Xilinx Zynq-7020 SoC, 4608 parallel binary MACs, carefully crafted accelerator design, bitwise neural network acceleration, deep convolutional neural networks, arithmetic operations, inference, BNNA, FPGA LUTs