Resource and data optimization for hardware implementation of deep neural networks targeting FPGA-based edge devices

SLIP@DAC (2018)

Abstract
Recently, as machine learning algorithms have become more practical, there has been much effort to implement them on edge devices that can be used in our daily lives. However, unlike server-scale devices, edge devices are relatively small and thus have far more limited resources. Therefore, control of resource usage and hardware optimization play an important role when implementing machine learning algorithms on an edge device. In this paper, we target convolutional neural networks (CNNs) and explore various optimization and design techniques to realize them on FPGA devices. The key idea explored in this paper is Backward Pipeline Scheduling together with Latency Balancing, which optimize the pipeline between CNN layers to significantly reduce the overall latency for processing a single image. We also develop a batch-processing design to improve the throughput of the FPGA solution. We achieve a latency of 175.7 µs for classifying one image from the MNIST data set using LeNet and 653.4 µs for classifying one image from the Cifar-10 data set using CifarNet. Without retraining, we still maintain a high accuracy of 97.6% on the MNIST data set and 83.6% on the Cifar-10 data set. Our single-image latency is 5.2x lower for LeNet and 1.95x lower for CifarNet than that of the NVIDIA Jetson TX1 solution.
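The abstract does not give the scheduling details, but the underlying idea can be illustrated with a minimal model: in a layer-wise pipeline, throughput is bounded by the slowest stage, while single-image latency is the sum of per-stage latencies plus any idle waiting. Backward pipeline scheduling can be sketched as computing each layer's start time as late as possible, walking backward from the output; latency balancing then spends parallelism on the slowest stage so stage latencies are roughly equal. The C++ sketch below is purely illustrative: the layer latencies, the linear-speedup assumption, and all names (`Stage`, `backwardSchedule`, `balance`) are assumptions for exposition, not the paper's actual formulation.

```cpp
#include <algorithm>
#include <cstdio>
#include <vector>

// Illustrative model only: per-layer latency in microseconds for a small CNN.
// The values below are hypothetical, not taken from the paper.
struct Stage {
    const char* name;
    double latency_us;  // latency of one image through this layer
};

// Backward pipeline scheduling (sketch): walk from the last layer to the
// first, starting each layer as late as possible so its output is ready
// exactly when the next layer needs it. Returns a start time per stage.
std::vector<double> backwardSchedule(const std::vector<Stage>& stages) {
    std::vector<double> start(stages.size());
    double deadline = 0.0;
    for (const auto& s : stages) deadline += s.latency_us;  // end-to-end latency
    double t = deadline;
    for (int i = static_cast<int>(stages.size()) - 1; i >= 0; --i) {
        t -= stages[i].latency_us;  // latest start that still meets the deadline
        start[i] = t;
    }
    return start;
}

// Latency balancing (sketch): the pipeline initiation interval is set by the
// slowest stage, so repeatedly give the slowest stage more parallelism,
// assuming an idealized linear speedup.
void balance(std::vector<Stage>& stages, int extraUnrollBudget) {
    for (int k = 0; k < extraUnrollBudget; ++k) {
        auto worst = std::max_element(
            stages.begin(), stages.end(),
            [](const Stage& a, const Stage& b) { return a.latency_us < b.latency_us; });
        worst->latency_us /= 2.0;  // hypothetical: doubled parallelism halves latency
    }
}

int main() {
    std::vector<Stage> net = {
        {"conv1", 60.0}, {"pool1", 10.0}, {"conv2", 80.0},
        {"pool2", 10.0}, {"fc", 15.0}};  // hypothetical LeNet-like latencies

    balance(net, 2);  // spend two units of parallelism on the slowest stages
    auto start = backwardSchedule(net);

    double ii = 0.0;
    for (const auto& s : net) ii = std::max(ii, s.latency_us);
    for (std::size_t i = 0; i < net.size(); ++i)
        std::printf("%-6s start=%6.1f us latency=%5.1f us\n",
                    net[i].name, start[i], net[i].latency_us);
    std::printf("initiation interval (throughput bound) = %.1f us\n", ii);
    return 0;
}
```

In an actual HLS flow such as the one the keywords suggest, the inter-layer pipeline itself would be expressed directly in the synthesized design (for example, with streaming connections between layer functions) rather than computed at run time; the sketch above only models the scheduling arithmetic.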
Keywords
FPGA, Convolutional Neural Network, Optimization, Acceleration, High-Level Synthesis