
Optimizing Stochastic Computing for Low Latency Inference of Convolutional Neural Networks

Zhiyuan Chen, Yufei Ma, Zhongfeng Wang

2020 IEEE/ACM International Conference on Computer-Aided Design (ICCAD)

Abstract
The appealing properties of low area, low power, flexible precision, and high bit-error tolerance have made Stochastic Computing (SC) a promising alternative to conventional binary arithmetic for many computation-intensive tasks, e.g., convolutional neural networks (CNNs). However, to suppress the intrinsic fluctuation noise in SC, long bit streams are normally required in SC-based CNN accelerators to achieve satisfactory accuracy, which leads to excessive latency. Although a bit-parallel SC multiplier structure has been proposed to reduce latency, the resulting extra overhead still considerably degrades the overall efficiency of SC. In this paper, we optimize both the micro-architecture of the SC multiply-and-accumulate (MAC) unit and the overall acceleration scheme of the CNN accelerator to favor SC. An optimized and scalable SC-MAC unit, which fully exploits the properties of low-discrepancy bit streams, is proposed with adjustable parameters to reduce latency at a minor area cost. For the overall accelerator, the parallel dimensions of the SC-based MAC array are extended to reuse hardware resources and improve throughput, since a judiciously chosen loop unrolling strategy can better benefit SC operations. The proposed CNN accelerator with the extended SC-MAC array is synthesized in TSMC 28nm CMOS and demonstrated on several representative CNNs, gaining a 2x performance speedup, 2.8x energy savings, and a 15% area reduction compared to a state-of-the-art SC-based CNN accelerator.
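The core SC mechanics the abstract relies on — a value encoded as the fraction of ones in a bit stream, multiplication as a single AND gate, fluctuation noise shrinking with stream length, and low-discrepancy streams converging faster than random ones — can be sketched in a few lines of Python. This is a toy software model for intuition only, not the paper's hardware design; all function names are illustrative.

```python
import random

def sc_encode(p, n, rng):
    """Encode a probability p in [0, 1] as a length-n unipolar bit stream:
    each bit is 1 with probability p, so the stream's mean approximates p."""
    return [1 if rng.random() < p else 0 for _ in range(n)]

def van_der_corput(i):
    """Base-2 van der Corput value of index i: a simple low-discrepancy
    sequence that fills [0, 1) far more evenly than random draws."""
    q, f = 0.0, 0.5
    while i:
        q += f * (i & 1)
        i >>= 1
        f *= 0.5
    return q

def sc_encode_ld(p, n):
    """Low-discrepancy encoding: comparing p against an evenly spread
    sequence makes the fraction of ones track p much more tightly than
    the 1/sqrt(n) error of random streams."""
    return [1 if van_der_corput(i) < p else 0 for i in range(n)]

def sc_multiply(a_bits, b_bits):
    """Unipolar SC multiplication: one AND gate per bit position,
    valid when the two operand streams are statistically independent."""
    return [a & b for a, b in zip(a_bits, b_bits)]

def sc_decode(bits):
    """Recover the encoded value as the fraction of ones in the stream."""
    return sum(bits) / len(bits)

# Longer streams suppress the fluctuation noise of the product estimate.
rng = random.Random(0)
for n in (16, 256, 4096):
    est = sc_decode(sc_multiply(sc_encode(0.5, n, rng),
                                sc_encode(0.5, n, rng)))
    print(n, abs(est - 0.25))  # error around the exact product 0.25
```

The latency problem the paper targets is visible here: halving the representation error of a random stream requires quadrupling its length, which is why plain SC accelerators need very long streams (and why exploiting low-discrepancy streams in the MAC unit pays off).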
Key words
optimized SC-MAC unit, scalable SC-MAC unit, low-discrepancy bit stream, parallel dimensions, SC-based MAC array, extended SC-MAC array, stochastic computing, low-latency inference, convolutional neural networks, bit-error tolerance, binary arithmetic, intrinsic fluctuation noise, SC-based CNN accelerators, bit-parallel structure, SC multiplier, acceleration scheme, multiply-and-accumulate unit micro-architecture, TSMC 28nm CMOS, representative CNNs, energy savings