Hybrid Stochastic-Binary Computing for Low-Latency and High-Precision Inference of CNNs

IEEE Transactions on Circuits and Systems I: Regular Papers (2022)

Abstract
The appealing properties of low area, low power, and high bit-error tolerance have made Stochastic Computing (SC) a promising alternative to conventional binary arithmetic for many computation-intensive tasks, e.g., convolutional neural networks (CNNs). However, current SC-based CNN accelerators suffer from intrinsic computation error and exponentially growing latency. In this work, we optimize both the architecture of the SC multiply-and-accumulate (MAC) unit and the overall acceleration strategy of the CNN accelerator to favor SC. A low-complexity bit-stream-extending method is proposed to suppress the computation error of SC and ensure that a trained fixed-point model can be deployed on SC-based hardware without fine-tuning. In addition, a distribution-determined partition scheme is developed to design a hybrid stochastic-binary computing (SBC) MAC unit that accelerates bit-stream processing at minimal overhead. For the overall accelerator, the SBC-based MAC array is extended to reuse hardware resources and improve throughput, since a judiciously chosen loop-unrolling strategy better suits SC operations. The proposed CNN accelerator with the extended SBC-MAC array is synthesized and validated in TSMC 28 nm CMOS on several representative CNNs targeting the ImageNet dataset. Compared with a precise binary implementation, the proposed design achieves a 44% area reduction and 50% power saving while incurring only 4% additional computation latency and 0.5% accuracy degradation.
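The paper's bit-stream-extending method and SBC-MAC partition are not detailed in the abstract, but the trade-off it describes (SC error vs. latency) follows from the basic SC encoding. The sketch below, a minimal Python simulation and not the authors' design, illustrates standard unipolar SC multiplication, where values in [0, 1] are encoded as Bernoulli bit streams and a bitwise AND estimates their product; the estimation error shrinks only as the stream length grows, which is why plain SC latency rises steeply with precision.

```python
import numpy as np

def to_bitstream(x, length, rng):
    """Encode a value x in [0, 1] as a unipolar stochastic bit stream:
    each bit is 1 with probability x, so the stream mean approximates x."""
    return (rng.random(length) < x).astype(np.uint8)

def sc_multiply(a, b, length=1024, seed=0):
    """Estimate a * b with standard unipolar SC: the bitwise AND of two
    independent streams has mean close to the exact product."""
    rng = np.random.default_rng(seed)
    sa = to_bitstream(a, length, rng)
    sb = to_bitstream(b, length, rng)
    return (sa & sb).mean()

if __name__ == "__main__":
    # Longer streams reduce the random estimation error, illustrating the
    # latency/precision trade-off the abstract refers to. Exact product: 0.375.
    for n in (64, 256, 1024, 4096):
        print(n, sc_multiply(0.75, 0.5, length=n))
```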
Keywords
Convolutional neural network, hardware accelerator, stochastic computing, energy-efficient, area-efficient