BFP-CIM: Runtime Energy-Accuracy Scalable Computing-in-Memory-Based DNN Accelerator Using Dynamic Block-Floating-Point Arithmetic

IEEE Transactions on Circuits and Systems I: Regular Papers (2023)

Abstract
Convolutional neural networks (CNNs) are known for their exceptional performance in various applications; however, their energy consumption during inference can be substantial. Analog computing-in-memory (CIM) has shown promise in improving the energy efficiency of CNNs, but the analog-to-digital converters (ADCs) it requires remain a challenge. In analog CIM-based accelerators, ADCs convert the analog partial sums produced by CIM crossbar arrays into digital values, and high-precision ADCs account for over 60% of the system's energy consumption. To prevent ADCs from eroding the energy-efficiency benefits of CIM, researchers have explored quantizing CNNs so that low-precision ADCs can be used, trading accuracy for energy efficiency. However, these approaches often require data-dependent adjustments to minimize accuracy loss. Instead, we observe that the most significant toggled bit of each input value indicates its optimal quantization range. Accordingly, we propose a range-aware rounding (RAR) method for runtime bit-width adjustment that eliminates the need for pre-deployment effort. RAR can be easily integrated into a CIM accelerator using dynamic block-floating-point arithmetic, and a bit-level zero-skipping mechanism is incorporated seamlessly by forming input blocks dynamically. Experimental results demonstrate that our methods maintain accuracy while achieving up to 1.81x and 2.08x energy-efficiency improvements on the CIFAR-10 and ImageNet datasets, respectively, compared with state-of-the-art techniques.
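The core idea of RAR, keeping only a short window of bits that starts at the most significant toggled bit of each input value, can be sketched in a few lines. The Python snippet below is an illustrative sketch under assumed parameters (a 4-bit kept window, 8-bit unsigned inputs, round-half-up); the function name and defaults are hypothetical and do not reproduce the paper's hardware implementation.

```python
def range_aware_round(value: int, kept_bits: int = 4, total_bits: int = 8) -> int:
    """Minimal sketch of range-aware rounding for an unsigned integer.

    Locates the most significant toggled bit, keeps a window of
    `kept_bits` bits from that position, and rounds off the rest
    (round-half-up). Names and bit-widths are illustrative assumptions,
    not the paper's exact scheme.
    """
    if value == 0:
        return 0
    msb = value.bit_length() - 1               # index of the first toggled bit
    shift = max(msb - (kept_bits - 1), 0)      # low-order bits to drop
    if shift == 0:
        return value                           # already fits in the kept window
    rounded = (value + (1 << (shift - 1))) >> shift   # round-half-up, then drop
    result = rounded << shift                  # re-align to the original scale
    return min(result, (1 << total_bits) - 1)  # clamp if the rounding carry overflows


# Example: 0b01011011 (91) with a 4-bit window becomes 0b01011000 (88),
# i.e. only the four bits starting at the leading one are retained.
print(range_aware_round(0b01011011))  # -> 88
```

Because the window position is derived from each value at runtime, no calibration data or pre-deployment profiling is needed, which is the data-free property the abstract emphasizes.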
Keywords
Computing-in-memory, deep learning, data-free quantization, block-floating-point arithmetic