BFP-CIM: Runtime Energy-Accuracy Scalable Computing-in-Memory-Based DNN Accelerator Using Dynamic Block-Floating-Point Arithmetic

IEEE Transactions on Circuits and Systems I: Regular Papers (2023)

Abstract
Convolutional neural networks (CNNs) are known for their exceptional performance in various applications; however, their energy consumption during inference can be substantial. Analog computing-in-memory (CIM) has shown promise in improving the energy efficiency of CNNs, but the analog-to-digital converters (ADCs) it requires remain a challenge. In analog CIM-based accelerators, ADCs convert the analog partial sums produced by CIM crossbar arrays into digital values, and high-precision ADCs account for over 60% of the system's energy consumption. To prevent ADCs from eroding the energy-efficiency benefits of CIM, researchers have explored quantizing CNNs so that low-precision ADCs can be used, trading accuracy for energy efficiency. However, these approaches often require data-dependent adjustments to minimize accuracy loss. Instead, we observe that the most significant toggled bit of each input value indicates its optimal quantization range. Accordingly, we propose a range-aware rounding (RAR) method for runtime bit-width adjustment that eliminates the need for pre-deployment effort. RAR can be easily integrated into a CIM accelerator using dynamic block-floating-point arithmetic, and a bit-level zero-skipping mechanism is incorporated seamlessly by forming input blocks dynamically. Experimental results demonstrate that our methods maintain accuracy while achieving up to 1.81x and 2.08x energy-efficiency improvements on the CIFAR-10 and ImageNet datasets, respectively, compared with state-of-the-art techniques.
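The core idea of RAR, keeping only a short window of bits that starts at the most significant toggled bit of each input value, can be sketched in a few lines. The Python snippet below is an illustrative sketch under assumed parameters (a 4-bit kept window, 8-bit unsigned inputs, round-half-up); the function name and defaults are hypothetical and do not reproduce the paper's hardware implementation.

```python
def range_aware_round(value: int, kept_bits: int = 4, total_bits: int = 8) -> int:
    """Minimal sketch of range-aware rounding for an unsigned integer.

    Locates the most significant toggled bit, keeps a window of
    `kept_bits` bits from that position, and rounds off the rest
    (round-half-up). Names and bit-widths are illustrative assumptions,
    not the paper's exact scheme.
    """
    if value == 0:
        return 0
    msb = value.bit_length() - 1               # index of the first toggled bit
    shift = max(msb - (kept_bits - 1), 0)      # low-order bits to drop
    if shift == 0:
        return value                           # already fits in the kept window
    rounded = (value + (1 << (shift - 1))) >> shift   # round-half-up, then drop
    result = rounded << shift                  # re-align to the original scale
    return min(result, (1 << total_bits) - 1)  # clamp if the rounding carry overflows


# Example: 0b01011011 (91) with a 4-bit window becomes 0b01011000 (88),
# i.e. only the four bits starting at the leading one are retained.
print(range_aware_round(0b01011011))  # -> 88
```

Because the window position is derived from each value at runtime, no calibration data or pre-deployment profiling is needed, which is the data-free property the abstract emphasizes.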
Keywords
Computing-in-memory, deep learning, data-free quantization, block-floating-point arithmetic