Partial Sum Quantization for Computing-In-Memory-Based Neural Network Accelerator

IEEE Transactions on Circuits and Systems II: Express Briefs (2023)

Abstract
Computing-in-memory (CIM) has emerged as an ideal hardware platform for improving the performance and efficiency of convolutional neural networks (CNNs). However, owing to the limited size of a memory array, the input and weight matrices of a convolution operation have to be split into sub-matrices, which introduces partial sums. Generally, high-resolution analog-to-digital converters (ADCs) are used to read out these partial sums and preserve computing precision, but at the cost of large area and energy. Partial sum quantization (PSQ), which can be exploited to significantly reduce the required ADC resolution, remains an open question in this field. This brief proposes a novel PSQ approach for CIM using post-training quantization based on a newly defined array-wise granularity. Meanwhile, as the non-linearity of the ADC transfer function has a severe impact on accuracy, a gradient estimation method based on smooth approximation is proposed to address this problem. Experiments on various CNNs show that the required ADC resolution can be reduced from 11 bits to as low as 3 bits with slight accuracy loss (~1.63%), while energy efficiency is increased by up to 224%.
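The abstract only describes the method at a high level. The snippet below is a minimal NumPy sketch, not the authors' implementation, of how partial sums arise when an im2col-flattened convolution is mapped onto fixed-size CIM arrays and how each array's partial sums might be quantized to a low ADC resolution before digital accumulation. The array size (128 rows), the 3-bit resolution, the helper names array_wise_partial_sums and quantize_partial_sum, and the max-based calibration are all illustrative assumptions; the paper's array-wise calibration and its smooth-approximation gradient estimator are not reproduced here.

```python
import numpy as np

def array_wise_partial_sums(x_cols, w_matrix, array_rows=128):
    """Split the matrix-vector multiplication over row-slices that fit one memory array.

    x_cols:   (K, N) im2col'd activations
    w_matrix: (K, M) flattened kernels
    Returns a list of (N, M) partial-sum tensors, one per array (sizes assumed).
    """
    partial_sums = []
    for start in range(0, x_cols.shape[0], array_rows):
        end = start + array_rows
        # Each memory array computes the product of one row-slice of inputs and weights.
        partial_sums.append(x_cols[start:end].T @ w_matrix[start:end])
    return partial_sums


def quantize_partial_sum(ps, n_bits=3, clip=None):
    """Symmetric uniform post-training quantization of one array's partial sums
    to the ADC resolution; 'clip' would be calibrated per array (assumption)."""
    if clip is None:
        clip = np.abs(ps).max() + 1e-12
    q_max = 2 ** (n_bits - 1) - 1
    step = clip / q_max
    return np.clip(np.round(ps / step), -q_max, q_max) * step


# Example: 576 input rows mapped onto 128-row arrays give ceil(576/128) = 5
# partial sums per output, each quantized to 3 bits and then summed digitally.
x_cols = np.random.randn(576, 196)   # 14x14 output positions
w_matrix = np.random.randn(576, 32)  # 32 output channels
out = sum(quantize_partial_sum(ps) for ps in array_wise_partial_sums(x_cols, w_matrix))
```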
Keywords
neural network accelerator, partial sum quantization, neural network, computing-in-memory-based