CIM2PQ: An Array-Wise and Hardware-Friendly Mixed Precision Quantization Method for Analog Computing-In-Memory

Sifan Sun, Jinyu Bai, Zhaoyu Shi, Weisheng Zhao, Wang Kang

IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (2024)

Abstract
Computing-in-memory (CIM) architecture is a promising convolutional neural network (CNN) accelerator known for its highly efficient matrix-vector multiplications (MVMs). However, due to the low-precision computation and limited size of CIM memory arrays, large MVMs must be decomposed into smaller subsets. Conventional NN quantization methods overlook the characteristics of CIM hardware, resulting in diminished system performance and efficiency. This paper proposes CIM2PQ, an evolutionary-algorithm-based mixed precision quantization (MPQ) method for CIM-based accelerators that takes the hardware characteristics of CIM into account and automatically generates quantization strategies for NN models to improve the efficiency of CIM systems. Firstly, inspired by the CIM computing paradigm, an array-wise quantization granularity is introduced into the MPQ search space, which jointly quantizes the inputs, weights, and partial sums. Secondly, a production procedure containing fine-grained crossover and progressive adaptive mutation is proposed, which efficiently explores the search space and speeds up the search process. Thirdly, we propose a fast and efficient strategy evaluation method to obtain the performance of a quantization strategy on the CIM platform, significantly reducing evaluation time without requiring fine-tuning. Finally, to protect CIM-friendly strategies that have lower bit-widths but worse algorithmic performance, we propose a strategy selection method based on multi-objective optimization, named qNSGA-III. The effectiveness of the proposed method is demonstrated through experimental results on various NNs and datasets. For ResNet-18, compared with the baseline MPQ method, the hardware efficiency and accuracy can be improved to 117% with 7.05%, 113% with 3.37%, and 119% with 5.78% on CIFAR-10, CIFAR-100, and ImageNet, respectively.
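To make the overall search flow concrete, the following is a minimal, hypothetical Python sketch of an evolutionary array-wise MPQ loop: per-array (input, weight, partial-sum) bit-width triples serve as candidate strategies, crossover operates at per-array granularity, the mutation rate shrinks over generations, and selection keeps a simple Pareto front over an accuracy proxy and a hardware-cost proxy. All names, constants, and the dummy objectives are assumptions for illustration only; this is not the paper's CIM2PQ implementation, its strategy evaluation method, or its qNSGA-III selection.

```python
# Illustrative sketch of an evolutionary array-wise MPQ search (assumed names/values).
import random

BIT_CHOICES = [2, 4, 6, 8]   # assumed candidate bit-widths per array
NUM_ARRAYS = 64              # assumed number of CIM sub-arrays (tiles)
POP_SIZE, GENERATIONS = 32, 50

def random_strategy():
    # One (input, weight, partial-sum) bit-width triple per CIM array.
    return [tuple(random.choice(BIT_CHOICES) for _ in range(3))
            for _ in range(NUM_ARRAYS)]

def crossover(parent_a, parent_b):
    # Fine-grained (per-array) crossover: each array's triple is inherited
    # independently from one of the two parents.
    return [a if random.random() < 0.5 else b for a, b in zip(parent_a, parent_b)]

def mutate(strategy, generation):
    # Mutation rate that shrinks over generations, as a stand-in for the
    # paper's progressive adaptive mutation (exact schedule is an assumption).
    rate = 0.2 * (1.0 - generation / GENERATIONS)
    return [tuple(random.choice(BIT_CHOICES) if random.random() < rate else b
                  for b in triple) for triple in strategy]

def evaluate(strategy):
    # Dummy objectives: a real system would use a fast, fine-tuning-free
    # accuracy proxy on the CIM platform plus a hardware efficiency model.
    acc_proxy = -sum(abs(8 - b) for triple in strategy for b in triple)
    hw_cost = sum(sum(triple) for triple in strategy)
    return acc_proxy, hw_cost

def pareto_select(candidates, k):
    # Simple non-dominated filtering; the paper's qNSGA-III additionally
    # protects CIM-friendly low-bit-width strategies during selection.
    scored = [(s, evaluate(s)) for s in candidates]
    def dominated(i):
        return any(o[0] >= scored[i][1][0] and o[1] <= scored[i][1][1]
                   and o != scored[i][1] for _, o in scored)
    front = [s for i, (s, _) in enumerate(scored) if not dominated(i)]
    rest = [s for s in candidates if s not in front]
    return (front + rest)[:k]

population = [random_strategy() for _ in range(POP_SIZE)]
for gen in range(GENERATIONS):
    offspring = [mutate(crossover(*random.sample(population, 2)), gen)
                 for _ in range(POP_SIZE)]
    population = pareto_select(population + offspring, POP_SIZE)
```

The per-array encoding is what makes the search space hardware-aware: each CIM sub-array's input, weight, and partial-sum precisions are decided jointly, rather than assigning a single bit-width per layer.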
Keywords
Neural network, mixed precision quantization, post-training quantization, computing-in-memory