APQ: Automated DNN Pruning and Quantization for ReRAM-Based Accelerators

IEEE Transactions on Parallel and Distributed Systems (2023)

Abstract
Emerging ReRAM-based accelerators support in-memory computation to accelerate deep neural network (DNN) inference. Weight matrix pruning is a widely used technique to reduce the size of DNN models, thereby reducing the resource and energy consumption of ReRAM-based accelerators. However, existing pruning works for ReRAM-based accelerators have three major issues. First, they use heuristics or rules from domain experts to prune the weights, leading to sub-optimal pruning policies. Second, they prune weights at row- or column-level coarse granularity, resulting in poor compression rates under model accuracy constraints. Third, they apply weight pruning in isolation, missing the compression opportunity of combining pruning and quantization. In this article, we propose an Automated DNN Pruning and Quantization framework, named APQ, for ReRAM-based accelerators. First, APQ adopts reinforcement learning (RL) to automatically determine the pruning policy of each DNN layer toward a global optimum. Second, it prunes and maps weight matrices to a ReRAM-based accelerator at the finer granularity of column-vectors, which improves compression rates under the accuracy constraints. To address the resulting dislocation problem, it uses a new data path in ReRAM-based accelerators to correctly index and feed inputs to the matrix-vector computation. Third, to further reduce resource consumption, APQ also leverages reinforcement learning to automatically determine the quantization bitwidth of each layer of the pruned DNN model. Experimental results show that APQ achieves up to a 4.52X compression rate, 4.11X area efficiency, and 4.51X energy efficiency with similar or even higher model accuracy, compared to the state-of-the-art work.
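The abstract describes column-vector-granularity pruning and per-layer quantization only at a high level. The following minimal NumPy sketch is not the paper's implementation; the vector length, the L2-norm scoring of column-vectors, and the symmetric uniform quantizer are assumptions made purely for illustration of how a weight matrix could be pruned at column-vector granularity and then quantized to a fixed per-layer bitwidth.

```python
# Hypothetical illustration (not the authors' code): column-vector pruning
# followed by per-layer uniform quantization, using NumPy only.
import numpy as np

def prune_column_vectors(weight, vec_len=8, sparsity=0.5):
    """Prune a 2-D weight matrix at column-vector granularity.

    Each column is split into contiguous vectors of `vec_len` weights
    (roughly matching how weights map onto ReRAM crossbar columns); the
    vectors with the smallest L2 norms are zeroed until the target
    `sparsity` ratio is reached.
    """
    rows, cols = weight.shape
    pad = (-rows) % vec_len
    padded = np.pad(weight, ((0, pad), (0, 0)))
    # Shape: (num_vectors_per_column, vec_len, cols)
    vectors = padded.reshape(-1, vec_len, cols)
    norms = np.linalg.norm(vectors, axis=1)          # (num_vectors, cols)
    k = int(sparsity * norms.size)
    if k > 0:
        threshold = np.partition(norms.ravel(), k - 1)[k - 1]
        mask = (norms > threshold)[:, None, :]       # keep vectors above threshold
        vectors = vectors * mask
    return vectors.reshape(-1, cols)[:rows, :]

def quantize_uniform(weight, bits=4):
    """Symmetric uniform quantization of one layer to `bits` bits."""
    qmax = 2 ** (bits - 1) - 1
    scale = np.abs(weight).max() / qmax if weight.any() else 1.0
    q = np.clip(np.round(weight / scale), -qmax - 1, qmax)
    return q * scale  # de-quantized weights for accuracy evaluation

if __name__ == "__main__":
    rng = np.random.default_rng(0)
    w = rng.normal(size=(64, 32)).astype(np.float32)
    w_pruned = prune_column_vectors(w, vec_len=8, sparsity=0.5)
    w_quant = quantize_uniform(w_pruned, bits=4)
    print(f"fraction of weights kept: {np.count_nonzero(w_pruned) / w.size:.2f}")
```

In APQ, the sparsity ratio and bitwidth used above would not be fixed constants but per-layer decisions produced by the reinforcement learning agents; this sketch only shows the effect of applying one such decision to a single layer.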
Keywords
automated DNN pruning, quantization, ReRAM-based