Unlocking Wordline-level Parallelism for Fast Inference on RRAM-based DNN Accelerator

ICCAD (2020)

Cited by 5 | Viewed 12

Abstract
In-memory computing is rapidly rising as a viable solution that can effectively accelerate neural networks by overcoming the memory wall. The Resistive RAM (RRAM) crossbar array is in the spotlight as a building block for DNN inference accelerators since it can perform a massive number of dot products in memory in an area- and power-efficient manner. However, its in-memory computation is vulnerable to errors due to the non-ideality of RRAM cells. This error-prone nature of the RRAM crossbar limits its wordline-level parallelism: activating a large number of wordlines accumulates non-zero current contributions from RRAM cells in the high-resistance state, as well as current deviations from individual cells, leading to a significant accuracy drop. To improve performance by increasing the maximum number of concurrently activated wordlines, we propose two techniques. First, we introduce a lightweight scheme that effectively eliminates the current contributions from high-resistance state cells. Second, based on the observation that not all layers in a neural network model have the same error rates and impact on the inference accuracy, we propose to allow different layers to activate non-uniform numbers of wordlines concurrently. We also introduce a systematic methodology to determine the number of concurrently activated wordlines for each layer with the goal of optimizing performance while minimizing the accuracy degradation. Our proposed techniques increase the inference throughput by 3-10× with a less than 1% accuracy drop over three datasets. Our evaluation also demonstrates that this benefit comes at the small cost of only 8.2% and 5.3% increases in area and power consumption, respectively.
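The error mechanism the abstract describes can be illustrated numerically: each activated wordline injects current through its RRAM cell, and high-resistance-state (HRS) cells contribute a small but non-zero current plus a per-cell deviation, so errors grow with the number of concurrently activated wordlines. Below is a minimal sketch of this effect, with the HRS baseline subtracted afterwards in the spirit of the paper's HRS-current-elimination scheme. All parameter values (`G_LRS`, `G_HRS`, `SIGMA`) and the function itself are illustrative assumptions, not the paper's actual circuit or method.

```python
import random

# Hypothetical, normalized device parameters (illustrative only).
G_LRS = 1.0    # low-resistance-state conductance (weight bit = 1)
G_HRS = 0.05   # non-zero high-resistance-state conductance (weight bit = 0)
SIGMA = 0.02   # per-cell conductance deviation (non-ideality)

def crossbar_dot(inputs, weights, max_active, cancel_hrs=False, seed=0):
    """Dot product of a binary input vector and a binary weight column,
    computed in groups of at most `max_active` concurrently activated
    wordlines, with the partial sums accumulated digitally."""
    rng = random.Random(seed)
    total = 0.0
    for start in range(0, len(inputs), max_active):
        current = 0.0
        n_active = 0
        for x, w in zip(inputs[start:start + max_active],
                        weights[start:start + max_active]):
            if x:  # an activated wordline drives current through its cell
                g = G_LRS if w else G_HRS
                current += g + rng.gauss(0.0, SIGMA)
                n_active += 1
        if cancel_hrs:
            # Subtract the expected HRS baseline of the activated wordlines
            # and rescale, removing the accumulated HRS contribution.
            current = (current - n_active * G_HRS) / (G_LRS - G_HRS)
        total += current
    return total

inputs = [1] * 64
weights = [i % 2 for i in range(64)]  # exact dot product = 32
naive = crossbar_dot(inputs, weights, max_active=64)
corrected = crossbar_dot(inputs, weights, max_active=64, cancel_hrs=True)
```

With 64 wordlines active at once, the naive readout overshoots the true value of 32 by roughly 32 × `G_HRS`, while the corrected readout is off only by the accumulated per-cell deviations, which is the accuracy-vs-parallelism tradeoff the paper's techniques target.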
Keywords
RRAM cells, concurrently activated wordlines, high-resistance state cells, neural network model, error rates, inference accuracy, performance optimization, inference throughput, wordline-level parallelism, fast inference, RRAM-based DNN accelerator, in-memory computing, memory wall, resistive RAM crossbar array, DNN inference accelerators, dot products, in-memory computation, non-ideality, RRAM crossbar, non-zero current contributions, power consumption