15.4 A 5.99-to-691.1TOPS/W Tensor-Train In-Memory-Computing Processor Using Bit-Level-Sparsity-Based Optimization and Variable-Precision Quantization

2021 IEEE International Solid-State Circuits Conference (ISSCC)

Abstract
Computing-in-memory (CIM) improves energy efficiency by enabling parallel multiply-and-accumulate (MAC) operations and reducing memory accesses [1-4]. However, today's typical neural networks (NNs) usually exceed on-chip memory capacity, so a CIM-based processor may encounter a memory bottleneck [5]. Tensor-train (TT) is a tensor decomposition method that decomposes a $d$-dimensional tensor into $d$ 4D tensor-cores (TCs: $G_k[r_{k-1}, n_k, m_k, r_k]$, $k = 1, \ldots, d$) [6]. $G_k$ can be viewed as a 2D $n_k \times m_k$ array, where each element is an $r_{k-1} \times r_k$ matrix. The TCs require $\sum_{k \in [1,d]} r_{k-1} n_k m_k r_k$ parameters to represent the original tensor, which has $\prod_{k \in [1,d]} n_k m_k$ parameters. Since $r_k$ is typically small, kernels and weight matrices of convolutional, fully-connected and recurrent layers can be compressed significantly by using TT decomposition, thereby enabling storage of an entire NN in a CIM-based processor.
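The compression argument can be checked numerically. Below is a minimal Python sketch, not taken from the paper: the layer shapes `n`, `m`, the ranks `r`, and the helper `tt_to_matrix` are illustrative assumptions. It builds random TT-cores $G_k[r_{k-1}, n_k, m_k, r_k]$ for a fully-connected weight matrix, compares the TT parameter count $\sum_k r_{k-1} n_k m_k r_k$ with the full count $\prod_k n_k m_k$, and contracts the cores back into the full matrix.

```python
import numpy as np

# Hypothetical TT shapes for a fully-connected layer: the weight matrix has
# prod(n) x prod(m) entries but is stored as d 4D tensor-cores
# G_k[r_{k-1}, n_k, m_k, r_k] with r_0 = r_d = 1.
n = [4, 8, 8, 4]        # row factorization:    prod(n) = 1024
m = [4, 8, 8, 4]        # column factorization: prod(m) = 1024
r = [1, 8, 8, 8, 1]     # TT ranks r_0..r_d (small ranks -> high compression)
d = len(n)

# Random cores stand in for trained TT weights.
cores = [np.random.randn(r[k], n[k], m[k], r[k + 1]) for k in range(d)]

# Parameter counts from the abstract:
#   TT:   sum_k  r_{k-1} * n_k * m_k * r_k
#   full: prod_k n_k * m_k
tt_params = sum(c.size for c in cores)
full_params = int(np.prod(n) * np.prod(m))
print(f"TT params:   {tt_params}")
print(f"full params: {full_params}")
print(f"compression: {full_params / tt_params:.1f}x")

def tt_to_matrix(cores, n, m):
    """Contract the TT cores back into the full prod(n) x prod(m) matrix."""
    out = cores[0][0]                   # (n_1, m_1, r_1), since r_0 = 1
    rows, cols = n[0], m[0]
    for k in range(1, len(cores)):
        g = cores[k]                                 # (r_{k-1}, n_k, m_k, r_k)
        out = np.tensordot(out, g, axes=([2], [0]))  # (rows, cols, n_k, m_k, r_k)
        out = out.transpose(0, 2, 1, 3, 4)           # (rows, n_k, cols, m_k, r_k)
        rows *= n[k]
        cols *= m[k]
        out = out.reshape(rows, cols, g.shape[-1])
    return out[:, :, 0]                 # r_d = 1

W = tt_to_matrix(cores, n, m)
print("reconstructed matrix shape:", W.shape)   # (1024, 1024)
```

With these assumed shapes the TT representation needs 8,448 parameters versus 1,048,576 for the full 1024x1024 matrix, roughly a 124x reduction, which is the effect that lets an entire NN fit in a CIM-based processor.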
Keywords
tensor-train, in-memory-computing, bit-level-sparsity-based, variable-precision