A 22nm 8Mb STT-MRAM Near-Memory-Computing Macro with 8b-Precision and 46.4-160.1TOPS/W for Edge-AI Devices
ISSCC(2023)
Abstract
Nonvolatile-memory-based computing in memory (nvCIM) [1–6] is ideal for low-power edge-AI devices requiring neural network (NN) parameter storage in the power-off mode, a rapid response to device wake-up, and high energy efficiency for MAC operations
$(\text{EF}_{\text{MAC}})$
. Current analog nvCIMs impose a tradeoff between the signal margin (SM) and the number of accumulations
$(\mathrm{N}_{\text{ACU}})$
per cycle versus
$\text{EF}_{\text{MAC}}$
and computing latency
$(\mathrm{T}_{\text{CD-MAC}})$
. Near-memory computing (NMC), with high precision for inputs (IN), weights (W), and outputs (OUT), and a high
$\mathrm{N}_{\text{ACU}}$
is a trend to improve
$\text{EF}_{\text{MAC}}$, $\mathrm{T}_{\text{CD-MAC}}$
, and accuracy. A prior STT-MRAM NMC [1] uses vertical-weight mapping (VWM) to improve the
$\text{EF}_{\text{MAC}}$
; however, further improvement is challenging due to: (1) the large energy consumption in reading repetitious weight data across multiple inputs for a single NN layer; (2) a high bitstream toggling rate (BTR) in the digital MAC circuits
$(\text{DC}_{\text{MAC}})$
, which reduces
$\text{EF}_{\text{MAC}}$
; and (3) a limited SM and a long memory readout latency
$(\mathrm{T}_{\text{CD-M}})$
for memories with a small R-ratio (e.g. STT-MRAM, see Fig. 33.2.1). In developing an STT-MRAM nvCIM macro, this work moves beyond circuit-level novelty by using system-software-circuit co-design. This work achieves a high
$\text{EF}_{\text{MAC}}$
, a short
$\mathrm{T}_{\text{CD-M}}$
, a high read bandwidth (R-BW), a high IN-W-OUT precision, and a high
$\mathrm{N}_{\text{ACU}}$
by using three novel schemes: (1) a hardware-based weight-feature aware read (WFAR) to reduce weight accesses and improve
$\text{EF}_{\text{MAC}}$
with a minimal area overhead; (2) toggling-aware weight-tuning (TAWT) to obtain fine-tuned weights
$(\mathrm{W}_{\text{FT}})$
with a low BTR, which is based on VWM to enhance the
$\text{EF}_{\text{MAC}}$
of the
$\text{DC}_{\text{MAC}}$
; and (3) a differential charge-accumulating margin-enhanced voltage-sensing amplifier (DCME-VSA) to enhance the SM while reducing
$\mathrm{T}_{\text{CD-M}}$
. The proposed 22-nm 8-Mb STT-MRAM NMC nvCIM macro achieves the highest R-BW
$(436\text{GB}/\mathrm{s})$
and
$\text{EF}_{\text{MAC}}\ (46.4\text{-}160.1\,\text{TOPS}/\mathrm{W})$
for
$\mathrm{N}_{\text{ACU}}=576$
for 8bIN-8bW-26bOUT.
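The abstract does not say how the 26b output width is chosen. As a consistency check only, the sketch below assumes unsigned operands and lossless accumulation, and computes the number of bits needed to hold the worst-case sum of 576 products of 8b inputs and 8b weights. The function name `full_precision_out_bits` is hypothetical and does not come from the paper.

```python
def full_precision_out_bits(in_bits: int, w_bits: int, n_acu: int) -> int:
    """Bits needed to hold the largest possible sum of n_acu products.

    Assumption (not stated in the abstract): inputs and weights are
    unsigned integers and no truncation is applied during accumulation.
    """
    max_input = (1 << in_bits) - 1    # largest unsigned input value
    max_weight = (1 << w_bits) - 1    # largest unsigned weight value
    worst_case_sum = n_acu * max_input * max_weight
    return worst_case_sum.bit_length()


# 8b inputs, 8b weights, 576 accumulations per cycle
out_bits = full_precision_out_bits(8, 8, 576)
print(out_bits)
```

Under these assumptions the result matches the 26b OUT quoted above: each product needs up to 16 bits, and accumulating 576 of them adds up to 10 more.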
Key words
analog nvCIM, computing latency, differential charge-accumulating margin, digital MAC circuits, enhanced voltage-sensing amplifier, fine-tuned weights, large energy consumption, low-power edge-AI devices, MAC operations, minimal area overhead, near-memory-computing macro, neural network parameter storage, nonvolatile-memory-based computing in memory, repetitious weight data, signal margin, size 22.0 nm, STT-MRAM near-memory-computing macro, STT-MRAM NMC, STT-MRAM nvCIM macro, system-software-circuit co-design, toggling-aware weight-tuning, vertical-weight mapping, weight-feature aware read, word length 8 bit