A 22nm 8Mb STT-MRAM Near-Memory-Computing Macro with 8b-Precision and 46.4-160.1TOPS/W for Edge-AI Devices

ISSCC(2023)

Abstract
Nonvolatile-memory-based computing in memory (nvCIM) [1–6] is ideal for low-power edge-AI devices requiring neural network (NN) parameter storage in the power-off mode, a rapid response to device wake-up, and high energy efficiency for MAC operations ($\text{EF}_{\text{MAC}}$). Current analog nvCIMs impose a tradeoff between the signal margin (SM) and the number of accumulations ($\mathrm{N}_{\text{ACU}}$) per cycle versus $\text{EF}_{\text{MAC}}$ and computing latency ($\mathrm{T}_{\text{CD-MAC}}$). Near-memory computing (NMC), with high precision for inputs (IN), weights (W), and outputs (OUT), and a high $\mathrm{N}_{\text{ACU}}$, is a trend to improve $\text{EF}_{\text{MAC}}$, $\mathrm{T}_{\text{CD-MAC}}$, and accuracy. A prior STT-MRAM NMC [1] uses vertical-weight mapping (VWM) to improve the $\text{EF}_{\text{MAC}}$; however, further improvement is challenging due to (1) the large energy consumption in reading repetitious weight data across multiple inputs for a single NN layer; (2) a high bitstream toggling rate (BTR) in digital MAC circuits ($\text{DC}_{\text{MAC}}$), which reduces $\text{EF}_{\text{MAC}}$; and (3) a limited SM and a long memory readout latency ($\mathrm{T}_{\text{CD-M}}$) for memories with a small R-ratio (e.g., STT-MRAM; see Fig. 33.2.1). In developing an STT-MRAM nvCIM macro, this work moves beyond circuit-level novelty by using system-software-circuit co-design.
This work achieves a high $\text{EF}_{\text{MAC}}$, a short $\mathrm{T}_{\text{CD-M}}$, a high read bandwidth (R-BW), a high IN-W-OUT precision, and a high $\mathrm{N}_{\text{ACU}}$ by using three novel schemes: (1) a hardware-based weight-feature aware read (WFAR) to reduce weight accesses and improve $\text{EF}_{\text{MAC}}$ with a minimal area overhead; (2) toggling-aware weight-tuning (TAWT), based on VWM, to obtain fine-tuned weights ($\mathrm{W}_{\text{FT}}$) with a low BTR and thereby enhance the $\text{EF}_{\text{MAC}}$ of the $\text{DC}_{\text{MAC}}$; and (3) a differential charge-accumulating margin-enhanced voltage-sensing amplifier (DCME-VSA) to enhance the SM while reducing the $\mathrm{T}_{\text{CD-M}}$. The proposed 22-nm 8-Mb STT-MRAM NMC nvCIM macro achieves the highest R-BW ($436\,\text{GB/s}$) and $\text{EF}_{\text{MAC}}$ ($46.4$–$160.1\,\text{TOPS/W}$) for $\mathrm{N}_{\text{ACU}}=576$ with 8bIN-8bW-26bOUT precision.
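The 26b output width quoted above is consistent with a simple full-precision accumulation bound. The abstract does not say how the width was chosen, so the following is only a sketch under the assumption of unsigned 8b inputs and 8b weights; the function name `accumulator_bits` is hypothetical and not taken from the paper.

```python
def accumulator_bits(in_bits: int, w_bits: int, n_acu: int) -> int:
    """Bits needed to hold the sum of n_acu products of unsigned operands.

    Assumption (not stated in the abstract): inputs and weights are unsigned
    integers of in_bits and w_bits, and no truncation or saturation occurs.
    """
    max_product = (2**in_bits - 1) * (2**w_bits - 1)  # largest single product
    max_sum = n_acu * max_product                      # worst-case accumulation
    return max_sum.bit_length()                        # minimum unsigned width


if __name__ == "__main__":
    # 8b inputs, 8b weights, 576 accumulations per cycle
    print(accumulator_bits(8, 8, 576))
```

Under these assumptions each product needs 16 bits and accumulating 576 of them adds 10 more, which gives the 26b output stated in the abstract.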
Key words
analog nvCIM, computing latency, differential charge-accumulating margin, digital MAC circuits, enhanced voltage-sensing amplifier, fine-tuned weights, large energy consumption, low-power edge-AI devices, MAC operations, minimal area overhead, near-memory-computing macro, neural network parameter storage, nonvolatile-memory-based computing in memory, repetitious weight data, signal margin, size 22.0 nm, STT-MRAM near-memory-computing macro, STT-MRAM NMC, STT-MRAM nvCIM macro, system-software-circuit co-design, toggling-aware weight-tuning, vertical-weight mapping, weight-feature aware read, word length 8 bit