Layer-Sensitive Neural Processing Architecture for Error-Tolerant Applications
IEEE TRANSACTIONS ON VERY LARGE SCALE INTEGRATION (VLSI) SYSTEMS (2024)
Abstract
Neural network (NN) inference places heavy demands on storage resources and parallel computing, which poses significant challenges for deploying NNs on Internet-of-Things (IoT) devices. This work therefore proposes a low-power NN architecture, comprising an energy-efficient NN processor and a Cortex-M3 host processor, to achieve state-of-the-art (SOTA) end-to-end inference at the edge. The contributions of this article are as follows: 1) to minimize the weight bit width while keeping the accuracy loss within a small range, cross-layer error tolerance is analyzed, and mixed-precision quantization is adopted for cross-layer mapping; 2) a dynamically reconfigurable tensor processing unit (DR-TPU) with approximate computing is proposed, which yields a 1.45× reduction in computing energy with only 0.46% accuracy loss on ResNet-50; and 3) a customized input feature map (IFM) reuse and over-writeback strategy is adopted, eliminating recurrent fetches from on-chip and off-chip memories. The number of on-chip memory accesses is reduced by 25%-60%, and the required on-chip memory capacity is halved. The processor has been implemented in 28-nm CMOS technology. Combining the above techniques, the proposed architecture achieves a 53.1% power reduction and 17.2-TOPS/W energy efficiency.
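To make the first contribution concrete, the sketch below shows one simple way a per-layer bit-width search based on error tolerance could look. This is a minimal illustration, not the paper's actual procedure: the layer names, the uniform symmetric quantizer, the relative-L2 error metric, and the tolerance value are all assumptions introduced here for clarity.

    # Hypothetical per-layer bit-width selection via a quantization-error proxy.
    # Everything below (layer shapes, tolerance, error metric) is illustrative.
    import numpy as np

    def quantize(w: np.ndarray, bits: int) -> np.ndarray:
        """Uniform symmetric quantization of a weight tensor to `bits` bits."""
        qmax = 2 ** (bits - 1) - 1
        scale = np.max(np.abs(w)) / qmax
        return np.round(w / scale).astype(np.int32) * scale

    def pick_bitwidth(w: np.ndarray, tolerance: float, candidates=(4, 6, 8)) -> int:
        """Return the smallest candidate bit width whose relative L2 error
        stays below the layer's tolerance; fall back to the widest one."""
        for bits in sorted(candidates):
            err = np.linalg.norm(w - quantize(w, bits)) / np.linalg.norm(w)
            if err < tolerance:
                return bits
        return max(candidates)

    rng = np.random.default_rng(0)
    layers = {f"conv{i}": rng.normal(0, 0.05 * (i + 1), size=(64, 64)) for i in range(4)}
    for name, w in layers.items():
        print(name, "->", pick_bitwidth(w, tolerance=0.02), "bits")

In this toy setup, layers whose weight distributions tolerate more quantization noise are assigned fewer bits, which is the intuition behind mapping each layer to its own precision rather than using one global bit width.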
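The third contribution, IFM reuse, can likewise be illustrated with a back-of-the-envelope access-count model. The line-buffer reuse model below is an assumption made for illustration; the paper's over-writeback scheme and its reported 25%-60% reduction depend on implementation details not modeled here.

    # Hypothetical fetch-count comparison for a K x K sliding window over an
    # H x W input feature map. The K-row line-buffer reuse model is an
    # illustrative assumption, not the paper's exact over-writeback scheme.
    def accesses_naive(h: int, w: int, k: int) -> int:
        # Every window element is refetched from memory each time it is used.
        return (h - k + 1) * (w - k + 1) * k * k

    def accesses_row_reuse(h: int, w: int, k: int) -> int:
        # Each input element is fetched once into a K-row line buffer and
        # reused by every window that overlaps it.
        return h * w

    for h, w, k in [(56, 56, 3), (28, 28, 5)]:
        naive, reused = accesses_naive(h, w, k), accesses_row_reuse(h, w, k)
        print(f"{h}x{w}, k={k}: {1 - reused / naive:.0%} fewer fetches")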
Key words
Approximate computing, energy-efficient design, Internet of Things (IoT), layerwise quantization, neural network (NN) processor