ReIPE: Recycling Idle PEs in CNN Accelerator for Vulnerable Filters Soft-Error Detection

Xiaohui Wei, Chenyang Wang, Hengshan Yue, Jingweijia Tan, Zeyu Guan, Nan Jiang, Xinyang Zheng, Jianpeng Zhao, Meikang Qiu

ACM Transactions on Architecture and Code Optimization (2024)

Abstract
To satisfy the prohibitively massive computational requirements of current deep Convolutional Neural Networks (CNNs), CNN-specific accelerators are widely deployed in large-scale systems. Soft errors, caused by high-energy neutron and α-particle strikes, may lead to catastrophic failures when CNNs are deployed on accelerators with high integration density. As CNNs become ubiquitous in mission-critical domains, ensuring the reliable execution of CNN accelerators in the presence of soft errors is increasingly essential. In this paper, we propose to Recycle Idle Processing Elements (PEs) in CNN Accelerator for vulnerable filters soft-error detection (ReIPE). Considering the error sensitivity of filters, ReIPE first carries out a filter-level gradient analysis, in place of fault injection, for fast filter-wise error resilience estimation. Then, to achieve maximal reliability benefits, ReIPE combines hardware-level systolic array idleness with the software-level filter-wise error resilience profile of the CNN and preferentially loads duplicates of the most vulnerable filters onto the systolic array, recycling idle-column PEs for opportunistic redundant execution (error detection). Exploiting the data reuse properties of accelerators, ReIPE incorporates the error detection process into the original computation flow of the accelerator to perform real-time error detection. Once an error is detected, ReIPE triggers a correction round to rectify the erroneous output. Experimental results on LeNet-5, Cifar-10-CNN, AlexNet, ResNet-20, VGG-16 and ResNet-50 show that ReIPE covers 96.40% of errors while reducing the performance degradation and energy consumption of baseline dual modular redundancy (DMR) by 75.06% and 67.79%, respectively, on average. Moreover, to satisfy the reliability requirements of various application scenarios, ReIPE is also applicable to pruned, quantized and Transformer-based models, and is portable to other accelerator architectures.
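The pipeline described in the abstract (rank filters by a gradient-based vulnerability estimate, fill otherwise idle systolic-array columns with duplicates of the most vulnerable filters, compare primary and duplicate results, and recompute on mismatch) can be illustrated with a toy Python sketch. It is not the authors' implementation, and several details are assumptions: the score `sum(|w * dL/dw|)` is a generic first-order Taylor sensitivity standing in for ReIPE's filter-level gradient analysis; the array is modeled as weight-stationary with one filter per column, so idle columns appear only in the last fold; and all function names (`filter_vulnerability`, `idle_columns`, `pick_duplicates`, `run_layer`) are hypothetical.

```python
import math
from typing import List, Optional, Sequence, Tuple


def filter_vulnerability(weights: Sequence[Sequence[float]],
                         grads: Sequence[Sequence[float]]) -> List[float]:
    """Hypothetical per-filter score: sum of |w * dL/dw| over the filter's weights.

    Assumed first-order (Taylor) sensitivity; a larger score is treated as
    "more vulnerable". No fault injection is needed, only one gradient pass.
    """
    return [sum(abs(w * g) for w, g in zip(ws, gs)) for ws, gs in zip(weights, grads)]


def idle_columns(num_filters: int, array_cols: int) -> int:
    """Columns left unused in the last fold, assuming one filter per column."""
    folds = math.ceil(num_filters / array_cols)
    return folds * array_cols - num_filters


def pick_duplicates(scores: Sequence[float], array_cols: int) -> List[int]:
    """Indices of the most vulnerable filters, one per idle column."""
    k = min(idle_columns(len(scores), array_cols), len(scores))
    ranked = sorted(range(len(scores)), key=lambda i: scores[i], reverse=True)
    return ranked[:k]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(x * y for x, y in zip(a, b))


def run_layer(weights: Sequence[Sequence[float]], x: Sequence[float],
              duplicates: Sequence[int],
              fault: Optional[Tuple[int, float]] = None) -> Tuple[List[float], List[int]]:
    """Compute one output per filter and check the duplicated filters.

    The same input x feeds both the primary and the duplicate column,
    mimicking how a systolic array reuses streamed activations across columns.
    `fault` = (filter_index, delta) corrupts the primary result only.
    """
    outputs = [dot(w, x) for w in weights]
    if fault is not None:
        idx, delta = fault
        outputs[idx] += delta
    detected = []
    for i in duplicates:
        shadow = dot(weights[i], x)          # redundant copy in an idle column
        if shadow != outputs[i]:             # mismatch => soft error detected
            detected.append(i)
            outputs[i] = dot(weights[i], x)  # correction round: recompute
    return outputs, detected


# Toy layer: 3 filters on a 2-column array, so the second fold has 1 idle column.
weights = [[1, 2], [3, -1], [0, 4]]
grads = [[0.1, 0.0], [0.5, 0.2], [0.0, 0.01]]
dups = pick_duplicates(filter_vulnerability(weights, grads), array_cols=2)
out, det = run_layer(weights, [2, 1], dups, fault=(1, 8))
```

Here filter 1 has the highest score, so it occupies the single idle column; a fault injected into its output is caught and corrected (`out == [4, 5, 4]`, `det == [1]`). A fault in filter 2, which was not duplicated, would pass undetected in this sketch. That is the trade-off of protecting only the most vulnerable filters instead of paying for full DMR.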
Key words
Soft-error, Systolic array, CNN, Error resilience analysis.