CRAFT: Criticality-Aware Fault-Tolerance Enhancement Techniques for Emerging Memories-Based Deep Neural Networks

arXiv (2023)

Abstract
Deep neural networks (DNNs) have emerged as the most effective programming paradigm for computer vision and natural language processing applications. With the rapid development of DNNs, efficient hardware architectures for deploying DNN-based applications on edge devices have been extensively studied. Emerging nonvolatile memories (NVMs), with their better scalability, nonvolatility, and good read performance, are found to be promising candidates for deploying DNNs. However, despite the promise, emerging NVMs often suffer from reliability issues, such as stuck-at faults, which decrease the chip yield/memory lifetime and severely impact the accuracy of DNNs. A stuck-at cell can be read but not reprogrammed; thus, stuck-at faults in NVMs may or may not result in errors, depending on the data to be stored. By reducing the number of errors caused by stuck-at faults, the reliability of a DNN-based system can be enhanced. This article proposes CRAFT, i.e., criticality-aware fault-tolerance enhancement techniques to enhance the reliability of NVM-based DNNs in the presence of stuck-at faults. A data block remapping technique is used to reduce the impact of stuck-at faults on DNN accuracy. Additionally, by performing bit-level criticality analysis on various DNNs, the critical-bit positions in network parameters that can significantly impact the accuracy are identified. Based on this analysis, we propose an encoding method which effectively swaps the critical bit positions with those of noncritical bits when more errors (due to stuck-at faults) are present in the critical bits. Experiments with the CRAFT architecture on various DNN models indicate that the robustness of a DNN against stuck-at faults can be enhanced by up to $10^{5}$ times on the CIFAR-10 dataset and up to 29 times on the ImageNet dataset with only a minimal amount of storage overhead, i.e., 1.17%.
Being orthogonal, CRAFT can be integrated with existing fault-tolerance schemes to further enhance the robustness of DNNs against stuck-at faults in NVMs.
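
The following is a minimal illustrative sketch, not the encoding used in the paper. It models an 8-bit word in which some cells are stuck, shows that whether a stuck cell causes an error depends on the data written, and then applies a simplified bit-swapping idea. The assumptions here are all hypothetical: the upper four bits are treated as critical, the swap exchanges the two nibbles, and a one-bit flag recording the swap is assumed to be stored reliably. The function names (`write_word`, `encode`, `decode`) are invented for this example.

```python
def write_word(data, stuck_mask, stuck_vals):
    """Return what the 8 cells hold after writing `data`.

    Bits set in `stuck_mask` cannot be reprogrammed and keep the
    corresponding bit of `stuck_vals`; all other bits take `data`.
    """
    return (data & ~stuck_mask & 0xFF) | (stuck_vals & stuck_mask)


def swap_nibbles(word):
    """Exchange the upper and lower 4 bits of an 8-bit word."""
    return ((word << 4) | (word >> 4)) & 0xFF


def count_errors(data, stuck_mask, stuck_vals, positions):
    """Count mismatched bits, restricted to the physical `positions` mask."""
    stored = write_word(data, stuck_mask, stuck_vals)
    return bin((stored ^ data) & positions).count("1")


def encode(data, stuck_mask, stuck_vals):
    """Choose the layout with fewer errors in the critical bits.

    Assumption: the upper nibble of `data` is critical. In the swapped
    layout those bits are physically stored in the lower nibble.
    Returns (stored_word, swapped_flag).
    """
    plain_err = count_errors(data, stuck_mask, stuck_vals, 0xF0)
    swapped = swap_nibbles(data)
    swapped_err = count_errors(swapped, stuck_mask, stuck_vals, 0x0F)
    if swapped_err < plain_err:
        return write_word(swapped, stuck_mask, stuck_vals), True
    return write_word(data, stuck_mask, stuck_vals), False


def decode(stored, swapped_flag):
    """Undo the swap (if any) when reading the word back."""
    return swap_nibbles(stored) if swapped_flag else stored


# The most significant cell is stuck at 1.
STUCK_MASK, STUCK_VALS = 0x80, 0x80

# Data whose MSB is already 1 is stored without error.
no_error = write_word(0xF0, STUCK_MASK, STUCK_VALS)

# Data whose MSB is 0 is corrupted in a critical bit if stored as-is.
corrupted = write_word(0x70, STUCK_MASK, STUCK_VALS)

# With the swap, the stuck cell lands on a noncritical bit instead.
stored, flag = encode(0x70, STUCK_MASK, STUCK_VALS)
recovered = decode(stored, flag)
```

In this toy case the unprotected write turns 0x70 into 0xF0 (an error of 128 in magnitude), while the swapped layout reads back as 0x78 (an error of 8), because the stuck cell now holds a low-order bit. The flag bit stands in for the kind of small per-block metadata that a scheme like this would need.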
Keywords
Deep learning hardware, emerging memories, fault tolerance, neural networks, stuck-at faults