OR-ML - Enhancing Reliability for Machine Learning Accelerator with Opportunistic Redundancy.

DATE(2021)

引用 1|浏览12
暂无评分
摘要
Reliability plays a central role in deep sub-micron and nanometre IC fabrication technology and has recently been reported to be one of the key issues affecting the inference phase of neural networks. State-of-the-art machine learning (ML) accelerators exploit massively computing parallelism observed in neural networks to achieve high energy efficiency. The topology of ML engines' computing fabric, which constitutes large arrays of processing elements (PEs), has been increasing dramatically to incorporate the huge size and heterogeneity of the rapid evolving ML algorithm. However, it is commonly observed that activations of zero value lead to reduced PE utilization. In this work, we present a novel and low-cost approach to enhance the reliability of generic ML accelerators by Opportunistically exploring the chances of runtime Redundancy provided by neighbouring PEs, named as OR-ML. In contrast to conventional redundancy techniques, the proposed technique introduces no additional computing resources, therefore significantly reduces the implementation overhead and achieves obvious level of protection. The design prototype is evaluated using emulated fault injection on FPGA, executing mainstream neural networks for objection classification and detection.
更多
查看译文
关键词
Fault tolerance,machine learning accelerator,opportunistic redundancy
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要