Ada-FA: A Comprehensive Framework for Adaptive Fault Tolerance and Ageing Mitigation in FPGAs

IEEE Internet of Things Journal (2024)

Abstract
Commercial SRAM-based field-programmable gate arrays (FPGAs) are highly susceptible to failures in harsh applications, such as single event effects (SEEs) caused by external ionizing radiation and ageing failures caused by prolonged internal overloading. Existing methods use the triple modular redundancy (TMR) architecture to shield FPGA systems against the effects of radiation. However, these solutions are resource-costly and often unnecessary in practice. Additionally, hard faults caused by the ageing effects of long-term FPGA usage are not effectively mitigated. To address these issues, we present Ada-FA, a comprehensive framework for adaptive fault tolerance and ageing mitigation in FPGAs. Ada-FA is a cross-layer-aware reliability framework that includes two phases: offline and online. (1) In the offline phase, a task criticality evaluation strategy supporting fine-grained fault tolerance is proposed to reduce the hardware resource overhead. Specifically, we improve the integer linear programming (ILP) formulation so that it considers both fault tolerance and ageing mitigation, yielding the optimal reliability-aware layout and thus maximizing the mean time to failure (MTTF). (2) In the online phase, we propose a runtime management architecture to further ensure the reliable operation of FPGA systems. The experimental results show that the resource usage (RU) of the proposed Ada-FA framework is reduced by 15.8% on average compared to that of existing fault-tolerant layout/scheduling methods. Moreover, our method provides higher reliability and a higher task accomplishment rate (TAR) than the state-of-the-art offline ageing mitigation methods.
Keywords
FPGA, Fault Tolerance, Ageing Mitigation, Task Criticality, Reliability
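
The abstract contrasts full TMR with fine-grained, criticality-based fault tolerance. The following is a minimal sketch of that contrast, not the Ada-FA method itself: it shows a bitwise 2-of-3 majority voter and compares the resource cost of triplicating every task against triplicating only tasks above a criticality threshold. The `Task` fields, the example task set, the threshold value, and the resource numbers are all hypothetical, and voter overhead is ignored.

```python
from dataclasses import dataclass
from typing import List, Optional


def majority_vote(a: int, b: int, c: int) -> int:
    """Bitwise 2-of-3 majority over three replica outputs (the TMR voter)."""
    return (a & b) | (a & c) | (b & c)


@dataclass
class Task:
    name: str
    resources: int      # hypothetical resource units (e.g. LUTs) for one copy
    criticality: float  # hypothetical criticality score in [0, 1]


def resource_usage(tasks: List[Task], threshold: Optional[float] = None) -> int:
    """Total resources when tasks at or above `threshold` are triplicated.

    threshold=None triplicates every task (full TMR).
    Voter logic is not counted in this sketch.
    """
    total = 0
    for task in tasks:
        protected = threshold is None or task.criticality >= threshold
        total += task.resources * (3 if protected else 1)
    return total


# Hypothetical task set, for illustration only.
tasks = [
    Task("control", 100, 0.9),
    Task("logging", 200, 0.2),
    Task("filter", 150, 0.6),
]

full_tmr = resource_usage(tasks)                  # every task triplicated
selective = resource_usage(tasks, threshold=0.5)  # only critical tasks triplicated

# One replica output is corrupted by a bit flip; the voter masks it.
voted = majority_vote(0b1010, 0b1010, 0b0110)

print(full_tmr, selective, bin(voted))
```

In this toy setting, selective protection lowers the resource total because the low-criticality task runs as a single copy; how criticality is actually evaluated and how the layout is optimized are defined in the paper, not here.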