Learning-Oriented Reliability Improvement of Computing Systems From Transistor to Application Level

2023 Design, Automation & Test in Europe Conference & Exhibition (DATE), 2023

Abstract
Due to technology scaling in modern computing platforms, safety and reliability issues have increased tremendously; these issues often accelerate aging, lead to permanent faults, and cause unreliable execution of applications. Failure in some computing systems, such as avionics, may have catastrophic consequences. Therefore, managing reliability under all conditions of stress and environmental change is crucial at all abstraction layers, from the application level down to the transistor level. Machine learning techniques have recently been employed for dynamic reliability estimation and optimization. They can adapt to varying workloads and system conditions. This paper presents reliability improvement approaches from multiple perspectives, from the transistor level to the application level, and discusses their effectiveness and limitations as well as open challenges.
Keywords
Aging, Cross-layer reliability, Device and circuit reliability, Dynamic reliability estimation, Error mitigation, Machine learning for systems, Task scheduling, Timing reliability