rACE: Reverse-Order Processor Reliability Analysis

Proceedings of the 2020 Design, Automation & Test in Europe Conference & Exhibition (DATE 2020)

Abstract
Modern microprocessors suffer from increased error rates that come with fabrication technology scaling. Processor designs are becoming steadily more prone to hardware faults that lead to execution errors and system failures, which raises the need for protection mechanisms. However, error mitigation strategies have to be applied judiciously, as they impose significant power, area, and performance overheads. Early and accurate reliability estimation of a microprocessor design is essential for determining the most vulnerable hardware structures and the most efficient protection schemes. One of the most commonly used techniques for reliability estimation is Architecturally Correct Execution (ACE) analysis. ACE analysis can be applied at different abstraction levels, including microarchitecture and RTL, and often requires only a single simulation or a few simulations to report the Architectural Vulnerability Factor (AVF) of the processor structures. However, ACE analysis overestimates the vulnerability of structures because of its pessimistic, worst-case nature. Moreover, it delivers only coarse-grain vulnerability reports, with no details about the expected outcome of hardware faults (silent data corruptions, crashes). In this paper, we present reverse ACE (rACE), a methodology that (a) improves the accuracy of ACE analysis and (b) delivers fine-grain error outcome reports. Using a reverse-order tracing flow, rACE analysis associates portions of the simulated execution of a program with the actual output and the control flow, delivering finer accuracy and classification of results. Our findings show that rACE reports an average 1.45X overestimation, compared to Statistical Fault Injection, for different sizes of the register file of an out-of-order CPU core (executing both ARM and x86 binaries), whereas a baseline ACE analysis reports 2.3X overestimation and even refined versions of ACE analysis report an average of 1.8X overestimation.
Keywords
reliability, dependability, microarchitecture, simulation, ACE analysis
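
To make the quantities in the abstract concrete, the following is a minimal sketch of how an AVF value and an overestimation factor could be computed. It uses the commonly cited definition of AVF (ACE bit-cycles divided by total bit-cycles of a structure) and is not the authors' rACE tool. The `AceInterval` record, the function names, and all numbers are hypothetical and chosen only for illustration.

```python
from dataclasses import dataclass


@dataclass
class AceInterval:
    """Hypothetical record: `bits` bits of one register-file entry
    hold ACE state during cycles [start_cycle, end_cycle)."""
    entry: int
    start_cycle: int
    end_cycle: int
    bits: int


def avf(intervals, num_entries, bits_per_entry, total_cycles):
    """AVF = ACE bit-cycles / (total bits in the structure * total cycles)."""
    ace_bit_cycles = sum(
        iv.bits * (iv.end_cycle - iv.start_cycle) for iv in intervals
    )
    return ace_bit_cycles / (num_entries * bits_per_entry * total_cycles)


def overestimation(estimated_avf, sfi_avf):
    """Ratio of an analytical AVF estimate to a fault-injection reference."""
    return estimated_avf / sfi_avf


# Illustrative numbers only (assumed, not taken from the paper):
# a 4-entry, 32-bit register file simulated for 100 cycles.
example_intervals = [
    AceInterval(entry=0, start_cycle=0, end_cycle=50, bits=32),
    AceInterval(entry=1, start_cycle=20, end_cycle=70, bits=32),
]
example_avf = avf(example_intervals, num_entries=4,
                  bits_per_entry=32, total_cycles=100)
example_ratio = overestimation(example_avf, sfi_avf=0.125)
print(example_avf, example_ratio)
```

In this reading, an overestimation factor such as the 2.3X, 1.8X, and 1.45X figures quoted in the abstract is the ratio between the analytically derived vulnerability and the one measured by Statistical Fault Injection; a value closer to 1 means a tighter estimate.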