HW-SW Interface Design and Implementation for Error Logging and Reporting for RAS Improvement

IEEE Access (2024)

Abstract
When designing a resilient computing system, the desired degree of Reliability, Availability, and Serviceability (RAS) must be assessed and guaranteed. This article presents a Hardware-Software (HW-SW) Interface for Error Logging and Reporting that is independent of any specific Instruction Set Architecture (ISA) and aims to improve RAS in computing systems. The HW-SW Interface defines the facilities by which detected hardware errors are logged into a dedicated set of registers (i.e., an Error Record) and then reported to system software, which can promptly address and recover from those errors, preventing system failures. Our architecture offers flexible and configurable Error Logging and Reporting features, meeting the requirements of different application scenarios by selectively including or removing specific features. We report the most relevant results from synthesis on FPGA (Xilinx UltraScale+ MPSoC) and standard-cell technologies (45 nm and 7 nm libraries) and discuss how resource utilization depends on error logging capability. We then validate the Error Logging and Reporting features of our architecture by developing a test SoC on FPGA that emulates a computing system, including a 32-bit RISC-V core and two memories protected by Error Correcting Code (ECC). The proposed HW-SW Interface is not limited to monitoring ECC-protected memories: it can monitor any system module that incorporates error control logic.
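To make the log-then-report flow concrete, the following is a minimal behavioral sketch in Python, not the architecture from the article. Everything in it is an assumption made for illustration: the `ErrorRecord` fields, the `Severity` levels, the overwrite policy (a more severe error replaces a less severe one), and the callback that stands in for an interrupt to system software. The abstract does not specify any of these details.

```python
from dataclasses import dataclass
from enum import IntEnum
from typing import Callable, Optional


class Severity(IntEnum):
    """Hypothetical severity levels; not taken from the article."""
    CORRECTED = 1
    UNCORRECTED = 2


@dataclass
class ErrorRecord:
    """Toy model of one Error Record (all field names are assumptions)."""
    valid: bool = False
    severity: Optional[Severity] = None
    address: int = 0
    overflow: bool = False  # set when an error arrives while the record is valid

    def log(self, severity: Severity, address: int) -> bool:
        """Hardware side: latch a detected error. Returns True if it was written."""
        if self.valid:
            self.overflow = True
            # Assumed policy: only a strictly more severe error overwrites.
            if severity <= self.severity:
                return False
        self.valid = True
        self.severity = severity
        self.address = address
        return True

    def clear(self) -> None:
        """Software side: acknowledge the error and free the record."""
        self.valid = False
        self.severity = None
        self.address = 0
        self.overflow = False


class ErrorReporter:
    """Reports logged errors through a callback standing in for an interrupt."""

    def __init__(self, record: ErrorRecord,
                 handler: Callable[[ErrorRecord], None],
                 report_corrected: bool = True) -> None:
        self.record = record
        self.handler = handler
        # Configurable feature: optionally suppress reports for corrected errors.
        self.report_corrected = report_corrected

    def detect(self, severity: Severity, address: int) -> None:
        """Called by error control logic (e.g. an ECC decoder) on detection."""
        if not self.record.log(severity, address):
            return
        if severity is Severity.UNCORRECTED or self.report_corrected:
            self.handler(self.record)
```

In this sketch, system software would register a handler, read the record fields when the handler fires, and call `clear()` once recovery is complete. The `report_corrected` flag is one hypothetical example of a feature that could be included or removed per application scenario.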
Keywords
Error Logging, Error Reporting, FPGA, HW-SW Interface, Reliability-Availability-Serviceability