Hybrid Modular Redundancy: Exploring Modular Redundancy Approaches in RISC-V Multi-Core Computing Clusters for Reliable Processing in Space
ACM Transactions on Cyber-Physical Systems (2023)
Abstract
Space Cyber-Physical Systems (S-CPS) such as spacecraft and satellites
strongly rely on the reliability of onboard computers to guarantee the success
of their missions. Relying solely on radiation-hardened technologies is
extremely expensive, and developing inflexible architectural and
microarchitectural modifications to introduce modular redundancy within a
system leads to a significant area increase and performance degradation. To
mitigate the overheads of traditional radiation hardening and modular
redundancy approaches, we present a novel Hybrid Modular Redundancy (HMR)
approach, a redundancy scheme that features a cluster of RISC-V processors with
a flexible on-demand dual-core and triple-core lockstep grouping of computing
cores with runtime split-lock capabilities. Further, we propose two recovery
approaches, one software-based and one hardware-based, that trade off
performance against area overhead. Running at 430 MHz, our fault-tolerant
cluster achieves up to 1160 MOPS on a matrix-multiplication benchmark in
non-redundant mode, and 617 and 414 MOPS in dual-core and triple-core lockstep
mode, respectively. A software-based
recovery in triple mode requires 363 clock cycles, and the corresponding cluster
occupies 0.612 mm², representing a 1.3% area overhead over the baseline
non-redundant RISC-V cluster. As a high-performance alternative, a new
hardware-based method provides rapid fault recovery in just 24 clock cycles,
with the cluster occupying 0.660 mm², a 9.4% area overhead over the same
baseline. The cluster is
also enhanced with split-lock capabilities that let it enter one of the
redundant modes with minimal performance loss, so that it can execute either a
mission-critical section or a performance-critical section, with an overhead of
fewer than 400 clock cycles for entry and exit. The
proposed system is the first to integrate these functionalities on an
open-source RISC-V-based compute device, enabling finely tunable reliability
vs. performance trade-offs.
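As a rough illustration of the lockstep modes described above, the following Python sketch models how a cluster's cores could be partitioned into dual-core (DMR) or triple-core (TMR) lockstep groups and how each group's outputs could be checked. It is not the paper's implementation, which is a hardware design; the function names, the string mode labels, the 12-core example, and the choice to leave leftover cores ungrouped are all hypothetical. The voter assumes at most one faulty core per group, the usual TMR fault model.

```python
from typing import List

# Hypothetical mode labels mapped to lockstep group size.
GROUP_SIZE = {"independent": 1, "dmr": 2, "tmr": 3}


def group_cores(num_cores: int, mode: str) -> List[List[int]]:
    """Partition core IDs 0..num_cores-1 into consecutive lockstep groups.

    Cores that cannot fill a complete group are left ungrouped (an
    assumption of this sketch, not a detail taken from the paper).
    """
    size = GROUP_SIZE[mode]
    usable = num_cores - num_cores % size
    return [list(range(start, start + size)) for start in range(0, usable, size)]


def dmr_check(a: int, b: int) -> bool:
    """Dual-core lockstep: a mismatch is detected, but the faulty core is unknown."""
    return a == b


def tmr_vote(a: int, b: int, c: int) -> int:
    """Triple-core lockstep: bitwise majority, each output bit follows two of three inputs."""
    return (a & b) | (a & c) | (b & c)


def disagreeing_cores(a: int, b: int, c: int) -> List[int]:
    """Indices (0-2) of cores whose output differs from the voted value."""
    voted = tmr_vote(a, b, c)
    return [i for i, v in enumerate((a, b, c)) if v != voted]


if __name__ == "__main__":
    print(group_cores(12, "tmr"))  # [[0, 1, 2], [3, 4, 5], [6, 7, 8], [9, 10, 11]]
    # Core 2 suffers a flip in bit 3; the vote masks it and flags core 2.
    outputs = (0b1010, 0b1010, 0b0010)
    print(bin(tmr_vote(*outputs)), disagreeing_cores(*outputs))  # 0b1010 [2]
    print(dmr_check(0b1010, 0b0010))  # False: mismatch detected, no correction
```

The contrast between `dmr_check` and `tmr_vote` reflects the basic trade-off: a dual-core group can only detect a fault, while a triple-core group can also identify the disagreeing core and mask its output, at the cost of one more core per group.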
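The cycle counts and throughput figures quoted in the abstract can be put on a common footing with a few lines of arithmetic. The sketch below converts the reported cycle counts to nanoseconds at 430 MHz and expresses the dual-core and triple-core throughput as a fraction of the non-redundant figure. The helper names are hypothetical, and the 400-cycle value is treated only as an upper bound because the abstract reports it as "<400".

```python
# Worked arithmetic on the figures quoted in the abstract.
CLOCK_HZ = 430e6  # cluster clock frequency (430 MHz)


def cycles_to_ns(cycles: float, clock_hz: float = CLOCK_HZ) -> float:
    """Convert a cycle count to nanoseconds at the given clock frequency."""
    return cycles / clock_hz * 1e9


# Matrix-multiplication throughput in MOPS for each redundancy mode.
MOPS = {"non-redundant": 1160, "dual": 617, "triple": 414}


def relative_throughput(mode: str) -> float:
    """Throughput of a mode as a fraction of the non-redundant mode."""
    return MOPS[mode] / MOPS["non-redundant"]


if __name__ == "__main__":
    for name, cycles in [("software recovery (triple)", 363),
                         ("hardware recovery", 24),
                         ("split-lock entry/exit (upper bound)", 400)]:
        print(f"{name}: {cycles} cycles = {cycles_to_ns(cycles):.1f} ns")
    for mode in ("dual", "triple"):
        print(f"{mode}: {relative_throughput(mode):.3f} of non-redundant throughput")
```

At 430 MHz, software-based recovery takes about 844 ns, hardware-based recovery about 56 ns, and the split-lock entry/exit overhead stays below about 930 ns. The dual-core and triple-core modes retain roughly 0.532 and 0.357 of the non-redundant throughput, respectively.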
Keywords
modular redundancy approaches, hybrid modular redundancy, clusters, reliable processing, multi-core