Analyzing the Architectural Impact of Transient Fault Effects in SFUs of GPUs.

LATS (2023)

Abstract
Graphics Processing Units (GPUs) are crucial in modern safety-critical systems for implementing complex and dense algorithms, so their reliability plays an essential role in several domains (e.g., automotive and autonomous machines). Reliability evaluations of GPUs and their internal units are of special interest because of their high parallelism and because they help identify vulnerable structures. In particular, Special Function Unit (SFU) cores inside GPUs are heavily used in multimedia, scientific computing, and the training of neural networks. However, the reliability of SFUs has remained largely unexplored. This work evaluates the impact of transient faults in the hardware structures of SFUs for GPUs. We focus on evaluating and analyzing two SFU architectures ('fused' and 'modular') and their relation to energy, area, and reliability impact on GPU workloads. The evaluation relies on a fine-grain analysis with experiments on an open-source RTL GPU model (FlexGripPlus) instrumented with both SFUs. The experimental results indicate that modular SFUs are less vulnerable to transient faults (by up to 47% for the analyzed workloads) and more power efficient (by up to 36.6%), but incur an additional area cost (about 27%) compared with the fused SFU architecture (the basis for commercial devices), which appears more vulnerable to faults but is area efficient.
Keywords
Graphics Processing Units (GPUs), Reliability evaluation, Special Function Unit (SFU), T-Stream core