Analysing the influence of memory and workload on the reliability of GPUs under neutron radiation

IEEE Transactions on Nuclear Science (2024)

Abstract
Evaluating the impact of utilising different GPU resources is crucial for gaining insights into the reliability of GPUs when exposed to radiation. In this study, we employed various versions of a microbenchmark to investigate the effect of different memory types on the performance of a low-power GPU integrated into the TX1 SoC of a Jetson Nano board. Additionally, we explored the trade-off between enhanced computational performance and the occurrence of failures over time by optimising the utilisation of GPU resources. Our findings demonstrate that maximising the utilisation of the device’s cores enables the completion of a greater number of computations without errors. By fully harnessing the computational potential of the GPU cores, we effectively increase the work that we can complete between failures. Moreover, we observed that the use of the different memory types has a significant influence on the overall reliability of the GPU. The outcomes of this research contribute to a comprehensive understanding of the interplay between GPU resources, irradiation effects, and reliability. This knowledge is instrumental in guiding the development of robust GPUs for applications in radiation-prone environments.
Keywords
Fault tolerance, GPU, microbenchmark, neutron, radiation, soft error