Response of HPC hardware to neutron radiation at the dawn of exascale

Journal of Supercomputing (2023)

Abstract
Every computation carries a small chance that an unexpected phenomenon ruins or modifies its output. Computers are prone to errors that, although very unlikely, are hard, expensive or simply impossible to avoid. At exascale, with thousands of processors involved in a single computation, those errors are especially harmful because they can corrupt or distort the results, wasting human and material resources. In the present work, we study the effect of ionizing radiation on several pieces of commercial hardware that are very common in modern supercomputers. Aiming to reproduce the natural radiation that could arise, CPUs (Xeon, EPYC) and GPUs (A100, V100, T4) are subjected to a known flux of neutrons coming from two radioactive sources, namely ²⁵²Cf and ²⁴¹Am-Be, in a special irradiation facility. The hardware is irradiated while running, under supervision, to quantify any errors that appear. Once the hardware response is characterised, we are able to scale down the radiation intensity and estimate the effects on standard data centres. This can help administrators and researchers develop their contingency plans and protocols.
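
The scaling from an irradiation test to a data centre can be illustrated with a rough sketch. It uses one common approach, which is not necessarily the authors' procedure: divide the observed error count by the neutron fluence to get a per-device error cross-section, then multiply by an ambient neutron flux, the number of devices and the operating time. All function names and numbers below are hypothetical placeholders, not values from the paper, and the sketch ignores differences between the source spectrum and the natural neutron spectrum.

```python
def cross_section_cm2(errors: int, flux_n_cm2_s: float, exposure_s: float) -> float:
    """Per-device error cross-section: observed errors divided by neutron fluence."""
    fluence = flux_n_cm2_s * exposure_s  # neutrons per cm^2 delivered during the test
    if fluence <= 0:
        raise ValueError("fluence must be positive")
    return errors / fluence


def expected_errors(sigma_cm2: float, ambient_flux_n_cm2_h: float,
                    devices: int, hours: float) -> float:
    """Expected error count for a fleet, assuming errors scale linearly with flux."""
    return sigma_cm2 * ambient_flux_n_cm2_h * devices * hours


# Hypothetical test: 5 errors after 1 hour at 1e4 n/cm^2/s (illustrative only).
sigma = cross_section_cm2(errors=5, flux_n_cm2_s=1.0e4, exposure_s=3600.0)

# Hypothetical data centre: 10,000 devices for one year at an assumed
# ambient flux of 10 n/cm^2/h (a placeholder, not a measured value).
yearly = expected_errors(sigma, ambient_flux_n_cm2_h=10.0,
                         devices=10_000, hours=24 * 365)

print(f"cross-section: {sigma:.3e} cm^2")
print(f"expected errors per year: {yearly:.1f}")
```

The linear scaling is the key assumption here: it treats each error as an independent event whose rate is proportional to the neutron flux, which is why a short test at high intensity can be extrapolated to a long exposure at low intensity.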
Keywords
neutron radiation, HPC hardware, exascale