Resilience Analysis of Top K Selection Algorithms

2017 13th European Dependable Computing Conference (EDCC), 2017

Abstract
As the number of components in high-performance computing (HPC) systems continues to grow, the number of vehicles for soft errors will rise in parallel. Petascale research has shown that soft errors on supercomputers can occur as frequently as multiple times per day, and this rate will only increase with the exascale era. Due to this frequency, the resilience community has taken an interest in algorithmic resilience as a means for reliable computing in faulty environments. Probabilistic algorithms in particular have generated interest, due to their imprecise nature and ability to handle incorrect guesses. In this paper, we analyze the intrinsic resilience of a probabilistic Top K selection algorithm to silent data corruption in the event of a single event upset. We introduce a new paradigm of analytically quantifying an algorithm's resilience as a function of its inputs, which permits a precise comparison of the resilience of competing algorithms. In addition, we discuss the implications of our findings on the resilience of probabilistic algorithms as a whole in comparison to their deterministic counterparts.
Keywords
fault tolerance, algorithmic resilience, soft error injection