A Runtime Heuristic to Selectively Replicate Tasks for Application-Specific Reliability Targets

2016 IEEE International Conference on Cluster Computing (CLUSTER), 2016

Abstract
In this paper, we propose a runtime-based selective task replication technique for task-parallel high-performance computing applications. The technique is automatic and does not require modifying or recompiling the OS, the compiler, or the application code. Our heuristic, which we call App_FIT, selects tasks to replicate such that the specified reliability target for an application is achieved. In our experimental evaluation, we show that the App_FIT selective replication heuristic is low-overhead and highly scalable. In addition, the results indicate that complete task replication is overkill for achieving reliability targets. We show that with App_FIT, we can tolerate pessimistic exascale error rates with only 53% of the tasks being replicated.
Keywords
Selective replication, HPC and exascale computing, task parallelism, dataflow programming