Upper Limit Analysis Of Scalable Parallel Computing On The Premise Of Reliability Requirement

IETE Technical Review (2016)

Abstract
The Top500 supercomputer ranking has been published twice a year, based on Linpack performance, for more than 20 years, and it has greatly stimulated the development of high-performance computing. However, it is still not clear how to determine the scale limit of supercomputers. Building ever-larger supercomputers without regard to cost, energy, and reliability will undoubtedly waste resources. This paper therefore analyses the scalability and scale limit of parallel computing under a reliability requirement. We use a Markov chain to model the state transition process of a parallel computing system, so that the probability of parallel tasks running successfully on the machines, that is, the reliability of parallel computing, can be evaluated. For iso-speed-efficiency scaling of parallel computing under a specific reliability requirement, we present an approach to calculate the maximum number of processing nodes and the maximum workload of parallel tasks, which reveals the functional relation between the scale limit and the speed efficiency of parallel computing. Taking Tianhe-2, the No. 1 supercomputer at the time of writing, as an example, we apply our methods in a case study and predict its scale limit. Finally, a simulation experiment is conducted to verify the theory.
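The abstract does not give the paper's Markov model, so the following is only a minimal illustrative sketch of the general idea, under loudly stated assumptions that are not from the paper: each of n nodes fails independently with a hypothetical per-step probability `p_fail`, the task needs every node for `steps` time steps with no checkpointing, and any single failure is fatal. The chain has two states (running, failed), with "failed" absorbing; reliability is the probability of still being in the running state after `steps` steps. A hypothetical `max_nodes` helper then searches for the largest n that still meets a reliability target `r_required`. It does not model iso-speed efficiency or workload growth.

```python
# Hypothetical two-state discrete-time Markov chain (NOT the paper's model):
# state 0 = "all n nodes up, task running", state 1 = "a node failed" (absorbing).

def transition_matrix(n_nodes, p_fail):
    """One time step: the task survives only if none of the n nodes fails."""
    survive = (1.0 - p_fail) ** n_nodes
    return [[survive, 1.0 - survive],
            [0.0, 1.0]]

def mat_mul(a, b):
    """Multiply two 2x2 matrices given as nested lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(2)) for j in range(2)]
            for i in range(2)]

def reliability(n_nodes, p_fail, steps):
    """Probability of still being in the running state after `steps` steps."""
    m = transition_matrix(n_nodes, p_fail)
    power = [[1.0, 0.0], [0.0, 1.0]]  # identity matrix = zero steps taken
    for _ in range(steps):
        power = mat_mul(power, m)
    return power[0][0]  # start in state 0, end in state 0

def max_nodes(p_fail, steps, r_required, n_limit=10**6):
    """Largest n in [0, n_limit] whose reliability still meets r_required.

    Binary search is valid because reliability never increases with n.
    """
    lo, hi = 0, n_limit
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if reliability(mid, p_fail, steps) >= r_required:
            lo = mid
        else:
            hi = mid - 1
    return lo

if __name__ == "__main__":
    print(max_nodes(1e-6, 100, 0.9))
```

With a hypothetical per-step failure probability of 1e-6, a 100-step task and a 0.9 reliability target, `max_nodes(1e-6, 100, 0.9)` returns 1053, which agrees with the closed form (1 - p)^(n·steps) ≥ 0.9. The matrix-power form is kept, rather than the closed form alone, because it extends to richer chains (for example, states for repair or checkpoint/restart) where no simple closed form exists.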
Keywords
Markov chain, Parallel computing, Reliability, Scalability, Scale limit analysis