Analyzing Network Health and Congestion in Dragonfly-Based Supercomputers

2016 IEEE International Parallel and Distributed Processing Symposium (IPDPS)(2016)

引用 69|浏览80
暂无评分
摘要
The dragonfly topology is a popular choice for building high-radix, low-diameter, hierarchical networks with high-bandwidth links. On Cray installations of the dragonfly network, job placement policies and routing inefficiencies can lead to significant network congestion for a single job and multi-job workloads. In this paper, we explore the effects of job placement, parallel workloads and network configurations on network health to develop a better understanding of inter-job interference. We have developed a functional network simulator, Damselfly, to model the network behavior of Cray Cascade, and a visual analytics tool, DragonView, to analyze the simulation output. We simulate several parallel workloads based on five representative communication patterns on up to 131,072 cores. Our simulations and visualizations provide unique insight into the buildup of network congestion and present a trade-off between deployment dollar costs and performance of the network.
更多
查看译文
关键词
dragonfly network,congestion,inter-job interference,simulation,visual analytics
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要