Chic-sched: a HPC Placement-Group Scheduler on Hierarchical Topologies with Constraints.

IPDPS (2023)

Abstract
Efficient placement of advanced HPC and AI workloads with application constraints poses challenges for resource schedulers on shared infrastructures, such as the Cloud. In this work, we propose a novel Constraints- and Heuristics-based scheduler on HIerarchical Topologies for High-Performance Computing workloads in the Cloud (chic-sched, for short). Our heuristics-based algorithm enables placement across multiple levels in a network hierarchy with loosely specified constraints, and it works without retries by providing suboptimal placements to minimize placement failures. This allows for fast scheduling at scale, and the O(N log N) complexity enables placement decisions within tens of milliseconds for groups of hundreds of virtual machines (VMs). We introduce a new and simple metric to quantify the goodness of group placements. With this metric, in terms of deviation from ideal placements, we show that chic-sched is 20-50% better than the common bestFit or worstFit algorithms in all scenarios of two-level placements with spreading and packing constraints. We evaluate chic-sched with publicly available VM-request traces from a production Cloud, and, comparing against bestFit, we show that it achieves 8% lower placement failure rates and more than 40% better placement locality. Finally, to quantify the goodness of constraints-based placements, we conduct experiments with a realistic MPI workload on synthetically allocated VM clusters in a public cloud. We measure a 9% performance improvement over an adverse placement in a scenario where our heuristics-based scheduler would return a good, but not perfect, placement.
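For readers unfamiliar with the bestFit and worstFit baselines named in the abstract, the following is a minimal sketch of how such greedy policies are commonly understood when placing a group of VMs on a single level of hosts. It is not the chic-sched algorithm and not the paper's implementation; the function name place_group, the host names, and the capacity units are all hypothetical and chosen only for illustration.

```python
from typing import Dict, List, Optional


def place_group(free: Dict[str, int], demands: List[int],
                policy: str) -> Optional[Dict[int, str]]:
    """Greedy, all-or-nothing placement of a VM group on one level of hosts.

    free    -- free capacity units per host (hypothetical host names)
    demands -- capacity units required by each VM in the group
    policy  -- "bestFit" picks the feasible host with the least free
               capacity (tends to pack); "worstFit" picks the one with
               the most free capacity (tends to spread)
    """
    if policy not in ("bestFit", "worstFit"):
        raise ValueError(f"unknown policy: {policy}")

    remaining = dict(free)  # work on a copy; a failed group changes nothing
    assignment: Dict[int, str] = {}
    for vm, need in enumerate(demands):
        feasible = [h for h, cap in remaining.items() if cap >= need]
        if not feasible:
            return None  # the whole group fails if any VM cannot be placed
        if policy == "bestFit":
            # least free capacity first; host name breaks ties
            host = min(feasible, key=lambda h: (remaining[h], h))
        else:
            # most free capacity first; host name breaks ties
            host = min(feasible, key=lambda h: (-remaining[h], h))
        remaining[host] -= need
        assignment[vm] = host
    return assignment


hosts = {"a": 4, "b": 5, "c": 6}
packed = place_group(hosts, [2, 2, 2], "bestFit")
spread = place_group(hosts, [2, 2, 2], "worstFit")
```

In this toy input, bestFit fills host "a" before moving on, while worstFit puts each VM on a different host. Neither policy looks at a network hierarchy or at group-level constraints, which is the gap the abstract says chic-sched addresses.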
Keywords
Scheduling, Placement Groups, HPC in Cloud