A Network Load Perception Based Task Scheduler for Parallel Distributed Data Processing Systems

Zhuo Tang, Zhanfei Xiao, Li Yang, Kailin He, Kenli Li

IEEE Transactions on Cloud Computing (2023)

Abstract
In parallel distributed data processing frameworks such as Spark and Flink, task scheduling has a great impact on cluster performance. Although task scheduling has been proven to be an NP-complete problem, researchers have proposed many heuristic rules to obtain approximately optimal solutions. However, most of them ignore the fact that a task's resource requirements change dynamically during its runtime. Over a task's entire lifetime, CPU utilization is often low during data transfer. On most distributed data processing platforms in particular, data transmission is time-consuming, which usually results in low overall CPU utilization. Similarly, network throughput during task computation is also low in some cases. In this article, we propose a heuristic task scheduling algorithm based on the perception of network load variation and, based on it, implement a dual-phase pipeline task scheduler (D2PTS) that takes dynamic resource requirements into account and aims at maximizing cluster resource utilization, as a supplement to existing data-parallel frameworks. D2PTS divides the states of a task into two phases: network-intensive and network-free. To improve overall resource utilization, this article proposes different algorithms to estimate the execution time of the network-intensive and network-free phases, respectively. When an executing task is in the network-free phase, D2PTS can additionally schedule a new network-intensive task at the right time. Under this scheduling policy, the two tasks sharing the same CPU core can be executed as a coarse-grained pipeline. This execution method can start tasks earlier and improve resource utilization. Finally, we have implemented a prototype of our model on Spark 2.4.3 and conducted a number of experiments to evaluate its performance. Experimental results show that D2PTS can not only minimize application makespan, but also improve resource utilization.
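The coarse-grained pipeline described in the abstract can be illustrated with a toy model. The sketch below is not the authors' D2PTS implementation: it assumes a single CPU core, tasks whose network-intensive and network-free durations are already known, and a simplified rule in which the next task's network-intensive phase starts as soon as the current task enters its network-free phase. The names `Task`, `serial_makespan`, and `pipelined_makespan` are hypothetical and chosen only for this illustration.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Task:
    name: str
    net_time: float  # duration of the network-intensive phase (assumed known)
    cpu_time: float  # duration of the network-free (compute) phase (assumed known)


def serial_makespan(tasks: List[Task]) -> float:
    """One task at a time per core: each task fetches, then computes."""
    return sum(t.net_time + t.cpu_time for t in tasks)


def pipelined_makespan(tasks: List[Task]) -> float:
    """At most two tasks share the core: one computing, one fetching.

    Simplifying assumption: the next task's network phase begins the moment
    the previous task enters its network-free phase.
    """
    prev_cpu_start = 0.0  # when the previous task began computing
    prev_cpu_end = 0.0    # when the previous task finished computing
    for t in tasks:
        net_start = prev_cpu_start           # overlap with previous compute
        net_done = net_start + t.net_time
        cpu_start = max(net_done, prev_cpu_end)  # need data and a free core
        prev_cpu_start = cpu_start
        prev_cpu_end = cpu_start + t.cpu_time
    return prev_cpu_end


tasks = [Task("t1", 2.0, 3.0), Task("t2", 2.0, 3.0), Task("t3", 2.0, 3.0)]
print(serial_makespan(tasks))     # 15.0
print(pipelined_makespan(tasks))  # 11.0
```

In this toy case the data transfer of every task after the first is hidden behind the computation of its predecessor, so the makespan drops from 15 to 11 time units. A real scheduler would also need to estimate the phase durations and choose the launch time, which this sketch takes as given.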
Keywords
Dynamic resource requirements, dual-phase pipeline, Spark, task scheduling