A Straggler Identification Model for Large-Scale Distributed Computing Systems Using Machine Learning

Proceedings of the 8th International Conference on Advanced Intelligent Systems and Informatics 2022 (2022)

Abstract
Large-scale distributed computing systems have become crucial for storing, processing, and analyzing massive datasets. Apache Spark provides a general and efficient programming model for large-scale data processing called the Resilient Distributed Dataset (RDD). However, the occurrence of stragglers is one of the major issues in a Spark cluster: performance deteriorates because a task takes an abnormally long time to finish execution. In this paper, a straggler identification model for distributed environments using machine learning is proposed. The model employs several Spark parameters, extracted from the execution of various types of large-scale jobs, to assist in identifying stragglers. In addition, the proposed model applies machine learning approaches to Spark logs to learn various kinds of job execution features. The performance of the proposed model is evaluated on various real-world benchmark datasets using default Apache Spark across diverse CPU, I/O, and mixed workloads. Furthermore, we show empirically that Logistic Regression outperforms the other competitive models, achieving an average accuracy of 90% for straggler identification.
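As an illustration of the general idea only, the sketch below labels tasks as stragglers and fits a logistic regression classifier on per-task features. It is not the paper's pipeline: the feature set (input size, GC time, shuffle read), the toy records, and the labeling rule (a task is a straggler if its duration exceeds 1.5 times the median) are all assumptions made for the example, and the classifier is a plain gradient-descent implementation rather than any specific library.

```python
import math
from statistics import median

# Hypothetical per-task records, as might be parsed from a Spark event log:
# (duration_s, input_mb, gc_time_s, shuffle_read_mb). Values are made up.
TASKS = [
    (10, 128, 0.2, 20), (11, 128, 0.3, 22), (9, 120, 0.2, 18),
    (12, 130, 0.4, 25), (10, 128, 0.3, 21), (30, 128, 3.0, 90),
    (11, 125, 0.2, 20), (28, 130, 2.5, 85), (10, 126, 0.3, 19),
    (12, 128, 0.4, 24),
]

def label_stragglers(durations, factor=1.5):
    """Assumed rule: straggler if duration > factor * median duration."""
    cutoff = factor * median(durations)
    return [1 if d > cutoff else 0 for d in durations]

def standardize(rows):
    """Scale each feature column to zero mean and unit variance."""
    cols = list(zip(*rows))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((x - m) ** 2 for x in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return [[(x - m) / s for x, m, s in zip(r, means, stds)] for r in rows]

def sigmoid(z):
    # Numerically stable logistic function.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)

def train(X, y, lr=0.5, epochs=500):
    """Batch gradient descent on the logistic (cross-entropy) loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j, xj in enumerate(xi):
                grad_w[j] += err * xj
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b

def predict(X, w, b):
    return [1 if sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) >= 0.5 else 0
            for xi in X]

# Labels come from durations; features deliberately exclude the duration itself.
labels = label_stragglers([t[0] for t in TASKS])
features = standardize([list(t[1:]) for t in TASKS])
weights, bias = train(features, labels)
preds = predict(features, weights, bias)
accuracy = sum(p == y for p, y in zip(preds, labels)) / len(labels)
print("training accuracy:", accuracy)
```

The example evaluates on its own training data purely to keep it short; a real evaluation would hold out separate jobs or workloads.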
Keywords
Spark,Distributed systems,Big data,Straggler identification,Machine learning