Cost-aware Resource Recommendation for DAG-based Big Data Workflows: An Apache Spark Case Study

IEEE Transactions on Services Computing (2022)

Abstract
The era of personal resources being sufficient for enterprise big data computations has passed. As computations are executed in the cloud, small policy changes of cloud operators may cause considerable changes in operational costs. Carefully choosing the amount of resources for a given application is thus of great importance. This, however, requires a priori knowledge of the application's performance under different configurations. Creating a performance prediction model needs to account for the heterogeneity of resources and the diversity in application workflows. Previous approaches for heterogeneous environments consider a black-box representation of the application which results in single-purpose models. This paper addresses the problem with two gray-box prediction models using linear programming (LP) and mixed-integer linear programming (MILP). Given a set of available resources, the models consider Apache Spark applications and their Directed Acyclic Graph (DAG) of workflow running on top of a Hadoop-YARN cluster. We then propose a configuration recommendation algorithm to optimize the cost-performance trade-offs when renting machine instances. The accuracy of the proposed models is evaluated with real-world executions of several representative applications on the Wikipedia dataset and the TPC-DS benchmark. The average error of only 3.28% for the proposed prediction models demonstrates the practicality of the proposed approach in handling cost-performance trade-offs.
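To make the cost-performance trade-off concrete, here is a minimal sketch of choosing a configuration from predicted runtimes. It is not the paper's recommendation algorithm or its LP/MILP models: the names `Config`, `rental_cost`, `recommend`, and `toy_predictor`, as well as every price, speed, and workload number, are hypothetical and chosen only for illustration. The sketch assumes a runtime predictor is already available and picks the cheapest candidate whose predicted runtime meets a deadline.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass(frozen=True)
class Config:
    """A candidate cluster configuration (all fields are illustrative)."""
    instance_type: str
    num_instances: int
    price_per_hour: float  # assumed price of ONE instance, per hour


def rental_cost(config: Config, runtime_hours: float) -> float:
    """Total rental cost: instances x hourly price x predicted runtime."""
    return config.num_instances * config.price_per_hour * runtime_hours


def recommend(
    configs: Iterable[Config],
    predict_runtime_hours: Callable[[Config], float],
    deadline_hours: float,
) -> Optional[Config]:
    """Return the cheapest config whose predicted runtime meets the deadline.

    Returns None when no candidate is predicted to finish in time.
    """
    best, best_cost = None, float("inf")
    for config in configs:
        runtime = predict_runtime_hours(config)
        if runtime > deadline_hours:
            continue  # predicted to miss the deadline
        cost = rental_cost(config, runtime)
        if cost < best_cost:
            best, best_cost = config, cost
    return best


# Hypothetical stand-in for a prediction model: a fixed amount of work
# split evenly across instances, plus a constant overhead.
ASSUMED_SPEED = {"small": 1.0, "large": 2.0}


def toy_predictor(config: Config) -> float:
    work_units, overhead_hours = 8.0, 0.5
    speed = ASSUMED_SPEED[config.instance_type]
    return work_units / (config.num_instances * speed) + overhead_hours


CANDIDATES = [
    Config("small", 4, 0.10),
    Config("small", 8, 0.10),
    Config("large", 4, 0.25),
]

if __name__ == "__main__":
    print(recommend(CANDIDATES, toy_predictor, deadline_hours=2.0))
```

With these made-up inputs, tightening the deadline moves the choice from the cheapest but slowest candidate to a costlier, faster one, which is the kind of trade-off the abstract describes. In practice the predictor would be a fitted performance model rather than this toy formula.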
Keywords
Big Data, Cluster computing, Predictive models, Costs, Task analysis, Runtime, Sparks, Apache spark, big data frameworks, performance evaluation, resource recommendation, cost model