Dynamic Reconfiguration of Data Parallel Programs

2016 28th International Symposium on Computer Architecture and High Performance Computing (SBAC-PAD) (2016)

Abstract
Given the large amount of data from different sources that has become available to researchers in multiple fields, Data Science has emerged as a new paradigm for exploring and extracting value from that data. In that context, new parallel processing environments with abstract programming interfaces, such as Spark, were proposed to simplify the development of distributed programs. Although such solutions have become widely used, achieving the best performance with them is still not always straightforward, despite the multiple run-time strategies they use. In this work we analyze some of the causes of performance degradation in such systems and, based on that analysis, propose a tool that improves performance by dynamically adjusting data partitioning and the degree of parallelism in recurrent applications based on previous executions. Our results applying that methodology show consistent reductions in execution time for the applications considered, with gains of up to 50%.
Keywords
data parallel, Spark, performance, framework, adaptive execution