Effective Straggler Mitigation with Cross-Layer Interference-Aware Optimization

2019 IEEE 25th International Conference on Parallel and Distributed Systems (ICPADS)(2019)

Abstract
In-memory data processing frameworks (e.g., Spark) make big data analysis much simpler and more efficient. However, stragglers, i.e., tasks that take much longer to finish than other tasks, significantly degrade performance. Multiple factors can cause stragglers, originating in either the hardware resource layer or the application layer, e.g., hardware heterogeneity, interference, data locality, and data skew. While state-of-the-art straggler mitigation techniques offer partial solutions for data skew and data locality, we find that the other factors can also cause serious problems. We present Clio, a cross-layer interference-aware optimization system that effectively mitigates stragglers in data processing frameworks. Clio supports the scheduling of both map and reduce tasks. It heuristically dispatches intermediate data in proportion to the actual computing ability of each worker node, which is estimated by considering various straggler factors, so that task completion times are balanced at a much finer granularity. We implement Clio in Apache Spark and evaluate its performance using both synthetic and real datasets. Experimental results show that Clio speeds up application execution by up to 67% compared with existing algorithms.
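The abstract does not spell out Clio's dispatch heuristic, so the following is only a minimal Python sketch of the general idea of capacity-proportional partitioning of intermediate data, under stated assumptions. The per-worker `capacity` values (assumed here to be something like the processing rate a worker recently sustained, which would already reflect hardware heterogeneity and interference) and the function names `proportional_targets` and `assign_keys` are hypothetical. The greedy largest-key-first rule is a standard heuristic swapped in for illustration, not necessarily the one Clio uses.

```python
from typing import Dict, List


def proportional_targets(total: float, capacity: Dict[str, float]) -> Dict[str, float]:
    """Ideal share of `total` intermediate data per worker, proportional to capacity.

    `capacity` is a hypothetical per-worker estimate of computing ability
    (assumption: e.g., records/sec observed on recent tasks).
    """
    cap_sum = sum(capacity.values())
    return {w: total * c / cap_sum for w, c in capacity.items()}


def assign_keys(key_sizes: Dict[str, float], capacity: Dict[str, float]) -> Dict[str, List[str]]:
    """Greedy sketch: place key groups largest-first, each on the worker whose
    estimated finish time (assigned load / capacity) would be smallest after
    taking it. Keys and worker names break ties deterministically."""
    load = {w: 0.0 for w in capacity}
    plan: Dict[str, List[str]] = {w: [] for w in capacity}
    for key, size in sorted(key_sizes.items(), key=lambda kv: (-kv[1], kv[0])):
        best = min(capacity, key=lambda w: ((load[w] + size) / capacity[w], w))
        load[best] += size
        plan[best].append(key)
    return plan


if __name__ == "__main__":
    # Worker "a" is assumed to be twice as fast as worker "b".
    caps = {"a": 2.0, "b": 1.0}
    sizes = {"k1": 4, "k2": 3, "k3": 2, "k4": 1}
    print(proportional_targets(sum(sizes.values()), caps))
    print(assign_keys(sizes, caps))
```

With these inputs the faster worker `a` receives `k1`, `k3`, and `k4` (7 units, finishing at an estimated 3.5) and `b` receives `k2` (3 units, finishing at 3.0), close to the ideal 2:1 split. This is essentially a capacity-weighted variant of longest-processing-time-first scheduling; a real system would also need to re-estimate capacities as interference changes.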
Keywords
Straggler mitigation, Spark, scheduling, key partitioning