Rock you like a hurricane: taming skew in large scale analytics.

EuroSys '18: Proceedings of the Thirteenth EuroSys Conference (2018)

Cited by 31 | Views: 140
Abstract
Current cluster computing frameworks suffer from load imbalance and limited parallelism due to skewed data distributions, processing times, and machine speeds. We observe that the underlying cause for these issues in current systems is that they partition work statically. Hurricane is a high-performance large-scale data analytics system that successfully tames skew in novel ways. Hurricane performs adaptive work partitioning based on load observed by nodes at run-time. Overloaded nodes can spawn clones of their tasks at any point during their execution, with each clone processing a subset of the original data. This allows the system to adapt to load imbalance and dynamically adjust task parallelism to gracefully handle skew. We support this design by spreading data across all nodes and allowing nodes to retrieve data in a decentralized way. The result is that Hurricane automatically balances load across tasks, ensuring fast completion times. We evaluate Hurricane's performance on typical analytics workloads and show that it significantly outperforms state-of-the-art systems for both uniform and skewed datasets, because it ensures good CPU and storage utilization in all cases.
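The adaptive cloning described above — an overloaded task handing a subset of its remaining fine-grained partitions ("chunks") to a freshly spawned clone — can be illustrated with a toy sketch. All names, the work queue, and the load threshold here are assumptions for illustration, not Hurricane's actual API; a real system would use load observed at run-time rather than a remaining-chunk count.

```python
# Toy sketch of adaptive task cloning (illustrative only, not Hurricane's code).
from collections import deque
from dataclasses import dataclass


@dataclass
class Task:
    chunks: deque  # fine-grained partitions still to process

    def overloaded(self, threshold: int) -> bool:
        # Stand-in for observed run-time load: pending-chunk count.
        return len(self.chunks) > threshold

    def clone(self) -> "Task":
        # Hand half of the remaining chunks to a new clone.
        half = len(self.chunks) // 2
        handed = deque(self.chunks.popleft() for _ in range(half))
        return Task(chunks=handed)


def run(tasks, threshold=4):
    """Process every chunk, cloning any task that appears overloaded."""
    results = []
    queue = deque(tasks)
    while queue:
        task = queue.popleft()
        if task.overloaded(threshold):
            queue.append(task.clone())  # clone takes half the pending work
        if task.chunks:
            results.append(task.chunks.popleft() * 2)  # toy "processing" step
            queue.append(task)  # re-enqueue until this task drains
    return results
```

Because clones split work recursively, a single skewed task fans out into many small ones, which mirrors how the paper's design dynamically raises task parallelism under load imbalance.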
Keywords
Hurricane,big data,analytics,cluster computing,skew,high performance,task cloning,adaptive work partitioning,merging,repartitioning,load balancing,storage disaggregation,decentralized storage,bags,chunks,fine-grained partitioning,distributed scheduling,batch sampling,late binding