Making MapReduce scheduling effective in erasure-coded storage clusters

LANMAN(2015)

Abstract
With the explosive growth of data, enterprises increasingly adopt erasure coding in storage clusters to save storage space. However, erasure coding incurs higher performance overhead, especially during recovery. This motivates us to study the feasibility of alleviating the performance overhead of erasure coding while maintaining its storage efficiency advantage. In this paper, we study the performance of MapReduce when it runs on erasure-coded storage. We first review our previously proposed degraded-first scheduling, which avoids network bandwidth competition among degraded map tasks in failure mode and hence improves MapReduce performance over the default locality-first scheduling. We then show that basic degraded-first scheduling may not work effectively when multiple MapReduce jobs are running, and we propose heuristics to enhance the degraded-first scheduling design. Simulations demonstrate the performance gain of our enhanced degraded-first scheduling in a multi-job scenario. Our work makes a case that a new design of MapReduce scheduling is critical when we move to erasure-coded storage.
Keywords
Erasure coding, MapReduce, storage systems
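
The contrast the abstract draws between locality-first and degraded-first scheduling can be illustrated with a toy launch-order model. This is only a sketch under stated assumptions, not the paper's algorithm: it assumes that locality-first leaves degraded map tasks until after all local tasks, and that degraded-first spreads them evenly through the map phase. The function names, the task labels, and the "longest run of back-to-back degraded tasks" proxy for bandwidth competition are all hypothetical.

```python
from typing import List


def locality_first(local: List[str], degraded: List[str]) -> List[str]:
    """Assumed baseline: launch every local task first, so degraded tasks bunch up at the end."""
    return list(local) + list(degraded)


def degraded_first(local: List[str], degraded: List[str]) -> List[str]:
    """Assumed variant: interleave degraded tasks evenly across the launch order."""
    total = len(local) + len(degraded)
    order: List[str] = []
    li = di = 0
    for launched in range(total):
        # Launch a degraded task when the share of degraded tasks launched
        # has not kept pace with the share of all tasks launched so far.
        lagging = di * total <= launched * len(degraded)
        if di < len(degraded) and (li >= len(local) or lagging):
            order.append(degraded[di])
            di += 1
        else:
            order.append(local[li])
            li += 1
    return order


def longest_degraded_run(order: List[str], degraded: List[str]) -> int:
    """Hypothetical proxy for bandwidth competition: longest streak of consecutive degraded launches."""
    degraded_set = set(degraded)
    best = run = 0
    for task in order:
        run = run + 1 if task in degraded_set else 0
        best = max(best, run)
    return best


local_tasks = [f"L{i}" for i in range(6)]
degraded_tasks = ["D0", "D1"]

lf_order = locality_first(local_tasks, degraded_tasks)
df_order = degraded_first(local_tasks, degraded_tasks)
print(lf_order, longest_degraded_run(lf_order, degraded_tasks))
print(df_order, longest_degraded_run(df_order, degraded_tasks))
```

In this toy model the degraded tasks end up adjacent under the locality-first order and separated under the degraded-first order. The model ignores slots, racks, and multiple jobs, which are the concerns the enhanced heuristics in the abstract address.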