A Step Towards Hadoop Dynamic Scaling

Qiaobin Fu, Nicholas P. Timkovich, Pierre Riteau, Kate Keahey

2018 IEEE 20th International Conference on High Performance Computing and Communications; IEEE 16th International Conference on Smart City; IEEE 4th International Conference on Data Science and Systems (HPCC/SmartCity/DSS), 2018

Cited by 6
Abstract
Many application portals successfully scale elastically, integrating on-demand cloud resources to provide a stable response time. This is more challenging for applications that must manage a dynamic configuration. Our paper investigates the question: under what circumstances, if any, does dynamically adding nodes to a Hadoop computation improve performance? On one hand, adding nodes to a Hadoop computation can make it finish faster, since more computational power is brought to bear on the problem. On the other hand, using those nodes effectively may require data redistribution, which creates additional overhead that may negate any performance advantage. In this paper, we identify container allocation as a key factor affecting Hadoop performance. Moreover, to mitigate the redistribution overhead, we describe and evaluate three methods for data redistribution in this use case and discuss their advantages and disadvantages.
Keywords
Hadoop, Dynamic scaling, Geospatial processing, Cloud computing
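The trade-off described in the abstract (more nodes versus data-redistribution overhead) can be made concrete with a back-of-the-envelope model. This is a minimal sketch, not the authors' method: it assumes perfectly divisible work with linear speedup, that redistribution finishes before the added nodes do useful work, and that every added node is fully used (i.e., it ignores the container-allocation effects the paper identifies). Under those assumptions, with remaining work W (node-seconds), n current nodes, k added nodes, D bytes to move and redistribution bandwidth B, scaling helps only when D/B < W/n - W/(n+k). All names and example numbers below are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class ScalingScenario:
    """Hypothetical inputs for a back-of-the-envelope scaling estimate."""
    remaining_work: float  # remaining computation, in node-seconds (assumed)
    current_nodes: int     # nodes already running the job
    added_nodes: int       # nodes being considered for addition
    data_to_move: float    # bytes that must be redistributed to the new nodes
    bandwidth: float       # aggregate redistribution bandwidth, bytes/second


def time_without_scaling(s: ScalingScenario) -> float:
    # Assumes perfectly divisible work and linear speedup.
    return s.remaining_work / s.current_nodes


def compute_time_after_scaling(s: ScalingScenario) -> float:
    # Pure compute time once all nodes (old + new) share the remaining work.
    return s.remaining_work / (s.current_nodes + s.added_nodes)


def time_with_scaling(s: ScalingScenario) -> float:
    # Assumes redistribution completes before the new nodes do useful work.
    redistribution = s.data_to_move / s.bandwidth
    return redistribution + compute_time_after_scaling(s)


def scaling_pays_off(s: ScalingScenario) -> bool:
    # Scaling is worthwhile only if the total time actually shrinks.
    return time_with_scaling(s) < time_without_scaling(s)


def break_even_data(s: ScalingScenario) -> float:
    """Largest number of bytes that can be moved before scaling stops helping."""
    time_saved = time_without_scaling(s) - compute_time_after_scaling(s)
    return time_saved * s.bandwidth


if __name__ == "__main__":
    # 4 nodes with 4000 node-seconds left; add 4 nodes, moving 10 GB at 100 MB/s.
    s = ScalingScenario(remaining_work=4000, current_nodes=4, added_nodes=4,
                        data_to_move=10e9, bandwidth=100e6)
    print(time_without_scaling(s))  # 1000.0
    print(time_with_scaling(s))     # 600.0
    print(scaling_pays_off(s))      # True
    print(break_even_data(s))       # 50000000000.0
```

In this toy scenario, doubling the cluster saves 500 s of compute, so any redistribution that moves less than about 50 GB at 100 MB/s still shortens the job. Because the model assumes every added node is fully utilized, it gives an optimistic bound; poor container allocation on the new nodes would shrink the time saved.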