Improved Input Data Splitting in MapReduce

J. Tan, S. Meng, X. Meng, R. Vernica, A. Balmin, K. S. Beyer, Chunguang Wang, Qingbo Wu, Yusong Tan, Wenzhu Wang, Quanyuan Wu

Semantic Scholar (2020)

Abstract

The performance of MapReduce depends heavily on the input data splitting process that takes place before the map phase. This splitting is usually done with naive methods that are far from optimal. This paper presents Improved Input Splitting, a locality-based technique that addresses input data splitting problems which seriously affect job performance. Improved Input Splitting clusters the data blocks stored on the same node into a single partition, so that they are processed by one map task. This avoids the time spent on slot reallocation and on initializing multiple tasks. Experimental results show that the method substantially improves MapReduce processing performance compared with the traditional Hadoop implementation.
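The core idea, grouping all blocks that live on one node into one split so that a single map task handles them, can be illustrated with a small sketch. The abstract does not say how a block replicated on several nodes is assigned, so this sketch assumes a simple greedy rule: each block goes to whichever of its replica hosts has the least data assigned so far. The names `Block`, `default_splits`, and `locality_splits` are hypothetical and are not part of the Hadoop API; the baseline simply models one map task per block.

```python
from collections import defaultdict
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """Hypothetical HDFS-style block: an id, its size in MB, and replica hosts."""
    block_id: str
    size_mb: int
    hosts: tuple


def default_splits(blocks):
    """Baseline model: one split (and therefore one map task) per block."""
    return [[b] for b in blocks]


def locality_splits(blocks):
    """Group blocks by node so that each node's blocks form a single split.

    Assumption (not from the paper): a block replicated on several nodes is
    assigned to the replica host with the least data assigned so far, which
    keeps splits roughly balanced. Ties are broken by host name.
    """
    load = defaultdict(int)        # MB assigned to each node so far
    per_node = defaultdict(list)   # node -> blocks assigned to that node
    for b in blocks:
        host = min(b.hosts, key=lambda h: (load[h], h))
        load[host] += b.size_mb
        per_node[host].append(b)
    # One split per node that received at least one block.
    return {host: per_node[host] for host in sorted(per_node)}


if __name__ == "__main__":
    blocks = [
        Block("b0", 128, ("n1", "n2")),
        Block("b1", 128, ("n1", "n3")),
        Block("b2", 128, ("n2", "n3")),
        Block("b3", 128, ("n1", "n2")),
    ]
    print("default map tasks:", len(default_splits(blocks)))   # 4
    for host, split in locality_splits(blocks).items():
        print(host, [b.block_id for b in split])
    # n1 ['b0', 'b3']
    # n2 ['b2']
    # n3 ['b1']
```

With four blocks spread over three nodes, the baseline launches four map tasks while the node-grouped version launches three, one per node; this reduction in task count is the source of the savings in slot reallocation and task initialization that the abstract describes. A real implementation would also need to cap split size and handle rack-level locality, which this sketch ignores.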