A Review On Data Locality In Hadoop Mapreduce

2018 Fifth International Conference on Parallel, Distributed and Grid Computing (IEEE PDGC) (2018)

Abstract

MapReduce has emerged as a strong model for parallel and distributed processing of huge datasets. Hadoop, an open-source implementation of MapReduce, has made the model widely adopted. Hadoop splits the input file into a number of data blocks and allocates them to various DataNodes in the cluster. Hadoop must provide effective scheduling to process these data blocks efficiently. One issue that plays a vital role in efficient MapReduce processing is data locality, which matters because of network overhead. Data locality means moving the computation close to the node where the data resides. It is a key factor in a distributed environment and influences task completion time. Factors that affect data locality include cluster and network load, resource sharing, the cluster environment, the size of data blocks, and the number of mappers and reducers. This paper reviews various scheduling algorithms that are aware of data locality, along with their strengths and weaknesses.
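
The idea of moving computation to the data can be illustrated with a minimal sketch of a locality-aware task picker. This is not taken from the paper and is not Hadoop's scheduler API; the names `Block`, `locality_level`, `pick_task`, and the `rack_of` mapping are hypothetical, and the three-level ranking (node-local, rack-local, off-rack) is an assumed simplification of how locality is commonly described.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Block:
    """A data block and the nodes that hold a replica of it (hypothetical model)."""
    block_id: str
    replicas: tuple


def locality_level(node, block, rack_of):
    """Rank how close `node` is to `block`: 0 node-local, 1 rack-local, 2 off-rack."""
    if node in block.replicas:
        return 0
    replica_racks = {rack_of[r] for r in block.replicas}
    if rack_of[node] in replica_racks:
        return 1
    return 2


def pick_task(free_node, pending, rack_of):
    """Return the pending block with the best locality for `free_node`, or None."""
    if not pending:
        return None
    # min() keeps the first block among ties, so queue order breaks ties.
    return min(pending, key=lambda b: locality_level(free_node, b, rack_of))


# Assumed toy cluster: n1 and n2 share rack r1, n3 sits in rack r2.
rack_of = {"n1": "r1", "n2": "r1", "n3": "r2"}
pending = [
    Block("b1", ("n3",)),  # off-rack for n1
    Block("b2", ("n2",)),  # rack-local for n1
    Block("b3", ("n1",)),  # node-local for n1
]
chosen = pick_task("n1", pending, rack_of)
print(chosen.block_id)
```

A greedy picker like this ignores the other factors the abstract lists, such as cluster load and resource sharing; the algorithms the paper reviews differ largely in how they trade locality against those factors.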

Keywords

MapReduce, Hadoop, HDFS, Scheduling, Data locality
