
Hadoop MapReduce's InputSplit based Indexing for Join Query Processing

2019 9th IEEE International Conference on Control System, Computing and Engineering (ICCSCE) (2019)

Abstract
Join queries are among the most widely used forms of query: records from two or more tables or files are retrieved to give a comprehensive, comparable and contrasted view of certain data. However, processing a join query carries a higher overhead, since all the tables or files involved have to be considered, and that overhead grows considerably when the tables or files hold big data. The use of indexing on Hadoop and its abstractions has improved query processing performance. Even so, with some indexing approaches the join query still incurs a high overhead, unless the indexing technique reduces the amount of data to be processed before query processing starts. One indexing technique that ensures this is the InputSplit based index. This paper shows how InputSplit based indexing can be implemented in Hadoop MapReduce and reports the experimental results of running a join query using such an index. The results show at least a 50% reduction in runtime compared with both normal Hadoop MapReduce and Clustered Index based on blockIds query processing approaches.
Key words
component, Hadoop MapReduce, Big Data, Indexing, InputSplit, Join Query
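
The abstract describes an index that cuts down the data a join has to read before processing starts. The following is a minimal in-memory sketch of that general idea, not the paper's implementation: it does not use Hadoop, and it assumes that each "split" is a plain Python list of `(key, value)` records and that the index simply maps each join key to the ids of the splits containing it. All function names here (`build_split_index`, `select_splits`, `join_with_split_index`) are hypothetical and chosen for illustration only.

```python
from collections import defaultdict


def build_split_index(splits):
    """Map each join key to the set of split ids that contain it.

    Assumption: a split is a list of (key, value) tuples, standing in
    for one InputSplit of a file.
    """
    index = defaultdict(set)
    for split_id, records in enumerate(splits):
        for key, _value in records:
            index[key].add(split_id)
    return index


def select_splits(index, wanted_keys):
    """Return the sorted ids of splits holding at least one wanted key."""
    selected = set()
    for key in wanted_keys:
        selected |= index.get(key, set())
    return sorted(selected)


def join_with_split_index(left_splits, right_splits, wanted_keys):
    """Join two split collections on key, reading only indexed splits."""
    left_ids = select_splits(build_split_index(left_splits), wanted_keys)
    right_ids = select_splits(build_split_index(right_splits), wanted_keys)

    # Collect matching left-side values from the selected splits only.
    left_by_key = defaultdict(list)
    for split_id in left_ids:
        for key, value in left_splits[split_id]:
            if key in wanted_keys:
                left_by_key[key].append(value)

    # Probe with right-side records, again from selected splits only.
    joined = []
    for split_id in right_ids:
        for key, value in right_splits[split_id]:
            if key in wanted_keys:
                for left_value in left_by_key.get(key, []):
                    joined.append((key, left_value, value))
    return joined, left_ids, right_ids
```

In this toy setting the saving comes from `select_splits`: splits that contain none of the requested keys are never scanned. In a real job the index would be built once and stored, rather than rebuilt per query as it is here for brevity.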