Parallel computation of k-nearest neighbor joins using MapReduce

2016 IEEE International Conference on Big Data (Big Data) (2016)

Abstract
The k-nearest neighbor (kNN) join has recently attracted considerable attention due to its broad applications. However, processing kNN joins is very expensive due to the quadratic nature of the join operation. Furthermore, as applications increasingly deal with big data, computing kNN joins becomes more challenging. To process such big data, parallel and distributed computing using MapReduce has recently received a lot of attention. In this paper, we propose an efficient parallel algorithm, KNN-MR, to process kNN joins using MapReduce. To reduce not only the computational cost of kNN joins but also the network cost of communicating across machines, we develop a novel vector projection pruning technique that enables us to identify non-kNN points, which are guaranteed not to be included in the result of a kNN join. Our performance study confirms the effectiveness and scalability of the proposed algorithm.
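To make the cost argument in the abstract concrete, the sketch below contrasts a brute-force kNN join, which computes all |R| x |S| distances, with a version that skips candidates using a projection lower bound. It is a single-machine illustration only and is not the KNN-MR algorithm: the abstract does not describe how vector projection pruning works, so the bound used here is an assumption based on a standard fact (for a unit vector u, |u·r − u·s| ≤ ‖r − s‖ by the Cauchy-Schwarz inequality). The function names `knn_join_naive` and `knn_join_projection`, and the choice of `u`, are hypothetical.

```python
import heapq
import math


def dist(a, b):
    """Euclidean distance between two points given as tuples."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def knn_join_naive(R, S, k):
    """For each point in R, return the indices of its k nearest points in S.

    Computes every one of the len(R) * len(S) distances (the quadratic cost).
    """
    result = {}
    for i, r in enumerate(R):
        nearest = heapq.nsmallest(k, ((dist(r, s), j) for j, s in enumerate(S)))
        result[i] = [j for _, j in nearest]
    return result


def knn_join_projection(R, S, k, u):
    """Same join, skipping candidates with a projection lower bound.

    ASSUMPTION (not taken from the paper): u is a unit vector, so
    |u.r - u.s| <= dist(r, s). A candidate whose projected gap already
    exceeds the current k-th best distance cannot enter the result.
    Returns (result, number_of_distance_computations).
    """
    def project(p):
        return sum(x * y for x, y in zip(p, u))

    s_proj = [project(s) for s in S]
    result, computed = {}, 0
    for i, r in enumerate(R):
        r_proj = project(r)
        heap = []  # entries are (-distance, -index); heap[0] is the current worst
        for j, s in enumerate(S):
            if len(heap) == k:
                worst_d, worst_j = -heap[0][0], -heap[0][1]
                if abs(r_proj - s_proj[j]) > worst_d:
                    continue  # pruned without computing the full distance
            d = dist(r, s)
            computed += 1
            if len(heap) < k:
                heapq.heappush(heap, (-d, -j))
            elif (d, j) < (worst_d, worst_j):
                heapq.heapreplace(heap, (-d, -j))
        ordered = sorted((-nd, -nj) for nd, nj in heap)
        result[i] = [j for _, j in ordered]
    return result, computed
```

In a distributed setting, a bound of this kind matters for a second reason that the abstract highlights: a point that can be ruled out early does not need to be shipped to another machine. How KNN-MR actually partitions the data and applies its pruning is not specified in the abstract.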
Keywords
kNN joins, MapReduce, Hadoop