Finding data broadness via generalized nearest neighbors

Jayendra Venkateswaran,Tamer Kahveci,Orhan Camoglu

ADVANCES IN DATABASE TECHNOLOGY - EDBT 2006（2006）

引用 3|浏览0

暂无评分

摘要

A data object is broad if it is one of the k-Nearest Neighbors (k-NN) of many data objects. We introduce a new database primitive called Generalized Nearest Neighbor (GNN) to express data broadness. We also develop three strategies to answer GNN queries efficiently for large datasets of multidimensional objects. The R*-Tree based search algorithm generates candidate pages and ranks them based on their distances. Our first algorithm, Fetch All (FA), fetches as many candidate pages as possible. Our second algorithm, Fetch One (FO), fetches one candidate page at a time. Our third algorithm, Fetch Dynamic (FD), dynamically decides on the number of pages that needs to be fetched. We also propose three optimizations, Column Filter, Row Filter and Adaptive Filter, to eliminate pages from each dataset. Column Filter prunes the pages that are guaranteed to be non-broad. Row Filter prunes the pages whose removal do not change the broadness of any data point. Adaptive Filter prunes the search space dynamically along each dimension to eliminate unpromising objects. Our experiments show that FA is the fastest when the buffer size is large and FO is the fastest when the buffer size is small. FD is always either fastest or very close to the faster of FA and FO. FD is significantly faster than the existing methods adapted to the GNN problem.

查看译文

关键词

row filter,generalized nearest neighbor,fetch dynamic,data broadness,search algorithm,data point,data object,candidate page,buffer size,adaptive filter,column filter,search space,nearest neighbor,k nearest neighbor

AI 理解论文

溯源树

样例

生成溯源树，研究论文发展脉络

Chat Paper

正在生成论文摘要