Revisiting Exact kNN Query Processing with Probabilistic Data Space Transformations

2018 IEEE International Conference on Big Data (Big Data)(2018)

引用 1|浏览29
暂无评分
摘要
The state-of-the-art approaches for scalable kNN query processing utilise big data parallel/distributed platforms (e.g., Hadoop and Spark) and storage engines (e.g, HDFS, NoSQL, etc.), upon which they build (tree based) indexing methods for efficient query processing. However, as data sizes continue to increase (nowadays it is not uncommon to reach several Petabytes), the storage cost of tree-based index structures becomes exceptionally high. In this work, we propose a novel perspective to organise multivariate (mv) datasets. The main novel idea relies on data space probabilistic transformations and derives a Space Transformation Organisation Structure (STOS) for mv data organisation. STOS facilitates query processing as if underlying datasets were uniformly distributed. This approach bears significant advantages. First, STOS enjoys a minute memory footprint that is many orders of magnitude smaller than indexes in related work. Second, the required memory, unlike related work, increases very slowly with dataset size and, thus, enjoys significantly higher scalability. Third, the STOS structure is relatively efficient to compute, outperforming traditional index building times. The new approach comes bundled with a distributed coordinator-based query processing method so that, overall, lower query processing times are achieved compared to the state-of-the-art index-based methods. We conducted extensive experimentation with real and synthetic datasets of different sizes to substantiate and quantify the performance advantages of our proposal.
更多
查看译文
关键词
tree-based index structures,data space probabilistic transformations,Space Transformation Organisation Structure,mv data organisation,kNN query processing,Big Data parallel platforms,Big Data distributed platforms,distributed coordinator-based query processing,STOS,storage engines
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要