Pyramid: A General Framework For Distributed Similarity Search On Large-Scale Datasets

Shiyuan Deng, Xiao Yan, Kelvin K. W. Ng, Chenyu Jiang, James Cheng

2019 IEEE International Conference on Big Data (Big Data), 2019

Abstract

Similarity search is a core component in various applications such as image matching and product recommendation. However, single-machine solutions are usually insufficient due to the large cardinality of modern datasets. We present Pyramid, a general and efficient framework for distributed similarity search. Pyramid supports search with popular similarity functions including Euclidean distance, angular distance and inner product. Unlike existing distributed solutions that are based on KD-tree or locality-sensitive hashing (LSH), Pyramid is based on the Hierarchical Navigable Small World graph (HNSW), which is the state-of-the-art similarity search algorithm. To achieve high query processing throughput, Pyramid partitions a dataset into sub-datasets containing similar items for index building and assigns each query to only some of the sub-datasets for query processing. Experiments on large-scale datasets show that Pyramid produces high-quality results for similarity search, achieves high query processing throughput and low latency, and is robust to node failures and stragglers.
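
The partition-and-route idea in the abstract can be illustrated with a minimal single-process sketch. It is not Pyramid's implementation: the abstract does not say how sub-datasets are formed or how queries are assigned, so this sketch assumes plain k-means for partitioning and nearest-center routing, and it replaces the per-partition HNSW index with a brute-force scan. The names `build_partitions`, `search`, and `num_probe` are hypothetical.

```python
import math
import random


def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def assign(items, centers):
    """Put each item into the partition of its nearest center."""
    partitions = [[] for _ in centers]
    for item in items:
        nearest = min(range(len(centers)),
                      key=lambda i: euclidean(item, centers[i]))
        partitions[nearest].append(item)
    return partitions


def build_partitions(items, num_partitions, iterations=10, seed=0):
    """Group similar items with k-means (an assumption, not the paper's method)."""
    rng = random.Random(seed)
    centers = rng.sample(items, num_partitions)
    for _ in range(iterations):
        partitions = assign(items, centers)
        # Move each center to the mean of its partition; keep it if empty.
        centers = [
            tuple(sum(col) / len(part) for col in zip(*part)) if part else center
            for part, center in zip(partitions, centers)
        ]
    # Final assignment so the partitions match the returned centers.
    return centers, assign(items, centers)


def search(query, centers, partitions, k, num_probe):
    """Route the query to its num_probe nearest partitions and scan only those.

    The brute-force scan stands in for a per-partition HNSW index.
    """
    order = sorted(range(len(centers)),
                   key=lambda i: euclidean(query, centers[i]))
    candidates = [item for i in order[:num_probe] for item in partitions[i]]
    return sorted(candidates, key=lambda item: euclidean(query, item))[:k]
```

In this sketch, a smaller `num_probe` means fewer partitions are touched per query, at the risk of missing true neighbors that fall in an unprobed partition; setting `num_probe` to the number of partitions degenerates to an exact scan.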

Keywords

similarity search, distributed system
