Large-scale response-aware online ANN search in dynamic datasets

Cluster Computing: The Journal of Networks, Software Tools and Applications (2023)

Abstract
Similarity search is a key operation in content-based multimedia retrieval (CBMR) applications. Online CBMR applications, which are the focus of this work, perform a large number of search operations on dynamic datasets that are updated at run-time. Additionally, the rates of search and data insertion (update) operations vary during execution. Applications that rely on similarity search must meet these demands while also offering low response times. As a result, their computing demands commonly exceed the processing power of a single computer, which motivates the use of large-scale compute systems. In this work, we propose a distributed-memory parallelization of similarity search that addresses these challenges. Our solution uses the efficient Inverted File System with Asymmetric Distance Computation algorithm (IVFADC) as the baseline and extends it to support dynamic datasets. We also propose a dynamic resource management algorithm, called Multi-Stream Adaptation (MS-ADAPT), which allows run-time changes to resource assignment with the goal of reducing response times. We evaluate our solution with multiple data partitioning strategies using up to 160 compute nodes and a dataset with 344 billion multimedia descriptors. Our experiments demonstrate superlinear scalability, and MS-ADAPT outperforms the best static approach (oracle), improving response times by up to 32× in high-load cases.
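For readers unfamiliar with IVFADC, the following is a minimal single-process sketch of the general technique as it is commonly described: a coarse quantizer assigns each vector to an inverted list, the residual is product-quantized, and queries are scored against the stored codes through per-subvector lookup tables (the asymmetric distance computation). It is not the distributed implementation from the paper and does not model MS-ADAPT. The class name `TinyIVFADC`, the helper functions, and the hand-picked centroids and codebooks are hypothetical; a real index would train them (for example with k-means).

```python
import heapq

def sqdist(a, b):
    """Squared Euclidean distance between two equal-length sequences."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def nearest(vec, centroids):
    """Index of the centroid closest to vec."""
    return min(range(len(centroids)), key=lambda i: sqdist(vec, centroids[i]))

class TinyIVFADC:
    """Illustrative IVFADC index; centroids and codebooks are assumed pre-trained."""

    def __init__(self, coarse, codebooks):
        self.coarse = coarse            # coarse centroids, each of dimension d
        self.codebooks = codebooks      # m codebooks of codewords, each of dimension d/m
        self.sub = len(codebooks[0][0]) # sub-vector length
        self.lists = {i: [] for i in range(len(coarse))}  # inverted lists

    def _split(self, vec):
        return [vec[i:i + self.sub] for i in range(0, len(vec), self.sub)]

    def add(self, vec_id, vec):
        """Insert at run-time: only the matching inverted list is appended to."""
        c = nearest(vec, self.coarse)
        residual = [x - y for x, y in zip(vec, self.coarse[c])]
        codes = tuple(nearest(s, cb)
                      for s, cb in zip(self._split(residual), self.codebooks))
        self.lists[c].append((vec_id, codes))

    def search(self, query, k=1, nprobe=1):
        """Scan the nprobe closest lists and return the k best (distance, id) pairs."""
        probes = heapq.nsmallest(nprobe, range(len(self.coarse)),
                                 key=lambda i: sqdist(query, self.coarse[i]))
        results = []
        for c in probes:
            residual = [x - y for x, y in zip(query, self.coarse[c])]
            # ADC: the query stays uncompressed; precompute its distance to every codeword.
            tables = [[sqdist(s, w) for w in cb]
                      for s, cb in zip(self._split(residual), self.codebooks)]
            for vec_id, codes in self.lists[c]:
                dist = sum(t[code] for t, code in zip(tables, codes))
                results.append((dist, vec_id))
        return heapq.nsmallest(k, results)

# Toy 4-dimensional data: 2 coarse cells, 2 sub-quantizers with 2 codewords each.
index = TinyIVFADC(
    coarse=[[0, 0, 0, 0], [10, 10, 10, 10]],
    codebooks=[[[0, 0], [1, 1]], [[0, 0], [1, 1]]],
)
index.add("a", [0, 0, 1, 1])
index.add("b", [10, 10, 10, 10])
index.add("c", [1, 1, 0, 0])
best = index.search([0.9, 0.9, 0.1, 0.1], k=1, nprobe=1)
```

Because `add` touches only one inverted list, insertions can be interleaved with searches, which is the property a dynamic dataset needs. Raising `nprobe` trades longer scans for better recall.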
Keywords
dynamic datasets, search, large-scale, response-aware