Efficient Data Access Paths for Mixed Vector-Relational Search
arXiv (2024)
Abstract
The rapid growth of machine learning capabilities and the adoption of data
processing methods based on vector embeddings have sparked great interest in
building systems for vector data management. The predominant approach to vector
data management is to use specialized index structures for fast search over the
entire set of vector embeddings. Once vectors are combined with other
(meta)data, however, search queries can also become selective on relational
attributes, as is typical for analytical queries. Because using vector indexes
differs from traditional relational data access, we revisit and analyze
alternative access paths for efficient mixed vector-relational search.
We first evaluate the accurate but exhaustive scan-based search and propose
hardware optimizations, an alternative tensor-based formulation, and batching
to offset its cost. We outline the complex access-path design space, primarily
driven by relational selectivity, and the decisions to consider when choosing
between an exhaustive scan-based search and an approximate index-based
approach. Since the vector index primarily avoids expensive computation across
the entire dataset, contrary to common relational wisdom, it is better to scan
at lower selectivity and probe at higher, with the crossover point between the
two approaches dictated by data dimensionality and the number of concurrent
search queries.
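
To make the scan-based access path concrete, the following is a minimal sketch of a batched exhaustive scan with a relational pre-filter. It is not the paper's implementation: the function name `batched_filtered_scan`, the inner-product similarity, and the toy data are all assumptions made for illustration. The score table it builds is the scalar equivalent of a single matrix product between the query batch and the surviving vectors, which is the kind of computation a tensor library would run as one call.

```python
import heapq


def batched_filtered_scan(vectors, attrs, predicate, queries, k):
    """Exact top-k (by inner product) per query over rows that pass `predicate`.

    All names here are hypothetical; this only illustrates the access path.
    """
    # Relational filter first: only surviving rows are scored at all.
    survivors = [i for i, a in enumerate(attrs) if predicate(a)]

    # scores[q][j] = <queries[q], vectors[survivors[j]]>.
    # With a tensor library this table would be one matrix product Q @ D.T,
    # so a single pass over the data serves the whole query batch.
    scores = [
        [sum(x * y for x, y in zip(q, vectors[i])) for i in survivors]
        for q in queries
    ]

    # Exact top-k per query, mapped back to original row ids.
    results = []
    for row in scores:
        best = heapq.nlargest(k, range(len(survivors)), key=row.__getitem__)
        results.append([survivors[j] for j in best])
    return results


# Toy data: four 2-d embeddings, each with one relational attribute.
vectors = [[2.0, 0.0], [0.0, 1.0], [1.0, 0.5], [0.5, 0.25]]
attrs = [10, 20, 30, 40]
queries = [[1.0, 0.0], [0.0, 1.0]]

top = batched_filtered_scan(vectors, attrs, lambda a: a >= 20, queries, k=2)
print(top)
```

Because the filter runs before any distance computation, the scan's cost in this sketch shrinks with the number of rows that survive the predicate, and the cost of reading each surviving row is shared by every query in the batch.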