Semantic Search Pipeline: From Query Expansion To Concept Forging

Elizabeth Soper, Jordan Hosier, Dustin Bales, Vijay K. Gurbani

2021 IEEE 37th International Conference on Data Engineering (ICDE 2021)

Abstract
When searching a database for a topic (e.g. Covid-19), there may not exist a precise match, especially if the topic is novel. Furthermore, the topic may surface in the data under different guises ("Covid-19," "coronavirus," "pandemic," etc.). The results of a keyword search are limited by the querier's imagination and familiarity with the data. Such searches have high precision, but low recall. In order to increase the recall of searches, we present the Semantic Search Pipeline, a novel approach to document retrieval that uses distributional semantic models and locality sensitive hashing to expand queries and efficiently identify other relevant documents that may not contain the obvious query terms. We evaluate the pipeline using a dataset curated from financial customer service call centers, resulting in an increase in recall of 32% over a simple keyword baseline, with a negligible drop in precision. Furthermore, we present the notion of concept forging, a process of tracing a topic or concept through time and through its various surface realizations. Applied to Covid-19, the search pipeline retrieves a set of documents that allow us to uncover the short- and long-term effects of Covid-19 on the lives of the people and businesses impacted by it. Although Covid-19 is a timely test case, our search pipeline is general in nature and can be easily applied to any range of topics.
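The abstract names two ingredients, query expansion with a distributional semantic model and locality sensitive hashing, without giving implementation details. The following is a minimal sketch of how those two pieces might fit together, not the authors' pipeline. Everything in it is an assumption made for illustration: the toy vectors in `TOY_VECTORS` stand in for a trained embedding model, the 0.9 similarity threshold is arbitrary, random-hyperplane hashing is one common LSH scheme for cosine similarity, and the helper names (`expand_query`, `signature`, `build_index`) are hypothetical.

```python
import math
import random

# Hypothetical toy vectors standing in for a trained distributional model.
TOY_VECTORS = {
    "covid": [0.9, 0.8, 0.1],
    "coronavirus": [0.85, 0.82, 0.12],
    "pandemic": [0.8, 0.7, 0.2],
    "mortgage": [0.1, 0.2, 0.9],
    "loan": [0.15, 0.1, 0.95],
}


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm_u = math.sqrt(sum(a * a for a in u))
    norm_v = math.sqrt(sum(b * b for b in v))
    return dot / (norm_u * norm_v)


def expand_query(term, vectors, threshold=0.9):
    """Return the term plus all vocabulary words at least `threshold` similar to it."""
    query_vec = vectors[term]
    return sorted(w for w, v in vectors.items() if cosine(query_vec, v) >= threshold)


def make_hyperplanes(n_planes, dim, seed=0):
    """Draw random hyperplane normals; the seed only makes runs repeatable."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_planes)]


def signature(vec, planes):
    """One bit per hyperplane: which side of the plane the vector falls on."""
    return tuple(
        1 if sum(p * x for p, x in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def embed(text, vectors):
    """Average the vectors of known tokens; None if no token is known."""
    known = [vectors[t] for t in text.lower().split() if t in vectors]
    if not known:
        return None
    dim = len(known[0])
    return [sum(v[i] for v in known) / len(known) for i in range(dim)]


def build_index(docs, vectors, planes):
    """Group document ids into buckets keyed by their LSH signature."""
    index = {}
    for doc_id, text in docs.items():
        vec = embed(text, vectors)
        if vec is not None:
            index.setdefault(signature(vec, planes), []).append(doc_id)
    return index


def search(term, index, vectors, planes):
    """Expand the query, then collect documents sharing a bucket with any expanded term."""
    hits = set()
    for word in expand_query(term, vectors):
        hits.update(index.get(signature(vectors[word], planes), []))
    return sorted(hits)
```

In this sketch, a query for "covid" is first expanded to its near neighbors in the toy vector space, and each expanded term is hashed to look up candidate documents in its bucket, so a document that mentions only "coronavirus" can be retrieved without containing the literal query term. Documents whose vectors point in similar directions tend to share a bucket, which avoids comparing the query against every document; a real system would typically use several hash tables to reduce missed neighbors.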
Keywords
Information Retrieval, Natural Language Processing