An Efficient Corpus Indexer for dynamic corpora retrieval

Ao Zou, Wenning Hao, Dawei Jin, Shichen Zou, Yu Zheng, Feiyan Sun, Li Xiang

Expert Systems with Applications (2024)

Abstract
As a new paradigm for information retrieval, generative retrieval (GR) has achieved solid performance on various retrieval tasks. Despite this promising progress, GR does not generalize to dynamic corpora, to which new documents are continually added. Pioneering work based on continual learning addresses this issue, but continual learning requires retraining after model deployment and may suffer from catastrophic forgetting. Hence, we propose a new retrieval framework called ECI (an Efficient Corpus Indexer for dynamic corpora retrieval). ECI is a hybrid index framework that combines a generative index with a deep hashing index. We design a complementary training objective, Prefix-Sensitive Similarity Alignment, which further improves the performance of generative retrieval. In addition, ECI enables incremental deep hashing learning and provides a retrieval scheme for new documents based on the deep hashing index, thereby addressing the generalization problem on dynamic corpora. ECI also uses techniques such as whitening and query-generated data augmentation to enhance retrieval performance. On a dynamic corpus retrieval task built on the widely used academic benchmark Natural Questions, ECI outperforms various baselines, including the state-of-the-art GR baseline and its variants.
Key words
Document retrieval, Generative retrieval, Dynamic corpora, Deep hashing
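
The role a hashing index plays for newly added documents can be illustrated with a toy sketch. This is not the ECI implementation: random-projection sign hashing stands in for the learned deep hashing model, plain lists of floats stand in for encoder embeddings, and the names `make_projections`, `hash_code`, `hamming`, and `HashIndex` are hypothetical. It only shows why a hash index suits a growing corpus: a new document is encoded and inserted without any retraining, and retrieval ranks binary codes by Hamming distance.

```python
import random


def make_projections(dim, n_bits, seed=0):
    """Random hyperplanes; an assumed stand-in for a trained hashing network."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def hash_code(vec, projections):
    """Binarize an embedding: one bit per hyperplane, set when the dot product is positive."""
    code = 0
    for row in projections:
        dot = sum(w * x for w, x in zip(row, vec))
        code = (code << 1) | (1 if dot > 0 else 0)
    return code


def hamming(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


class HashIndex:
    """Toy hash index: doc_id -> binary code, searched by Hamming distance."""

    def __init__(self, projections):
        self.projections = projections
        self.codes = {}

    def add(self, doc_id, embedding):
        # Adding a document only encodes it; nothing already indexed changes.
        self.codes[doc_id] = hash_code(embedding, self.projections)

    def search(self, query_embedding, k=1):
        q = hash_code(query_embedding, self.projections)
        ranked = sorted(self.codes, key=lambda d: (hamming(q, self.codes[d]), d))
        return ranked[:k]


# Build an index over an initial corpus (made-up embeddings).
rng = random.Random(42)
dim, n_bits = 8, 16
projections = make_projections(dim, n_bits)
index = HashIndex(projections)
vec_a = [rng.gauss(0.0, 1.0) for _ in range(dim)]
index.add("old_a", vec_a)
index.add("old_b", [-x for x in vec_a])  # opposite direction, so every bit flips

# Later, a new document arrives and is indexed without retraining.
vec_new = [rng.gauss(0.0, 1.0) for _ in range(dim)]
index.add("new_doc", vec_new)

print(index.search(vec_new, k=3))
```

In a hybrid setup of the kind the abstract describes, the generative index would presumably serve the documents seen during training, while a structure like this one would serve documents added afterwards; how ECI combines the two results is not specified in the abstract and is not modeled here.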