An Efficient Corpus Indexer for dynamic corpora retrieval
Expert Systems with Applications (2024)
Abstract
As a new paradigm for information retrieval, generative retrieval (GR) has achieved solid performance on various retrieval tasks. Despite this promising progress, GR cannot generalize to dynamic corpora, where new documents are continually added. Some pioneering works address this issue with continual learning, yet the continual learning framework requires retraining after model deployment and may suffer from catastrophic forgetting. Hence, we propose a new retrieval framework named ECI (an Efficient Corpus Indexer for dynamic corpora retrieval). ECI is a hybrid index framework that combines a generative index and a deep hashing index. We design a complementary training objective, named Prefix-Sensitive Similarity Alignment, which further improves the performance of generative retrieval. In addition, ECI enables incremental deep hashing learning and provides a deep hashing index-based retrieval scheme for new documents, thus solving the generalization problem on dynamic corpora. Furthermore, ECI uses techniques such as whitening and query-generation-based data augmentation to enhance retrieval performance. In a dynamic corpus retrieval task built on the widely used academic benchmark Natural Questions, ECI outperforms various baselines, including the state-of-the-art GR baseline and its variants.
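The abstract does not say how the deep hashing index or the whitening step is implemented. As a rough illustration only, the sketch below assumes one common recipe: embeddings are whitened with a transform estimated from the covariance matrix via SVD, each whitened dimension is binarized by its sign to form a hash code, and retrieval ranks documents by Hamming distance. In ECI the codes come from a learned deep hashing model; here the sign of the whitened embedding merely stands in for it. The names `fit_whitening`, `to_hash_codes`, and `HashIndex` are hypothetical and are not the authors' code. NumPy is assumed to be available.

```python
import numpy as np


def fit_whitening(embeddings: np.ndarray, eps: float = 1e-8):
    """Estimate a whitening transform (mu, w) from a batch of embeddings.

    After the transform, (x - mu) @ w has roughly zero mean and identity covariance.
    """
    mu = embeddings.mean(axis=0, keepdims=True)
    cov = np.cov((embeddings - mu).T)          # rows of the input are variables
    u, s, _ = np.linalg.svd(cov)               # cov = u @ diag(s) @ u.T (symmetric PSD)
    w = u @ np.diag(1.0 / np.sqrt(s + eps))
    return mu, w


def to_hash_codes(embeddings: np.ndarray, mu: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Whiten, then binarize each dimension by its sign to get {0, 1} codes.

    Hypothetical stand-in for a learned deep hashing encoder.
    """
    return ((embeddings - mu) @ w > 0).astype(np.uint8)


class HashIndex:
    """Hypothetical append-only hash index.

    New documents are encoded with the frozen transform and appended;
    nothing already in the index is recomputed or retrained.
    """

    def __init__(self, mu: np.ndarray, w: np.ndarray):
        self.mu, self.w = mu, w
        self.codes = np.zeros((0, w.shape[1]), dtype=np.uint8)
        self.doc_ids: list[str] = []

    def add(self, doc_ids: list[str], embeddings: np.ndarray) -> None:
        new_codes = to_hash_codes(embeddings, self.mu, self.w)
        self.codes = np.vstack([self.codes, new_codes])
        self.doc_ids.extend(doc_ids)

    def search(self, query_embedding: np.ndarray, k: int = 3):
        """Return the k (doc_id, hamming_distance) pairs closest to the query."""
        q = to_hash_codes(query_embedding[None, :], self.mu, self.w)
        hamming = (self.codes != q).sum(axis=1)
        order = np.argsort(hamming, kind="stable")[:k]
        return [(self.doc_ids[i], int(hamming[i])) for i in order]


# Usage with synthetic embeddings (random vectors stand in for real encoder outputs).
rng = np.random.default_rng(0)
corpus = rng.normal(size=(200, 16))
mu, w = fit_whitening(corpus)            # fitted once on the initial corpus
index = HashIndex(mu, w)
index.add([f"doc{i}" for i in range(200)], corpus)
new_docs = rng.normal(size=(5, 16))      # documents arriving after deployment
index.add([f"new{i}" for i in range(5)], new_docs)
results = index.search(new_docs[2], k=3)
```

Because the transform stays frozen after deployment, adding a document costs one encoding and one append, which is the property that lets a hashing index absorb new documents without retraining. The trade-off in this simplified sketch is that the whitening statistics may drift from the true distribution as the corpus grows.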
Key words
Document retrieval, Generative retrieval, Dynamic corpora, Deep hashing