Harvesting for full-text retrieval

DIGITAL LIBRARIES: IMPLEMENTING STRATEGIES AND SHARING EXPERIENCES, PROCEEDINGS(2005)

引用 0|浏览0
暂无评分
摘要
We propose an approach to Distributed Information Retrieval based on the periodic and incremental centralisation of full-text indices of widely dispersed and autonomously managed content sources. Inspired by the success of the Open Archive Initiative's protocol for metadata harvesting, the approach occupies middle ground between: (i) the crawling of content, and (ii) the distribution of retrieval. As in crawling, some data moves towards the retrieval process, but it is statistics about the content rather than content itself. As in distributed retrieval, some processing is distributed along with the data, but it is indexing rather than retrieval itself. We show that the approach retains the good properties of centralised retrieval without renouncing to cost-effective resource pooling. We discuss the requirements associated with the approach and identify two strategies to deploy it on top of the OAI infrastructure.
更多
查看译文
关键词
full-text retrieval,open archive initiative,data move,content source,incremental centralisation,full-text index,centralised retrieval,retrieval process,information retrieval,oai infrastructure,good property,indexation
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要