Wikimarks: Harvesting Relevance Benchmarks from Wikipedia

SIGIR '22: Proceedings of the 45th International ACM SIGIR Conference on Research and Development in Information Retrieval(2022)

引用 6|浏览29
暂无评分
摘要
We provide a resource for automatically harvesting relevance benchmarks from Wikipedia -- which we refer to as "Wikimarks" to differentiate them from manually created benchmarks. Unlike simulated benchmarks, they are based on manual annotations of Wikipedia authors. Studies on the TREC Complex Answer Retrieval track demonstrated that leaderboards under Wikimarks and manually annotated benchmarks are very similar. Because of their availability, Wikimarks can fill an important need for Information Retrieval research. We provide a meta-resource to harvest Wikimarks for several information retrieval tasks across different languages: paragraph retrieval, entity ranking, query-specific clustering, outline prediction, and relevant entity linking and many more. In addition, we provide example Wikimarks for English, Simple English, and Japanese derived from the 01/01/2022 Wikipedia dump. Resource available: https://trema-unh.github.io/wikimarks/
更多
查看译文
关键词
test collections, relevant entity linking, query-specific clustering
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要