On the effect of relevance scales in crowdsourcing relevance assessments for Information Retrieval evaluation

Information Processing & Management (2021)

Abstract
Relevance is a key concept in information retrieval and is widely used for evaluating search systems with test collections. We present a comprehensive study of the effect of the choice of relevance scale on the evaluation of information retrieval systems. Our work analyzes and compares four crowdsourced scales (2-level, 4-level, and 100-level ordinal scales, and a magnitude estimation scale) and two expert-labeled datasets (on 2- and 4-level ordinal scales). We compare the scales in terms of internal and external agreement and their effect on IR evaluation, considering both system effectiveness and topic ease, and we discuss how these scales and datasets affect assessors' perception of relevance levels.
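The internal agreement mentioned above is typically measured by comparing the labels that different assessors assign to the same document. As an illustrative sketch (not necessarily the measure used in the paper), Cohen's kappa between two hypothetical assessors on a 4-level ordinal scale can be computed with the standard library alone:

```python
from collections import Counter

def cohens_kappa(labels_a, labels_b):
    """Cohen's kappa: chance-corrected agreement between two assessors."""
    assert len(labels_a) == len(labels_b) and labels_a
    n = len(labels_a)
    # Observed agreement: fraction of documents with identical labels.
    observed = sum(a == b for a, b in zip(labels_a, labels_b)) / n
    # Expected agreement under independent labeling with each
    # assessor's own marginal label distribution.
    freq_a, freq_b = Counter(labels_a), Counter(labels_b)
    expected = sum(freq_a[k] * freq_b.get(k, 0) for k in freq_a) / (n * n)
    return (observed - expected) / (1 - expected)

# Hypothetical 4-level relevance labels (0 = non-relevant .. 3 = highly relevant).
assessor_a = [0, 1, 2, 3, 2, 1, 0, 3]
assessor_b = [0, 1, 2, 2, 2, 1, 1, 3]
print(round(cohens_kappa(assessor_a, assessor_b), 3))  # → 0.667
```

Note that plain kappa treats all disagreements equally; for ordinal scales a weighted variant (penalizing distant levels more) is often preferred.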
Keywords
Relevance scales, Crowdsourcing, Information Retrieval evaluation, Relevance assessment