Which noise affects algorithm robustness for learning to rank

Information Retrieval Journal (2015)

Abstract
When learning to rank algorithms are applied in real search applications, noise in human-labeled training data is an inevitable problem that affects their performance. Previous work has mainly studied how noise affects ranking algorithms and how to design robust ranking algorithms. In this work, we investigate which inherent characteristics make training data robust to label noise and how to use them to guide labeling. Our motivation comes from an interesting observation: the same ranking algorithm may show very different sensitivities to label noise on different data sets. We investigate the underlying reason for this observation using three typical kinds of learning to rank algorithms (i.e., pointwise, pairwise and listwise methods) and three public data sets with different properties (i.e., OHSUMED, TD2003 and MSLR-WEB10K). We find that as label noise in the training data increases, it is the document pair noise ratio (referred to as pNoise), rather than the document noise ratio (referred to as dNoise), that explains the performance degradation of a ranking algorithm well. We further identify two inherent characteristics of the training data, namely relevance levels and label balance, that strongly influence how pNoise varies with label noise (i.e., dNoise). Based on these results, we discuss guidelines for labeling strategies that produce robust training data for learning to rank algorithms in practice.
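The abstract distinguishes dNoise from pNoise without giving formulas, so the following is only a minimal sketch of how the two ratios could be computed for one query, under stated assumptions. It assumes (1) dNoise is the fraction of documents whose observed label differs from the true label; (2) training pairs are the document pairs of one query whose observed labels differ, and such a pair counts as noisy when the true labels do not order the two documents the same way; and (3) a hypothetical noise model that replaces a label, with probability `rate`, by a different relevance level chosen uniformly at random. The function names `d_noise`, `p_noise`, `inject_noise` and `simulate` are hypothetical, and the simulation uses synthetic binary labels, not the paper's data sets. It illustrates only the label-balance effect: at the same dNoise, a query with few relevant documents can have a much higher pNoise than a balanced one.

```python
import itertools
import random
from typing import List, Sequence, Tuple


def d_noise(true_labels: Sequence[int], noisy_labels: Sequence[int]) -> float:
    """Assumed dNoise: fraction of documents whose observed label is wrong."""
    if len(true_labels) != len(noisy_labels):
        raise ValueError("label lists must have the same length")
    if not true_labels:
        return 0.0
    wrong = sum(t != n for t, n in zip(true_labels, noisy_labels))
    return wrong / len(true_labels)


def p_noise(true_labels: Sequence[int], noisy_labels: Sequence[int]) -> float:
    """Assumed pNoise: fraction of training pairs that contradict the truth.

    A training pair is any pair of documents with different *observed* labels;
    it is noisy when the true labels reverse or tie the observed preference.
    """
    pairs = noisy_pairs = 0
    for i, j in itertools.combinations(range(len(noisy_labels)), 2):
        observed = noisy_labels[i] - noisy_labels[j]
        if observed == 0:
            continue  # tied observed labels yield no training pair
        pairs += 1
        truth = true_labels[i] - true_labels[j]
        if truth * observed <= 0:  # reversed or tied under the true labels
            noisy_pairs += 1
    return noisy_pairs / pairs if pairs else 0.0


def inject_noise(labels: Sequence[int], rate: float,
                 levels: Sequence[int], rng: random.Random) -> List[int]:
    """Hypothetical noise model: with probability `rate`, relabel a document
    with a different relevance level drawn uniformly at random."""
    noisy = []
    for y in labels:
        if rng.random() < rate:
            noisy.append(rng.choice([level for level in levels if level != y]))
        else:
            noisy.append(y)
    return noisy


def simulate(rel_fraction: float, rate: float, n_queries: int = 200,
             docs_per_query: int = 50, seed: int = 0) -> Tuple[float, float]:
    """Average (dNoise, pNoise) over synthetic queries with binary labels."""
    rng = random.Random(seed)
    n_rel = round(rel_fraction * docs_per_query)
    d_total = p_total = 0.0
    for _ in range(n_queries):
        true = [1] * n_rel + [0] * (docs_per_query - n_rel)
        noisy = inject_noise(true, rate, levels=(0, 1), rng=rng)
        d_total += d_noise(true, noisy)
        p_total += p_noise(true, noisy)
    return d_total / n_queries, p_total / n_queries


if __name__ == "__main__":
    # Two of four labels are wrong; two of the five training pairs are noisy.
    print(d_noise([2, 1, 0, 0], [2, 0, 1, 0]))  # 0.5
    print(p_noise([2, 1, 0, 0], [2, 0, 1, 0]))  # 0.4
    # Same noise rate, balanced (50% relevant) vs. imbalanced (10% relevant).
    for frac in (0.5, 0.1):
        print(frac, simulate(frac, rate=0.2))
```

Under these assumptions both settings end up with a dNoise close to 0.2. The imbalanced setting has a higher pNoise because flipped irrelevant documents outnumber the few correctly labeled relevant ones and end up in many training pairs.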
Keywords
Learning to rank, Label noise, Robust data