Active learning with biased non-response to label requests
CoRR (2023)
Abstract
Active learning can improve the efficiency of training prediction models by
identifying the most informative new labels to acquire. However, non-response
to label requests can impact active learning's effectiveness in real-world
contexts. We conceptualise this degradation by considering the type of
non-response present in the data, demonstrating that biased non-response is
particularly detrimental to model performance. We argue that this sort of
non-response is particularly likely in contexts where the labelling process, by
nature, relies on user interactions. To mitigate the impact of biased
non-response, we propose a cost-based correction to the sampling strategy, the
Upper Confidence Bound of the Expected Utility (UCB-EU), that can plausibly
be applied to any active learning algorithm. Through experiments, we
demonstrate that our method successfully reduces the harm from labelling
non-response in many settings. However, we also characterise settings where the
non-response bias in the annotations remains detrimental under UCB-EU for
particular sampling methods and data generating processes. Finally, we evaluate
our method on a real-world dataset from the e-commerce platform Taobao. We show
that UCB-EU yields substantial performance improvements to conversion models
that are trained on clicked impressions. Most generally, this research serves
both to better conceptualise the interplay between types of non-response and
model improvements via active learning, and to provide a practical,
easy-to-implement correction that helps mitigate model degradation.
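
The abstract does not state the UCB-EU formula, so the following is only an illustrative sketch of what a cost-based correction of this general kind might look like, not the authors' implementation. It assumes that each candidate already has an informativeness score from some base active learning strategy, that candidates fall into groups with observable response histories, and that the expected utility of a request is the informativeness multiplied by an optimistic (upper confidence bound) estimate of the response probability. The function names, the grouping, and the exploration constant `c` are all hypothetical.

```python
import math


def ucb_response_rate(responses, requests, total_requests, c=1.0):
    """Optimistic estimate of the probability that a label request is answered.

    Assumption: a standard UCB1-style bonus; the paper may use a different bound.
    """
    if requests == 0:
        return 1.0  # group never queried: be fully optimistic
    mean = responses / requests
    bonus = c * math.sqrt(math.log(max(total_requests, 2)) / requests)
    return min(1.0, mean + bonus)


def select_query(candidates, stats, c=1.0):
    """Return the id of the candidate with the highest optimistic expected utility.

    candidates: list of (item_id, group, informativeness) tuples (hypothetical layout)
    stats: dict mapping group -> (responses, requests) observed so far
    """
    total = sum(requests for _, requests in stats.values())

    def score(candidate):
        _, group, informativeness = candidate
        responses, requests = stats.get(group, (0, 0))
        return informativeness * ucb_response_rate(responses, requests, total, c)

    return max(candidates, key=score)[0]


# Toy example: item "a" looks more informative, but its group rarely responds.
candidates = [("a", "g1", 0.9), ("b", "g2", 0.6)]
stats = {"g1": (1, 50), "g2": (45, 50)}
chosen = select_query(candidates, stats)
```

In this toy setting the correction steers the request away from the more informative but rarely answered item "a" and towards "b", which is the behaviour a response-aware sampling strategy is meant to produce. The confidence bonus keeps under-sampled groups from being written off too early.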