Offline A/B Testing for Recommender Systems.

WSDM 2018: The Eleventh ACM International Conference on Web Search and Data Mining, Marina Del Rey, CA, USA, February 2018.

Abstract
Online A/B testing evaluates the impact of a new technology by running it in a real production environment and testing its performance on a subset of the users of the platform. It is a well-known practice to run a preliminary offline evaluation on historical data to iterate faster on new ideas, and to detect poor policies in order to avoid losing money or breaking the system. For such offline evaluations, we are interested in methods that can compute offline an estimate of the potential uplift of performance generated by a new technology. Offline performance can be measured using estimators known as counterfactual or off-policy estimators. Traditional counterfactual estimators, such as capped importance sampling or normalised importance sampling, exhibit unsatisfying bias-variance compromises when experimenting on personalized product recommendation systems. To overcome this issue, we model the bias incurred by these estimators rather than bound it in the worst case, which leads us to propose a new counterfactual estimator. We provide a benchmark of the different estimators showing their correlation with business metrics observed by running online A/B tests on a large-scale commercial recommender system.
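To make the estimators named in the abstract concrete, below is a minimal sketch (not the paper's implementation) of the standard, capped, and self-normalised importance sampling estimates of a new policy's expected reward from logged data. The function and variable names (ips_estimators, cap, target_probs, logging_probs) are illustrative assumptions.

```python
import numpy as np

def ips_estimators(rewards, target_probs, logging_probs, cap=10.0):
    """Estimate the expected reward of a target (new) policy from logs
    collected under the logging (production) policy.

    Returns the standard, capped, and self-normalised importance sampling
    estimates. Capping and normalising trade bias for reduced variance.
    """
    # Importance weights: how much more (or less) likely the target policy
    # is to take the logged action than the logging policy was.
    w = target_probs / logging_probs

    ips = np.mean(w * rewards)                        # unbiased, high variance
    capped = np.mean(np.minimum(w, cap) * rewards)    # capped importance sampling
    snips = np.sum(w * rewards) / np.sum(w)           # normalised importance sampling

    return ips, capped, snips

# Toy usage: three logged interactions with observed rewards and the
# probability each logged action had under the logging and target policies.
rewards = np.array([1.0, 0.0, 1.0])
logging_probs = np.array([0.5, 0.2, 0.1])
target_probs = np.array([0.7, 0.1, 0.4])
print(ips_estimators(rewards, target_probs, logging_probs))
```

The cap value controls the bias-variance compromise discussed in the abstract: a low cap limits the influence of rare, heavily up-weighted actions at the cost of a systematic underestimate.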
Keywords
counterfactual estimation, off-policy evaluation, recommender system, importance sampling