An Information-Theoretic Analysis of Thompson Sampling.

JOURNAL OF MACHINE LEARNING RESEARCH (2016)

Abstract
We provide an information-theoretic analysis of Thompson sampling that applies across a broad range of online optimization problems in which a decision-maker must learn from partial feedback. This analysis inherits the simplicity and elegance of information theory and leads to regret bounds that scale with the entropy of the optimal-action distribution. This strengthens preexisting results and yields new insight into how information improves performance.
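To make the object of this analysis concrete, the sketch below shows Thompson sampling for a K-armed Bernoulli bandit, the simplest setting covered by the paper's results. The independent Beta(1,1) priors and the helper name `thompson_sampling_bernoulli` are illustrative assumptions, not the paper's general formulation, which allows arbitrary priors and partial feedback. In this bandit setting, the paper's entropy-based guarantee is roughly of the form E[Regret(T)] ≤ sqrt(K · H(A*) · T / 2), where H(A*) is the entropy of the prior distribution over the optimal action.

```python
import numpy as np

def thompson_sampling_bernoulli(true_means, horizon, rng=None):
    """Thompson sampling for a K-armed Bernoulli bandit with Beta(1,1) priors.

    `true_means` and `horizon` are illustrative inputs, not part of the
    paper's notation. Returns the cumulative regret over `horizon` rounds.
    """
    rng = np.random.default_rng() if rng is None else rng
    k = len(true_means)
    alpha = np.ones(k)  # Beta posterior parameter: 1 + observed successes
    beta = np.ones(k)   # Beta posterior parameter: 1 + observed failures
    best_mean = max(true_means)
    regret = 0.0

    for _ in range(horizon):
        # Sample a mean for each arm from its posterior and play the argmax.
        sampled = rng.beta(alpha, beta)
        arm = int(np.argmax(sampled))
        reward = float(rng.random() < true_means[arm])  # Bernoulli reward
        # Conjugate Beta-Bernoulli posterior update.
        alpha[arm] += reward
        beta[arm] += 1.0 - reward
        regret += best_mean - true_means[arm]

    return regret

if __name__ == "__main__":
    # Hypothetical three-armed instance; regret should grow sublinearly in T.
    print(thompson_sampling_bernoulli([0.1, 0.5, 0.7], horizon=10_000))
```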
Keywords
Thompson sampling, online optimization, multi-armed bandit, information theory, regret bounds