Disagreement-based combinatorial pure exploration: Efficient algorithms and an analysis with localization.

arXiv: Machine Learning (2017)

Abstract
We design new algorithms for the combinatorial pure exploration problem in the multi-armed bandit framework. In this problem, we are given $K$ distributions and a collection of subsets $\mathcal{V} \subset 2^K$ of these distributions, and we would like to find the subset $v \in \mathcal{V}$ that has the largest cumulative mean while collecting, in a sequential fashion, as few samples from the distributions as possible. We study both the fixed-budget and fixed-confidence settings, and our algorithms essentially achieve state-of-the-art performance in all settings, improving on previous guarantees for structures like matchings and submatrices that have large augmenting sets. Moreover, our algorithms can be implemented efficiently whenever the decision set $\mathcal{V}$ admits linear optimization. Our analysis involves precise concentration-of-measure arguments and a new algorithm for linear programming with exponentially many constraints.
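To make the problem setup concrete, here is a minimal Python sketch of a naive fixed-budget baseline. It is not the algorithm proposed in the paper: it samples every arm equally often and then calls a linear optimization oracle once on the empirical means. The decision set (all subsets of size m), the function names `top_m_oracle` and `uniform_explore`, and the Gaussian noise model are all assumptions made for illustration.

```python
import random
from typing import Callable, FrozenSet, List, Sequence


def top_m_oracle(weights: Sequence[float], m: int) -> FrozenSet[int]:
    """Linear optimization over the (assumed) decision set of all size-m subsets.

    Returns the subset v maximizing sum(weights[i] for i in v), which for
    this particular decision set is just the m largest weights.
    """
    ranked = sorted(range(len(weights)), key=lambda i: weights[i], reverse=True)
    return frozenset(ranked[:m])


def uniform_explore(
    sample: Callable[[int], float],
    num_arms: int,
    budget: int,
    oracle: Callable[[Sequence[float]], FrozenSet[int]],
) -> FrozenSet[int]:
    """Naive fixed-budget baseline (hypothetical, not the paper's method).

    Splits the budget evenly across arms, estimates each mean by the
    empirical average, and returns the oracle's answer on those estimates.
    """
    pulls_per_arm = budget // num_arms
    if pulls_per_arm < 1:
        raise ValueError("budget must allow at least one pull per arm")
    estimates: List[float] = []
    for arm in range(num_arms):
        draws = [sample(arm) for _ in range(pulls_per_arm)]
        estimates.append(sum(draws) / len(draws))
    return oracle(estimates)


if __name__ == "__main__":
    # Illustrative instance: four arms with unit-variance Gaussian rewards.
    means = [0.1, 0.9, 0.5, 0.7]
    rng = random.Random(0)
    chosen = uniform_explore(
        sample=lambda arm: rng.gauss(means[arm], 1.0),
        num_arms=len(means),
        budget=4000,
        oracle=lambda w: top_m_oracle(w, 2),
    )
    print(sorted(chosen))
```

The sketch only touches the decision set through the oracle call, which mirrors the abstract's point that efficiency hinges on the decision set admitting linear optimization. An adaptive method would instead concentrate samples on the arms that are hardest to distinguish.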