Efficient No-Regret Multiagent Learning.

AAAI'05: Proceedings of the 20th National Conference on Artificial Intelligence - Volume 1 (2005)

Abstract
We present new results on the efficiency of no-regret algorithms in the context of multiagent learning. We use a known approach to augment a large class of no-regret algorithms so that they stochastically sample actions and observe only the scalar reward of the action played. We show that, with high probability and in polynomial time, the average actual payoff of the resulting learner gets (1) close to the best response against (eventually) stationary opponents, (2) close to the asymptotic optimal payoff against opponents that play a converging sequence of policies, and (3) close to at least a dynamic variant of the minimax payoff against arbitrary opponents. In addition, the polynomial bounds are shown to be significantly better than previously known bounds. Furthermore, unlike previous work, we do not need to assume that the learner knows the game matrices or can observe the opponents' actions.
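The augmentation described above corresponds to the bandit-feedback setting: the learner samples an action from its mixed strategy and sees only that action's scalar reward. Below is a minimal Exp3-style sketch of such a learner, not the paper's specific construction; the function name exp3_learner, the exploration rate gamma, and the get_reward callback are illustrative assumptions.

```python
import math
import random

def exp3_learner(num_actions, horizon, get_reward, gamma=0.1):
    """Exp3-style no-regret learner under bandit feedback (illustrative sketch).

    At each round the learner samples an action from its current mixed strategy,
    observes only the scalar reward of the action actually played (assumed to lie
    in [0, 1]), and applies an importance-weighted exponential-weights update.
    Returns the average actual payoff over the horizon.
    """
    weights = [1.0] * num_actions
    total_reward = 0.0
    for _ in range(horizon):
        total_w = sum(weights)
        # Mix the exponential-weights distribution with uniform exploration.
        probs = [(1.0 - gamma) * w / total_w + gamma / num_actions for w in weights]
        action = random.choices(range(num_actions), weights=probs)[0]
        reward = get_reward(action)  # only the played action's reward is observed
        total_reward += reward
        # Importance weighting keeps the reward estimate unbiased.
        estimate = reward / probs[action]
        weights[action] *= math.exp(gamma * estimate / num_actions)
        # Rescale to avoid floating-point overflow; the distribution is unchanged.
        max_w = max(weights)
        weights = [w / max_w for w in weights]
    return total_reward / horizon

# Example: against a stationary opponent the average payoff approaches the
# best-response value (here 0.8, the payoff of action 1). Payoffs are hypothetical.
payoffs = [0.2, 0.8, 0.5]
avg_payoff = exp3_learner(num_actions=3, horizon=10000,
                          get_reward=lambda a: payoffs[a])
```

The uniform-exploration term ensures every action keeps a nonzero sampling probability, which bounds the variance of the importance-weighted estimates and yields the no-regret guarantee under bandit feedback.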
Keywords
asymptotic optimal payoff, average actual payoff, known approach, minimax payoff, no-regret algorithm, polynomial bound, polynomial time, resulting learner, arbitrary opponent, efficient no-regret multiagent learning