Quantum Speedups in Regret Analysis of Infinite Horizon Average-Reward Markov Decision Processes
arXiv (2023)
Abstract
This paper investigates the potential of quantum acceleration in addressing
infinite horizon Markov Decision Processes (MDPs) to enhance average reward
outcomes. We introduce an innovative quantum framework for the agent's
engagement with an unknown MDP, extending the conventional interaction
paradigm. Our approach involves the design of an optimism-driven tabular
Reinforcement Learning algorithm that harnesses quantum signals acquired by the
agent through efficient quantum mean estimation techniques. Through thorough
theoretical analysis, we demonstrate that the quantum advantage in mean
estimation leads to exponential advancements in regret guarantees for infinite
horizon Reinforcement Learning. Specifically, the proposed Quantum algorithm
achieves a regret bound of $\tilde{\mathcal{O}}(1)$, a significant improvement
over the $\tilde{\mathcal{O}}(\sqrt{T})$ bound exhibited by classical
counterparts.
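
As a rough intuition for the gap between the two bounds (a heuristic sketch, not the paper's analysis): in optimism-driven algorithms, regret is typically controlled by the sum of confidence widths over visits. Assuming a classical estimator whose width shrinks like 1/sqrt(n) after n samples, and a quantum mean estimator whose width shrinks like 1/n (the quadratic speedup usually attributed to quantum mean estimation), the cumulative widths grow like sqrt(T) and log(T) respectively. The function name `cumulative_width` and the unit constants are illustrative assumptions.

```python
import math


def cumulative_width(T, rate):
    """Sum of per-visit confidence widths n**(-rate) for n = 1..T.

    rate=0.5 models an assumed classical width ~ 1/sqrt(n);
    rate=1.0 models an assumed quantum width ~ 1/n.
    Constants and log factors are deliberately ignored.
    """
    return sum(n ** (-rate) for n in range(1, T + 1))


if __name__ == "__main__":
    for T in (10**2, 10**3, 10**4, 10**5):
        classical = cumulative_width(T, 0.5)  # grows like 2*sqrt(T)
        quantum = cumulative_width(T, 1.0)    # grows like ln(T)
        print(f"T={T:>6}  classical~{classical:9.1f}  quantum~{quantum:5.1f}")
```

The classical column scales polynomially in T while the quantum column grows only logarithmically, which is the sense in which a bound that hides logarithmic factors can be stated as constant in T.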
Keywords
reinforcement