Expedited Online Learning With Spatial Side Information

IEEE Transactions on Automatic Control (2023)

Abstract
The applicability of model-based online reinforcement learning algorithms is often limited by the amount of exploration required for learning the environment model to the desired level of accuracy. A promising approach to addressing this issue is to exploit side information, available either a priori or during the agent’s mission, for learning the unknown dynamics. Side information in our context refers to information in the form of bounds on the differences between transition probabilities at different states in the environment. We use this information as a measure of the reusability of the direct experience gained by performing actions and observing the outcomes at different states. We propose a framework to integrate side information into existing model-based reinforcement learning algorithms by complementing the samples obtained directly at states with second-hand information obtained from other states with similar dynamics. Additionally, we propose an algorithm for synthesizing the optimal control strategy in unknown environments by using side information to effectively balance between exploration and exploitation. We prove that, with high probability, the proposed algorithm yields a near-optimal policy in the Bayesian sense, while also guaranteeing the safety of the agent during exploration. We obtain the near-optimal policy in a number of time steps that is polynomial in the parameters describing the model. We illustrate the utility of the proposed algorithms in the setting of a Mars rover, with data from onboard sensors and a companion aerial vehicle serving as the side information.
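The abstract's core mechanism, complementing direct samples at a state with second-hand samples from states whose dynamics are provably similar, can be illustrated with a minimal sketch. Everything below (the pooling rule, the reuse threshold, the simplified Weissman-style confidence radius, and all function and variable names) is an assumption made for illustration, not the paper's actual update:

```python
import numpy as np

def pooled_estimate(counts, side_bound, s, a, n_states, delta=0.05):
    """Estimate P(.|s, a) by pooling direct samples at state s with
    second-hand samples from states whose transition distributions are
    known (via side information) to be close to those of s.

    counts[s2, a, s']  : number of observed (s2, a) -> s' transitions.
    side_bound[s1, s2] : a priori bound on ||P(.|s1, a) - P(.|s2, a)||_1.
    """
    pooled = np.zeros(n_states)
    n_total = 0
    bias = 0.0  # worst-case bias introduced by reused samples
    for s2 in range(n_states):
        n2 = counts[s2, a].sum()
        # Reuse samples only from sufficiently similar states
        # (the 0.2 threshold is an arbitrary illustrative choice).
        if n2 == 0 or side_bound[s, s2] > 0.2:
            continue
        pooled += counts[s2, a]
        n_total += n2
        bias = max(bias, side_bound[s, s2])
    if n_total == 0:
        # No usable samples: uniform estimate, vacuous L1 interval.
        return np.full(n_states, 1.0 / n_states), 2.0
    p_hat = pooled / n_total
    # Simplified L1 concentration radius, inflated by the worst-case
    # side-information bias of the borrowed samples.
    radius = np.sqrt(2.0 * n_states * np.log(2.0 / delta) / n_total) + bias
    return p_hat, min(radius, 2.0)

# Tiny demo with synthetic counts and similarity bounds.
rng = np.random.default_rng(0)
n_s, n_a = 5, 2
counts = rng.integers(0, 10, size=(n_s, n_a, n_s)).astype(float)
side = np.abs(rng.normal(0.0, 0.1, size=(n_s, n_s)))
np.fill_diagonal(side, 0.0)  # a state is trivially similar to itself
p_hat, radius = pooled_estimate(counts, side, s=0, a=1, n_states=n_s)
print(p_hat, radius)
```

The design intuition matches the abstract: borrowed samples shrink the statistical term of the confidence interval (more data) at the cost of a bias term bounded by the side information, and an optimistic planner can then trade these off when balancing exploration against exploitation.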
Keywords
Markov decision processes (MDPs), online learning, planning, side information