Sample efficient transfer in reinforcement learning for high variable cost environments with an inaccurate source reward model

2022 American Control Conference (ACC)

Abstract
Here we propose an algorithm that combines two classic ideas, transfer learning and temporal abstraction, to accelerate learning in high variable cost environments (HVC-envs). In an HVC-env, each sample drawn from the environment incurs a high cost, so methods that accelerate learning are sought to reduce the total cost incurred. Transfer learning can help in such environments by exploiting prior knowledge from a source environment. However, because only a small number of samples can be collected from an HVC-env due to the high sampling cost, learning becomes challenging when the source environment provides inaccurate rewards. To overcome this challenge, we propose a simple but effective way of creating useful temporally extended actions from an inaccurate physics guided model (PGM) that acts as the source task. We first address this issue theoretically by providing performance bounds between two semi-Markov decision processes (SMDPs) with different reward functions. We then develop two benchmark HVC-envs in which learning must happen using a small number of real samples (often on the order of ~10^2 or ~10^3). Finally, we show that it is possible to obtain sequential high rewards in both of these environments using ~10^3 real samples by leveraging knowledge from PGMs with inaccurate reward models.
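The abstract does not state the paper's theorem or implementation, but the flavor of the theoretical result is standard: for two decision processes that are identical except for rewards differing uniformly by at most ε, the value of any fixed policy differs by at most ε/(1 − γ); the paper's contribution is an analogous bound for SMDPs, whose temporally extended actions discount by γ^τ over their duration τ. As an illustration of the algorithmic idea, below is a minimal, hypothetical sketch of wrapping a policy derived from an inaccurate PGM as a temporally extended action (an option) and then learning over those options with SMDP-style Q-learning in the real environment, counting real samples used. All names (ToyEnv, pgm_greedy_policy, make_option) are illustrative assumptions, not the paper's code.

```python
# Hypothetical sketch: options built from an inaccurate source-model policy,
# selected via tabular SMDP Q-learning in the real environment.
import random

class ToyEnv:
    """1-D chain stand-in for an HVC-env: each step() is one costly real sample."""
    def __init__(self, goal=10):
        self.goal = goal
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):  # action in {-1, +1}
        self.state = max(0, min(self.goal, self.state + action))
        done = self.state == self.goal
        reward = 1.0 if done else -0.01  # small per-step sampling cost
        return self.state, reward, done

def pgm_greedy_policy(state):
    """Flat policy extracted from the (possibly inaccurate) PGM source task."""
    return +1

def make_option(policy, horizon):
    """Wrap a flat policy as a temporally extended action that runs for up to
    `horizon` real steps, returning accumulated reward and elapsed duration."""
    def option(env, state):
        total, steps, done = 0.0, 0, False
        for _ in range(horizon):
            state, r, done = env.step(policy(state))
            total += r
            steps += 1
            if done:
                break
        return state, total, steps, done
    return option

# Options of different durations derived from the same PGM policy.
options = [make_option(pgm_greedy_policy, h) for h in (1, 3, 5)]

# SMDP Q-learning: the bootstrap term is discounted by gamma ** duration.
gamma, alpha, eps = 0.99, 0.5, 0.1
Q = {}
env = ToyEnv()
real_samples = 0
for episode in range(50):
    s, done = env.reset(), False
    while not done:
        qs = [Q.get((s, i), 0.0) for i in range(len(options))]
        i = random.randrange(len(options)) if random.random() < eps \
            else max(range(len(options)), key=lambda j: qs[j])
        s2, r, k, done = options[i](env, s)
        real_samples += k  # track the real-sample budget the paper targets
        best_next = 0.0 if done else max(
            Q.get((s2, j), 0.0) for j in range(len(options)))
        Q[(s, i)] = Q.get((s, i), 0.0) + alpha * (
            r + gamma ** k * best_next - Q.get((s, i), 0.0))
        s = s2
print("real environment samples used:", real_samples)
```

Even when the PGM's reward model is wrong, options distilled from it can still be useful primitives: the agent only needs enough real samples to rank a handful of temporally extended actions, rather than to learn a flat policy from scratch.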
Keywords
HVC-env, reinforcement learning, high variable cost environments, inaccurate source reward model, transfer learning, semi-Markov decision processes, SMDP, physics guided model, PGM, temporal abstraction, autonomous manufacturing systems