Curriculum Goal-Conditioned Imitation for Offline Reinforcement Learning

IEEE Transactions on Games (2024)

Abstract
Offline reinforcement learning (RL) enables learning policies from precollected datasets without online data collection. Although it offers the possibility of surpassing the performance of the datasets, most existing offline RL algorithms struggle to compete with behavior cloning policies in many dataset settings because they must trade off policy improvement against the additional regularization needed to address distributional shift. In many cases, if one can imitate a sequence of suboptimal subtrajectories in the data and properly "stitch" them toward reaching an ideal future state, the result may be a more reliable policy that avoids the difficulties that arise in typical value-based offline RL algorithms. We borrow the idea of curriculum learning to embody this intuition. We construct a curriculum that progressively imitates a sequence of suboptimal trajectories conditioned on a series of carefully constructed future states and cumulative rewards as goals. The suboptimal trajectories gradually guide policy learning toward reaching the ideal goal states. We name our algorithm curriculum goal-conditioned imitation (CGI). Experimental results show that CGI achieves competitive performance against state-of-the-art offline RL algorithms, especially for challenging tasks with long horizons and sparse rewards.
Keywords
Curriculum learning, goal-conditioned policy, offline reinforcement learning, robotic games, self-imitation
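
The abstract describes training a policy to imitate dataset actions while conditioning on goals built from future states and cumulative rewards. The following is a minimal sketch of that goal-conditioned imitation idea, not the authors' implementation: the class and function names (GoalConditionedPolicy, bc_loss) and the goal encoding as a concatenated future state plus remaining return are assumptions made for illustration.

import torch
import torch.nn as nn

class GoalConditionedPolicy(nn.Module):
    """pi(a | s, g), where the goal g encodes a future state and a remaining return."""
    def __init__(self, state_dim, goal_dim, action_dim, hidden=256):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(state_dim + goal_dim, hidden), nn.ReLU(),
            nn.Linear(hidden, hidden), nn.ReLU(),
            nn.Linear(hidden, action_dim),
        )

    def forward(self, state, goal):
        # Condition the action prediction on both the current state and the goal.
        return self.net(torch.cat([state, goal], dim=-1))

def bc_loss(policy, states, actions, goals):
    # Goal-conditioned behavior cloning: imitate the dataset actions that actually
    # led to the sampled future state / cumulative reward used as the goal.
    pred = policy(states, goals)
    return ((pred - actions) ** 2).mean()

In a curriculum as sketched in the abstract, the goals fed to such a loss would be made progressively more ambitious over training stages (more distant future states, higher cumulative rewards), so that imitating suboptimal subtrajectories gradually steers the policy toward the ideal goal states.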