Continual Offline Reinforcement Learning via Diffusion-based Dual Generative Replay
arXiv (2024)
Abstract
We study continual offline reinforcement learning, a practical paradigm that
facilitates forward transfer and mitigates catastrophic forgetting to tackle
sequential offline tasks. We propose a dual generative replay framework that
retains previous knowledge by concurrent replay of generated pseudo-data.
First, we decouple the continual learning policy into a diffusion-based
generative behavior model and a multi-head action evaluation model, allowing
the policy to inherit distributional expressivity for encompassing a
progressive range of diverse behaviors. Second, we train a task-conditioned
diffusion model to mimic state distributions of past tasks. Generated states
are paired with corresponding responses from the behavior generator to
represent old tasks with high-fidelity replayed samples. Finally, by
interleaving pseudo samples with real ones of the new task, we continually
update the state and behavior generators to model progressively diverse
behaviors, and regularize the multi-head critic via behavior cloning to
mitigate forgetting. Experiments demonstrate that our method achieves better
forward transfer with less forgetting, and closely approximates the results of
using previous ground-truth data due to its high-fidelity replay of the sample
space. Our code is available at
https://github.com/NJU-RL/CuGRO.