Diffusion World Model
arxiv(2024)
摘要
We introduce Diffusion World Model (DWM), a conditional diffusion model
capable of predicting multistep future states and rewards concurrently. As
opposed to traditional one-step dynamics models, DWM offers long-horizon
predictions in a single forward pass, eliminating the need for recursive
quires. We integrate DWM into model-based value estimation, where the
short-term return is simulated by future trajectories sampled from DWM. In the
context of offline reinforcement learning, DWM can be viewed as a conservative
value regularization through generative modeling. Alternatively, it can be seen
as a data source that enables offline Q-learning with synthetic data. Our
experiments on the D4RL dataset confirm the robustness of DWM to long-horizon
simulation. In terms of absolute performance, DWM significantly surpasses
one-step dynamics models with a 44% performance gain, and achieves
state-of-the-art performance.
更多查看译文
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要