MuDreamer: Learning Predictive World Models without Reconstruction
CoRR (2024)
Abstract
The DreamerV3 agent recently demonstrated state-of-the-art performance in
diverse domains, learning powerful world models in latent space using a pixel
reconstruction loss. However, while the reconstruction loss is essential to
Dreamer's performance, it also forces the model to capture task-irrelevant
information. Consequently, when visual distractions are present in the
observation, Dreamer sometimes fails to perceive the elements crucial for
solving the task, significantly limiting its potential. In this paper, we present
MuDreamer, a robust reinforcement learning agent that builds upon the DreamerV3
algorithm by learning a predictive world model without the need for
reconstructing input signals. Rather than relying on pixel reconstruction,
hidden representations are instead learned by predicting the environment value
function and previously selected actions. Similar to predictive self-supervised
methods for images, we find that the use of batch normalization is crucial to
prevent learning collapse. We also study the effect of KL balancing between
model posterior and prior losses on convergence speed and learning stability.
We evaluate MuDreamer on the commonly used DeepMind Visual Control Suite and
demonstrate stronger robustness to visual distractions than DreamerV3 and
other reconstruction-free approaches when the environment background is
replaced with task-irrelevant real-world videos. Our method also achieves
comparable performance on the Atari100k benchmark while benefiting from faster
training.
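The KL balancing mentioned in the abstract can be illustrated with a minimal NumPy sketch. This is not the authors' implementation: the mixing weight `alpha` is an assumed value in the spirit of the Dreamer line of work, and the categorical distributions here are placeholders for the model's posterior and prior over latent states. Note that the two KL terms are numerically identical; balancing only changes how gradients flow (in an autodiff framework, each term would apply a stop-gradient to one side), which is indicated in the comments.

```python
import numpy as np

def kl_categorical(p, q, eps=1e-8):
    """KL(p || q) between categorical distributions over the last axis."""
    return np.sum(p * (np.log(p + eps) - np.log(q + eps)), axis=-1)

def balanced_kl(posterior, prior, alpha=0.8):
    """Balanced KL loss between model posterior and prior.

    In an autodiff framework:
      - the 'prior' term detaches the posterior (trains the prior
        to predict the posterior),
      - the 'posterior' term detaches the prior (regularizes the
        posterior toward the prior).
    Numerically both terms equal KL(posterior || prior); only the
    gradient routing differs.
    """
    prior_loss = kl_categorical(posterior, prior)      # sg(posterior) in practice
    posterior_loss = kl_categorical(posterior, prior)  # sg(prior) in practice
    return alpha * prior_loss + (1 - alpha) * posterior_loss

# Example: a single categorical latent with three classes.
posterior = np.array([0.7, 0.2, 0.1])
prior = np.array([0.5, 0.3, 0.2])
loss = balanced_kl(posterior, prior, alpha=0.8)
```

A larger `alpha` pushes the prior harder toward the posterior while regularizing the posterior less, which is one of the levers the paper studies for convergence speed and learning stability.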