Copilot4D: Learning Unsupervised World Models for Autonomous Driving via Discrete Diffusion
arXiv (2023)
Abstract
Learning world models can teach an agent how the world works in an
unsupervised manner. Even though it can be viewed as a special case of sequence
modeling, progress for scaling world models on robotic applications such as
autonomous driving has been somewhat less rapid than scaling language models
with Generative Pre-trained Transformers (GPT). We identify two reasons as
major bottlenecks: dealing with complex and unstructured observation space, and
having a scalable generative model. Consequently, we propose Copilot4D, a novel
world modeling approach that first tokenizes sensor observations with VQVAE,
then predicts the future via discrete diffusion. To efficiently decode and
denoise tokens in parallel, we recast Masked Generative Image Transformer as
discrete diffusion and enhance it with a few simple changes, resulting in
notable improvement. When applied to learning world models on point cloud
observations, Copilot4D reduces prior SOTA Chamfer distance by more than 65%
for 1s prediction, and more than 50% for 3s prediction, across NuScenes, KITTI
Odometry, and Argoverse2 datasets. Our results demonstrate that discrete
diffusion on tokenized agent experience can unlock the power of GPT-like
unsupervised learning for robotics.
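The abstract's core mechanism, recasting Masked Generative Image Transformer (MaskGIT) as discrete diffusion, amounts to iterative parallel denoising over an absorbing "mask" state: start from fully masked tokens, predict all positions at once, commit the most confident predictions, and re-mask the rest on a schedule. The following is a minimal sketch of that decoding loop under stated assumptions; the `toy_denoiser`, the confidence heuristic, and the linear unmasking schedule are illustrative stand-ins, not the paper's actual transformer or schedule.

```python
import numpy as np

MASK = -1  # absorbing "mask" token id: the fully noised state in discrete diffusion

def toy_denoiser(tokens, vocab_size, rng):
    """Stand-in for the learned model: random logits per position.
    In Copilot4D this would be the world-model transformer over VQVAE tokens."""
    return rng.standard_normal((len(tokens), vocab_size))

def maskgit_decode(seq_len=16, vocab_size=8, steps=4, seed=0):
    """MaskGIT-style parallel denoising: start fully masked, then at each
    step commit the most confident predictions and keep the rest masked."""
    rng = np.random.default_rng(seed)
    tokens = np.full(seq_len, MASK)
    for step in range(steps):
        logits = toy_denoiser(tokens, vocab_size, rng)
        probs = np.exp(logits) / np.exp(logits).sum(-1, keepdims=True)
        pred = probs.argmax(-1)   # per-position token prediction
        conf = probs.max(-1)      # per-position confidence
        conf[tokens != MASK] = np.inf  # already-committed tokens stay fixed
        # linear unmasking schedule: commit progressively more tokens per step
        n_keep = int(np.ceil(seq_len * (step + 1) / steps))
        keep = np.argsort(-conf)[:n_keep]
        new = tokens.copy()
        new[keep] = np.where(tokens[keep] == MASK, pred[keep], tokens[keep])
        tokens = new
    return tokens

decoded = maskgit_decode()
```

Because every masked position is predicted in parallel at each step, decoding takes a fixed small number of model calls rather than one call per token, which is what makes this family of models scalable as a generative backbone.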