Parallelized Spatiotemporal Binding
CoRR(2024)
Abstract
While modern best practices advocate for scalable architectures that support
long-range interactions, object-centric models have yet to fully embrace these
architectures. In particular, existing object-centric models for handling
sequential inputs, due to their reliance on RNN-based implementations, show poor
stability and capacity and are slow to train on long sequences. We introduce
the Parallelizable Spatiotemporal Binder (PSB), the first
temporally-parallelizable slot learning architecture for sequential inputs.
Unlike conventional RNN-based approaches, PSB produces object-centric
representations, known as slots, for all time-steps in parallel. This is
achieved by refining the initial slots across all time-steps through a fixed
number of layers equipped with causal attention. By capitalizing on the
parallelism induced by our architecture, the proposed model exhibits a
significant boost in efficiency. In experiments, we test PSB extensively as an
encoder within an auto-encoding framework paired with a wide variety of decoder
options. Compared to the state-of-the-art, our architecture demonstrates stable
training on longer sequences, achieves parallelization that results in a 60%
increase in training speed, and yields performance that is on par with or
better than the state-of-the-art on unsupervised 2D and 3D object-centric scene
decomposition and understanding.
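
The refinement described above (initial slots for all time-steps, updated
through a fixed number of layers with causal attention) can be illustrated
with a minimal NumPy sketch. This is not the authors' implementation: the
cross-attention step that reads per-frame features, the residual updates, the
omission of learned projections and normalization, and all names
(`refine_slots`, `causal_time_attention`, `cross_attention`, `num_layers`) are
assumptions made for illustration only.

```python
import numpy as np

def softmax(x, axis=-1):
    # Numerically stable softmax; masked (-inf) entries get weight 0.
    x = x - x.max(axis=axis, keepdims=True)
    e = np.exp(x)
    return e / e.sum(axis=axis, keepdims=True)

def causal_time_attention(slots):
    """slots: (T, K, D). Each slot attends to its own states at times <= t."""
    T, K, D = slots.shape
    x = slots.transpose(1, 0, 2)                      # (K, T, D)
    scores = x @ x.transpose(0, 2, 1) / np.sqrt(D)    # (K, T, T)
    mask = np.tril(np.ones((T, T), dtype=bool))       # True where key time <= query time
    scores = np.where(mask, scores, -np.inf)
    out = softmax(scores, axis=-1) @ x                # (K, T, D)
    return out.transpose(1, 0, 2)                     # back to (T, K, D)

def cross_attention(slots, feats):
    """slots: (T, K, D), feats: (T, N, D). Slots at time t read features at time t.
    Assumed step: the abstract does not specify how slots read the input."""
    D = slots.shape[-1]
    scores = slots @ feats.transpose(0, 2, 1) / np.sqrt(D)  # (T, K, N)
    return softmax(scores, axis=-1) @ feats                  # (T, K, D)

def refine_slots(init_slots, feats, num_layers=3):
    """Refine slots for all time-steps at once through a fixed number of layers."""
    slots = init_slots
    for _ in range(num_layers):
        slots = slots + cross_attention(slots, feats)   # spatial binding (assumed)
        slots = slots + causal_time_attention(slots)    # causal mixing over time
    return slots
```

Every operation in this sketch is batched over the time axis, so there is no
step-by-step recurrence; the lower-triangular mask alone keeps the slots at
time t from depending on later frames.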