Spatial Cognition from Egocentric Video: Out of Sight, Not Out of Mind
arXiv (2024)
Abstract
As humans move around, performing their daily tasks, they are able to recall
where they have positioned objects in their environment, even if these objects
are currently out of sight. In this paper, we aim to mimic this spatial
cognition ability. We thus formulate the task of Out of Sight, Not Out of Mind:
tracking active objects in 3D using observations captured through an egocentric
camera. We introduce Lift, Match and Keep (LMK), a method which lifts partial
2D observations to 3D world coordinates, matches them over time using visual
appearance, 3D location and interactions to form object tracks, and keeps these
object tracks even when they go out-of-view of the camera - hence keeping in
mind what is out of sight. We test LMK on 100 long videos from EPIC-KITCHENS.
Our results demonstrate that spatial cognition is critical for correctly
locating objects over short and long time scales. E.g., for one long egocentric
video, we estimate the 3D location of 50 active objects. Of these, 60% are
correctly positioned in 3D 2 minutes after leaving the camera view.
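
The "lift" and "keep" steps described above can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not say how depth or camera pose are obtained, so the pinhole intrinsics, the per-pixel metric depth, the camera-to-world pose, and the names lift_to_world and Track are all assumptions made for illustration. The matching step (appearance, 3D location and interactions) is omitted.

```python
from dataclasses import dataclass, field


def lift_to_world(u, v, depth, intrinsics, cam_to_world):
    """Back-project pixel (u, v) with metric depth into world coordinates.

    intrinsics: (fx, fy, cx, cy) of a pinhole camera (assumed model).
    cam_to_world: 3x4 row-major [R | t] camera pose (assumed to be known).
    """
    fx, fy, cx, cy = intrinsics
    # Point expressed in the camera frame.
    p_cam = ((u - cx) / fx * depth, (v - cy) / fy * depth, depth)
    # Rigid transform from the camera frame into the world frame.
    return tuple(
        sum(row[i] * p_cam[i] for i in range(3)) + row[3]
        for row in cam_to_world
    )


@dataclass
class Track:
    """Hypothetical object track that remembers its last known 3D location."""
    location: tuple
    last_seen_frame: int
    history: list = field(default_factory=list)

    def update(self, frame, location=None):
        # With a new observation, move the track. Without one (the object
        # is out of view), keep the last known location unchanged.
        if location is not None:
            self.location = location
            self.last_seen_frame = frame
        self.history.append((frame, self.location))


if __name__ == "__main__":
    intrinsics = (500.0, 500.0, 320.0, 240.0)
    pose = [[1, 0, 0, 1.0], [0, 1, 0, 0.0], [0, 0, 1, 0.0]]
    point = lift_to_world(320.0, 240.0, 2.0, intrinsics, pose)
    track = Track(location=point, last_seen_frame=0)
    track.update(frame=1)  # object not observed: location is kept
    print(track.location, track.last_seen_frame)
```

Because the track stores world coordinates rather than image coordinates, its location remains meaningful after the camera turns away, which is the property the task name refers to.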