DINO-Tracker: Taming DINO for Self-Supervised Point Tracking in a Single Video
CoRR (2024)
Abstract
We present DINO-Tracker, a new framework for long-term dense tracking in
video. The pillar of our approach is combining test-time training on a single
video with the powerful localized semantic features learned by a pre-trained
DINO-ViT model. Specifically, our framework simultaneously adapts DINO's
features to fit the motion observations of the test video, while training a
tracker that directly leverages the refined features. The entire framework is
trained end-to-end using a combination of self-supervised losses, and
regularization that allows us to retain and benefit from DINO's semantic prior.
Extensive evaluation demonstrates that our method achieves state-of-the-art
results on known benchmarks. DINO-Tracker significantly outperforms
self-supervised methods and is competitive with state-of-the-art supervised
trackers, while outperforming them in challenging cases of tracking under
long-term occlusions.