Learning Adaptive Spatio-Temporal Inference Transformer for Coarse-to-Fine Animal Visual Tracking: Algorithm and Benchmark

International Journal of Computer Vision(2024)

引用 0|浏览4
暂无评分
摘要
Advanced general visual object tracking models have been drastically developed with the access of large annotated datasets and progressive network architectures. However, a general tracker always suffers domain shift when directly adopting to specific testing scenarios. In this paper, we dedicate to addressing the animal tracking problem by proposing a spatio-temporal inference module and a coarse-to-fine tracking strategy. In terms of tracking animals, non-rigid deformation is a typical challenge. Therefore, we particularly design a novel transformer-based inference structure where the changing animal state is transmitted across continuous frames. By explicitly transmitting the appearance variations, this spatio-temporal module enables adaptive target learning, boosting the animal tracking performance compared to the fixed template matching approaches. Besides, considering the altered contours of animals in different frames, we propose to perform coarse-to-fine tracking to obtain a fine-grained animal bounding box with a dedicated distribution-aware regression module. The coarse tracking phase focuses on distinguishing the target against potential distractors in the background. While the fine-grained tracking phase aims at accurately regressing the final animal bounding box. To facilitate animal tracking evaluation, we captured and annotated 145 video sequences with 20 categories from the zoo, forming a new test set for animal tracking, coined ZOO145. We also collected a dataset, AnimalSOT, with 162 video sequences from existing tracking test benchmarks. The experimental performance on animal tracking datasets, MoCA, ZOO145, and AnimalSOT, demonstrate the merit of the proposed approach against advanced general tracking approaches, providing a baseline for future animal tracking studies.
更多
查看译文
关键词
Visual object tracking,Spatio-temporal appearance model,Transformer,Animal tracking dataset
AI 理解论文
溯源树
样例
生成溯源树,研究论文发展脉络
Chat Paper
正在生成论文摘要