TAPTR: Tracking Any Point with Transformers as Detection
CoRR (2024)
Abstract
In this paper, we propose a simple and strong framework for Tracking Any
Point with TRansformers (TAPTR). Based on the observation that point tracking
bears a great resemblance to object detection and tracking, we borrow designs
from DETR-like algorithms to address the task of TAP. In the proposed
framework, in each video frame, each tracking point is represented as a point
query, which consists of a positional part and a content part. As in DETR, each
query (its position and content feature) is naturally updated layer by layer.
Its visibility is predicted from its updated content feature. Queries belonging
to the same tracking point can exchange information through self-attention
along the temporal dimension. As all such operations are well-designed in
DETR-like algorithms, the model is conceptually very simple. We also adopt some
useful designs such as cost volume from optical flow models and develop simple
designs to provide long temporal information while mitigating the feature
drifting issue. Our framework demonstrates state-of-the-art performance on
various TAP datasets with faster inference speed.
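
To make the query-based design concrete, below is a minimal, hypothetical PyTorch sketch of one decoder layer in this style: each tracking point in each frame is a query with a positional part and a content part, queries of the same point exchange information through temporal self-attention, and visibility is predicted from the updated content feature. This is not the authors' implementation; names such as `PointQueryDecoderLayer`, `delta_pos_head`, and `visibility_head` are illustrative, and the sketch omits the cross-attention to image features and the cost-volume aggregation that the paper also uses.

```python
import torch
import torch.nn as nn

class PointQueryDecoderLayer(nn.Module):
    """Hypothetical sketch of one TAPTR-style decoder layer (not the
    official implementation). Each tracking point per frame is a query
    with a positional part (x, y) and a content feature."""

    def __init__(self, dim: int = 256, num_heads: int = 8):
        super().__init__()
        # Temporal self-attention: queries belonging to the same tracking
        # point exchange information across frames.
        self.temporal_attn = nn.MultiheadAttention(dim, num_heads, batch_first=True)
        self.norm = nn.LayerNorm(dim)
        # Illustrative heads: refine the positional part layer by layer
        # (as in DETR-like decoders) and predict visibility from the
        # updated content feature.
        self.delta_pos_head = nn.Linear(dim, 2)
        self.visibility_head = nn.Linear(dim, 1)

    def forward(self, content: torch.Tensor, position: torch.Tensor):
        # content:  (num_points, num_frames, dim)  content part of each query
        # position: (num_points, num_frames, 2)    positional part (x, y)
        attn_out, _ = self.temporal_attn(content, content, content)
        content = self.norm(content + attn_out)
        # Layer-by-layer refinement of the positional part.
        position = position + self.delta_pos_head(content)
        # Sigmoid for a per-frame visibility probability (an assumption).
        visibility = self.visibility_head(content).sigmoid()
        return content, position, visibility


# Toy usage: 4 tracking points over 8 frames.
layer = PointQueryDecoderLayer()
content = torch.randn(4, 8, 256)
position = torch.rand(4, 8, 2)
content, position, visibility = layer(content, position)
print(position.shape, visibility.shape)  # (4, 8, 2) and (4, 8, 1)
```

In this arrangement the batch dimension holds the tracking points and the sequence dimension holds the frames, so standard self-attention directly realizes the per-point temporal information exchange described in the abstract.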