TAPTR: Tracking Any Point with Transformers as Detection

H Li, H Zhang, S Liu, Z Zeng, T Ren, F Li… - European Conference on Computer Vision, 2025 - Springer
Abstract
In this paper, we propose a simple yet effective approach for Tracking Any Point with TRansformers (TAPTR). Based on the observation that point tracking bears a great resemblance to object detection and tracking, we borrow designs from DETR-like algorithms to address the task of TAP. In TAPTR, in each video frame, each tracking point is represented as a point query, which consists of a positional part and a content part. As in DETR, each query (its position and content feature) is naturally updated layer by layer. Its visibility is predicted from its updated content feature. Queries belonging to the same tracking point can exchange information through self-attention along the temporal dimension. As all such operations are well designed in DETR-like algorithms, the model is conceptually very simple. We also adopt some useful designs, such as the cost volume from optical flow models, and develop simple mechanisms to provide long-range temporal information while mitigating the feature-drifting issue. TAPTR demonstrates state-of-the-art performance on various datasets with faster inference speed.
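The query mechanism described in the abstract can be illustrated with a minimal, hypothetical PyTorch sketch. All names and dimensions below are illustrative assumptions, not taken from the paper's released code: each tracked point carries a positional part and a content feature per frame, queries of the same point attend to each other along the temporal axis, and the updated content feature drives both the layer-by-layer position refinement and the visibility prediction.

```python
import torch
import torch.nn as nn

class TemporalPointQueryLayer(nn.Module):
    """Illustrative sketch of one TAPTR-style decoder layer (hypothetical names)."""

    def __init__(self, dim: int = 64, heads: int = 4):
        super().__init__()
        # Self-attention along the temporal dimension: queries belonging to
        # the same tracking point exchange information across frames.
        self.temporal_attn = nn.MultiheadAttention(dim, heads, batch_first=True)
        # Position refinement and visibility prediction from the content feature.
        self.pos_head = nn.Linear(dim, 2)   # predicts a delta (x, y)
        self.vis_head = nn.Linear(dim, 1)   # visibility logit

    def forward(self, content, position):
        # content:  (num_points, num_frames, dim) - content part of each query
        # position: (num_points, num_frames, 2)  - positional part of each query
        attended, _ = self.temporal_attn(content, content, content)
        content = content + attended                  # residual content update
        position = position + self.pos_head(content)  # layer-by-layer refinement
        visibility = torch.sigmoid(self.vis_head(content))
        return content, position, visibility

# Toy usage: 3 tracked points over 8 frames.
points, frames, dim = 3, 8, 64
layer = TemporalPointQueryLayer(dim)
c = torch.randn(points, frames, dim)
p = torch.rand(points, frames, 2)
c2, p2, v = layer(c, p)
```

Stacking several such layers gives the DETR-style layer-by-layer refinement of both parts of the query; the cost-volume features mentioned in the abstract would additionally condition the content update, which this sketch omits.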