We introduce a new benchmark, TAPVid-3D, for evaluating the task of long-range Tracking Any Point in 3D (TAP-3D). While point tracking in two dimensions (TAP) has many …
Tracking dense 3D motion from monocular videos remains challenging, particularly when aiming for pixel-level precision over long sequences. We introduce \Approach, a novel …
A Darkhalil, R Guerrier, AW Harley… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce EgoPoints, a benchmark for point tracking in egocentric videos. We annotate 4.7K challenging tracks in egocentric sequences. Compared to the popular TAP-Vid-DAVIS …
Visual cues play a significant role for people in foreseeing (plausible) future events, a fundamental skill that aids in social interactions, object manipulation, navigation, and …