Revisiting the “Video” in Video-Language Understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu… - Proceedings of the …, 2022 - openaccess.thecvf.com
What makes a video task uniquely suited for videos, beyond what can be understood from a
single image? Building on recent progress in self-supervised image-language models, we …

Revisiting the “Video” in Video-Language Understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu… - 2022 IEEE/CVF …, 2022 - ieeexplore.ieee.org

Revisiting the “Video” in Video-Language Understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu… - 2022 IEEE/CVF …, 2022 - computer.org

Revisiting the “Video” in Video-Language Understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu, L Fei-Fei… - arXiv preprint arXiv …, 2022 - arxiv.org

Revisiting the “Video” in Video-Language Understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu… - arXiv e …, 2022 - ui.adsabs.harvard.edu

Revisiting the “Video” in Video-Language Understanding

S Buch, C Eyzaguirre, A Gaidon, J Wu, L Fei-Fei… - stanfordvl.github.io
Our Atemporal Probe (ATP) model builds on progress in self-supervised image-language
understanding [1], and learns to discretely select a frozen image-level encoding (without …
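The snippet above describes selecting one frozen image-level encoding rather than modeling the whole clip. What follows is a minimal sketch of that general idea, not the authors' ATP implementation: it assumes each frame has already been embedded by a frozen image-language encoder, and it scores frames by a dot product with a hypothetical learned query vector. The names `atemporal_select` and `query` are illustrative assumptions, and the snippet does not say how ATP's actual selector is built or trained.

```python
import math
from typing import List, Sequence, Tuple

Vector = List[float]


def dot(a: Sequence[float], b: Sequence[float]) -> float:
    """Inner product of two equal-length vectors."""
    return sum(x * y for x, y in zip(a, b))


def softmax(scores: Sequence[float]) -> List[float]:
    """Numerically stable softmax over a list of scores."""
    m = max(scores)
    exps = [math.exp(s - m) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]


def atemporal_select(frames: List[Vector], query: Vector) -> Tuple[int, Vector, List[float]]:
    """Hypothetical selector: score each frozen frame embedding against a
    query vector and return (best index, best embedding, softmax over frames).

    The frame embeddings are only read, never modified, mimicking a frozen
    image encoder; in this sketch only `query` would be a trainable parameter.
    Frame order plays no role in the score, so the choice is atemporal.
    """
    if not frames:
        raise ValueError("need at least one frame embedding")
    scores = [dot(f, query) for f in frames]
    probs = softmax(scores)
    # Discrete (hard) selection: keep exactly one frame.
    best = max(range(len(frames)), key=lambda i: scores[i])
    return best, frames[best], probs


if __name__ == "__main__":
    frames = [[0.1, 0.0], [0.9, 0.2], [0.3, 0.8]]  # toy 2-D "frozen" embeddings
    query = [1.0, 0.0]                              # hypothetical learned query
    idx, emb, probs = atemporal_select(frames, query)
    print(idx)  # scores are 0.1, 0.9, 0.3, so this prints 1
```

The hard `max` step is what makes the selection discrete; a trainable version would need a differentiable relaxation of that step, which this sketch deliberately leaves out.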