VideoCon: Robust Video-Language Alignment via Contrast Captions

H Bansal, Y Bitton, I Szpektor… - Proceedings of the …, 2024 - openaccess.thecvf.com
Despite being (pre)trained on a massive amount of data, state-of-the-art video-language
alignment models are not robust to semantically-plausible contrastive changes in the video …

VideoCon: Robust Video-Language Alignment via Contrast Captions

H Bansal, Y Bitton, I Szpektor, KW Chang… - ICLR 2024 Workshop on … - openreview.net
Despite being (pre)trained on a massive amount of data, state-of-the-art video-language
alignment models are not robust to semantically-plausible contrastive changes in the video …

VideoCon: Robust Video-Language Alignment via Contrast Captions

H Bansal, Y Bitton, I Szpektor, KW Chang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Despite being (pre)trained on a massive amount of data, state-of-the-art video-language
alignment models are not robust to semantically-plausible contrastive changes in the video …

VideoCon: Robust Video-Language Alignment via Contrast Captions

H Bansal, Y Bitton, I Szpektor, KW Chang… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite being (pre)trained on a massive amount of data, state-of-the-art video-language
alignment models are not robust to semantically-plausible contrastive changes in the video …