H Zhang, P Zeng, L Gao, J Song, Y Duan, X Lyu… - arXiv preprint arXiv …, 2024 - arxiv.org
Adapting large-scale image-text pre-training models, e.g., CLIP, to the video domain
represents the current state of the art for text-video retrieval. The primary approaches …