H Luo, L Ji, M Zhong, Y Chen, W Lei, N Duan… - arXiv preprint arXiv …, 2021 - arxiv.org
Video-text retrieval plays an essential role in multi-modal research and has been widely
used in many real-world web applications. The CLIP (Contrastive Language-Image Pre …