S Mo, H Wang, H Li, X Tang - arXiv preprint arXiv:2405.07202, 2024 - arxiv.org
Video-language pre-training is a typical and challenging problem that aims at learning
visual and textual representations from large-scale data in a self-supervised way. Existing …