C Xue, W Zhang, Y Hao, S Lu, P Torr, S Bai - arXiv e-prints, 2022 - ui.adsabs.harvard.edu
Abstract: Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited various vision-language tasks by jointly learning visual and textual representations, which …