S Song, J Wan, Z Yang, J Tang, W Cheng, X Bai… - arXiv preprint arXiv …, 2022 - arxiv.org
Recently, vision-language joint representation learning has proven to be highly effective in
various scenarios. In this paper, we specifically adapt vision-language joint learning for …