M Zhou, L Zhou, S Wang, Y Cheng, L Li, Z Yu… - arXiv preprint arXiv …, 2021 - arxiv.org
Vision-and-language pre-training has achieved impressive success in learning multimodal
representations between vision and language. To generalize this success to non-English …