H Fei, T Yu, P Li - Proceedings of the 2021 Conference of the …, 2021 - aclanthology.org
Recent pretrained vision-language models have achieved impressive performance on cross-modal retrieval tasks in English. Their success, however, heavily depends on the availability …