D Qi, L Su, J Song, E Cui, T Bharti, A Sacheti - arXiv preprint arXiv …, 2020 - arxiv.org
… (NLP) and computer vision (CV) communities. For example, Text-Image Retrieval [4] aims to
… 3M images with descriptions harvested from the Alt-text HTML attribute of the web pages, …