T Gupta, A Vahdat, G Chechik, X Yang, J Kautz… - arXiv preprint arXiv …, 2020 - arxiv.org
Phrase grounding, the problem of associating image regions with caption words, is a crucial
component of vision-language tasks. We show that phrase grounding can be learned by …