J Lu, V Goswami, M Rohrbach, D Parikh… - arXiv preprint arXiv …, 2019 - arxiv.org
Much of vision-and-language research focuses on a small but diverse set of independent
tasks and supporting datasets often studied in isolation; however, the visually-grounded …