R Corona, S Zhu, D Klein, T Darrell - arXiv preprint arXiv:2205.09710, 2022 - arxiv.org
Natural language applied to natural 2D images describes a fundamentally 3D world. We
present the Voxel-informed Language Grounder (VLG), a language grounding model that …