Touchdown: Natural language navigation and spatial reasoning in visual street environments

H Chen, A Suhr, D Misra… - Proceedings of the …, 2019 - openaccess.thecvf.com
We study the problem of jointly reasoning about language and vision through a navigation
and spatial reasoning task. We introduce the Touchdown task and dataset, where an agent …

Mapping instructions to actions in 3D environments with visual goal prediction

D Misra, A Bennett, V Blukis, E Niklasson… - arXiv preprint arXiv …, 2018 - arxiv.org
We propose to decompose instruction execution to goal prediction and action generation.
We design a model that maps raw visual observations to goals using LINGUNET, a …

Learning to execute actions or ask clarification questions

Z Shi, Y Feng, A Lipani - arXiv preprint arXiv:2204.08373, 2022 - arxiv.org
Collaborative tasks are ubiquitous activities where a form of communication is required in
order to reach a joint goal. Collaborative building is one such task. We wish to develop …

Interactive grounded language acquisition and generalization in a 2D world

H Yu, H Zhang, W Xu - arXiv preprint arXiv:1802.01433, 2018 - arxiv.org
We build a virtual agent for learning language in a 2D maze-like world. The agent sees
images of the surrounding environment, listens to a virtual teacher, and takes actions to …

Crossmodal Language Comprehension—Psycholinguistic Insights and Computational Approaches

Ö Alaçam, X Li, W Menzel, T Staron - Frontiers in neurorobotics, 2020 - frontiersin.org
Crossmodal interaction in situated language comprehension is important for effective and
efficient communication. The relationship between linguistic and visual stimuli provides …

Why Build an Assistant in Minecraft?

A Szlam, J Gray, K Srinet, Y Jernite, A Joulin… - arXiv preprint arXiv …, 2019 - arxiv.org
arXiv:1907.09273v2 [cs.AI], 25 Jul 2019. Why Build an Assistant in Minecraft? Arthur
Szlam, Jonathan Gray, Kavya Srinet, Yacine Jernite, Armand Joulin, Gabriel Synnaeve, Douwe …

CraftAssist instruction parsing: Semantic parsing for a voxel-world assistant

K Srinet, Y Jernite, J Gray, A Szlam - … of the 58th Annual Meeting of …, 2020 - aclanthology.org
We propose a semantic parsing dataset focused on instruction-driven communication with
an agent in the game Minecraft. The dataset consists of 7K human utterances and their …

Points, paths, and playscapes: Large-scale spatial language understanding tasks set in the real world

J Baldridge, T Bedrax-Weiss, D Luong… - Proceedings of the …, 2018 - aclanthology.org
Spatial language understanding is important for practical applications and as a building
block for better abstract language understanding. Much progress has been made through …

CraftAssist instruction parsing: Semantic parsing for a Minecraft assistant

Y Jernite, K Srinet, J Gray, A Szlam - arXiv preprint arXiv:1905.01978, 2019 - arxiv.org
We propose a large scale semantic parsing dataset focused on instruction-driven
communication with an agent in Minecraft. We describe the data collection process which …

[BOOK][B] Scalable and Interpretable Approaches for Learning to Follow Natural Language Instructions

DK Misra - 2019 - search.proquest.com
Agents that can execute natural language instructions have many applications. For example,
an assistive house robot that can follow instructions will reduce the time spent on doing …