We propose to decompose instruction execution into goal prediction and action generation. We design a model that maps raw visual observations to goals using LINGUNET, a …
Collaborative tasks are ubiquitous activities where a form of communication is required in order to reach a joint goal. Collaborative building is one such task. We wish to develop …
H Yu, H Zhang, W Xu - arXiv preprint arXiv:1802.01433, 2018 - arxiv.org
We build a virtual agent for learning language in a 2D maze-like world. The agent sees images of the surrounding environment, listens to a virtual teacher, and takes actions to …
Ö Alaçam, X Li, W Menzel, T Staron - Frontiers in neurorobotics, 2020 - frontiersin.org
Crossmodal interaction in situated language comprehension is important for effective and efficient communication. The relationship between linguistic and visual stimuli provides …
We propose a semantic parsing dataset focused on instruction-driven communication with an agent in the game Minecraft. The dataset consists of 7K human utterances and their …
Spatial language understanding is important for practical applications and as a building block for better abstract language understanding. Much progress has been made through …
We propose a large scale semantic parsing dataset focused on instruction-driven communication with an agent in Minecraft. We describe the data collection process which …
Agents that can execute natural language instructions have many applications. For example, an assistive house robot that can follow instructions will reduce the time spent on doing …