Robots that use language

S Tellex, N Gopalan, H Kress-Gazit… - Annual Review of …, 2020 - annualreviews.org
This article surveys the use of natural language in robotics from a robotics point of view. To
use human language, robots must map words to aspects of the physical world, mediated by …

12-in-1: Multi-task vision and language representation learning

J Lu, V Goswami, M Rohrbach… - Proceedings of the …, 2020 - openaccess.thecvf.com
Much of vision-and-language research focuses on a small but diverse set of independent
tasks and supporting datasets often studied in isolation; however, the visually-grounded …

Improving one-stage visual grounding by recursive sub-query construction

Z Yang, T Chen, L Wang, J Luo - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
We improve one-stage visual grounding by addressing current limitations on grounding long
and complex queries. Existing one-stage methods encode the entire language query as a …

Multi-task collaborative network for joint referring expression comprehension and segmentation

G Luo, Y Zhou, X Sun, L Cao, C Wu… - Proceedings of the …, 2020 - openaccess.thecvf.com
Referring expression comprehension (REC) and segmentation (RES) are two highly-related
tasks, which both aim at identifying the referent according to a natural language expression …

ScanRefer: 3D object localization in RGB-D scans using natural language

DZ Chen, AX Chang, M Nießner - European Conference on Computer …, 2020 - Springer
We introduce the task of 3D object localization in RGB-D scans using natural language
descriptions. As input, we assume a point cloud of a scanned 3D scene along with a free …

REVERIE: Remote embodied visual referring expression in real indoor environments

Y Qi, Q Wu, P Anderson, X Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
One of the long-term challenges of robotics is to enable robots to interact with humans in the
visual world via natural language, as humans are visual animals that communicate through …

Connecting vision and language with localized narratives

J Pont-Tuset, J Uijlings, S Changpinyo… - Computer Vision–ECCV …, 2020 - Springer
We propose Localized Narratives, a new form of multimodal image annotations
connecting vision and language. We ask annotators to describe an image with their voice …

Linguistic structure guided context modeling for referring image segmentation

T Hui, S Liu, S Huang, G Li, S Yu, F Zhang… - Computer Vision–ECCV …, 2020 - Springer
Referring image segmentation aims to predict the foreground mask of the object referred to by
a natural language sentence. Multimodal context of the sentence is crucial to distinguish the …

A real-time cross-modality correlation filtering method for referring expression comprehension

Y Liao, S Liu, G Li, F Wang, Y Chen… - Proceedings of the …, 2020 - openaccess.thecvf.com
Referring expression comprehension aims to localize the object instance described by a
natural language expression. Current referring expression methods have achieved good …

Referring image segmentation via cross-modal progressive comprehension

S Huang, T Hui, S Liu, G Li, Y Wei… - Proceedings of the …, 2020 - openaccess.thecvf.com
Referring image segmentation aims at segmenting the foreground masks of the entities that
can well match the description given in the natural language expression. Previous …