Modeling context in referring expressions

L Yu, P Poirson, S Yang, AC Berg, TL Berg - Computer Vision–ECCV 2016 …, 2016 - Springer
Humans refer to objects in their environments all the time, especially in dialogue with other
people. We explore generating and comprehending natural language referring expressions …

Generating visual explanations

LA Hendricks, Z Akata, M Rohrbach, J Donahue… - Computer Vision–ECCV …, 2016 - Springer
Clearly explaining a rationale for a classification decision to an end user can be as important
as the decision itself. Existing approaches for deep visual recognition are generally opaque …

Natural language object retrieval

R Hu, H Xu, M Rohrbach, J Feng… - Proceedings of the …, 2016 - openaccess.thecvf.com
In this paper, we address the task of natural language object retrieval, to localize a target
object within a given image based on a natural language query of the object. Natural …

Modeling context between objects for referring expression understanding

VK Nagaraja, VI Morariu, LS Davis - … 11–14, 2016, Proceedings, Part IV …, 2016 - Springer
Referring expressions usually describe an object using properties of the object and
relationships of the object with other objects. We propose a technique that integrates context …

Grounding of textual phrases in images by reconstruction

A Rohrbach, M Rohrbach, R Hu, T Darrell… - Computer Vision–ECCV …, 2016 - Springer
Grounding (i.e., localizing) arbitrary, free-form textual phrases in visual content is a
challenging problem with many applications for human-computer interaction and image-text …

Segmentation from natural language expressions

R Hu, M Rohrbach, T Darrell - …, The Netherlands, October 11–14, 2016 …, 2016 - Springer
In this paper we approach the novel problem of segmenting an image based on a natural
language expression. This is different from traditional semantic segmentation over a …

ImageNet pre-trained models with batch normalization

M Simon, E Rodner, J Denzler - arXiv preprint arXiv:1612.01452, 2016 - arxiv.org
Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most
state-of-the-art approaches. In this paper, we present a new set of pre-trained models with …

Reasoning about pragmatics with neural listeners and speakers

J Andreas, D Klein - arXiv preprint arXiv:1604.00562, 2016 - arxiv.org
We present a model for pragmatically describing scenes, in which contrastive behavior
results from a combination of inference-driven pragmatics and learned semantics. Like …

Title generation for user generated videos

KH Zeng, TH Chen, JC Niebles, M Sun - … 11–14, 2016, Proceedings, Part II …, 2016 - Springer
A great video title describes the most salient event compactly and captures the viewer's
attention. In contrast, video captioning tends to generate sentences that describe the video …

Easy things first: Installments improve referring expression generation for objects in photographs

S Zarrieß, D Schlangen - Proceedings of the 54th Annual Meeting …, 2016 - aclanthology.org
Research on generating referring expressions has so far mostly focussed on “one-shot
reference”, where the aim is to generate a single, discriminating expression. In interactive …