Generation and comprehension of unambiguous object descriptions

L Yu, P Poirson, S Yang, AC Berg, TL Berg - Computer Vision–ECCV 2016 …, 2016 - Springer

Humans refer to objects in their environments all the time, especially in dialogue with other
people. We explore generating and comprehending natural language referring expressions …

Cited by 1067 Related articles All 7 versions

[PDF] arxiv.org

Generating visual explanations

LA Hendricks, Z Akata, M Rohrbach, J Donahue… - Computer Vision–ECCV …, 2016 - Springer

Clearly explaining a rationale for a classification decision to an end user can be as important
as the decision itself. Existing approaches for deep visual recognition are generally opaque …

Cited by 725 Related articles All 11 versions

[PDF] thecvf.com

Natural language object retrieval

R Hu, H Xu, M Rohrbach, J Feng… - Proceedings of the …, 2016 - openaccess.thecvf.com

In this paper, we address the task of natural language object retrieval, to localize a target
object within a given image based on a natural language query of the object. Natural …

Cited by 627 Related articles All 12 versions

[PDF] arxiv.org

Modeling context between objects for referring expression understanding

VK Nagaraja, VI Morariu, LS Davis - … 11–14, 2016, Proceedings, Part IV …, 2016 - Springer

Referring expressions usually describe an object using properties of the object and
relationships of the object with other objects. We propose a technique that integrates context …

Cited by 422 Related articles All 4 versions

[PDF] arxiv.org

Grounding of textual phrases in images by reconstruction

A Rohrbach, M Rohrbach, R Hu, T Darrell… - Computer Vision–ECCV …, 2016 - Springer

Grounding (i.e., localizing) arbitrary, free-form textual phrases in visual content is a
challenging problem with many applications for human-computer interaction and image-text …

Cited by 536 Related articles All 7 versions

[PDF] arxiv.org

Segmentation from natural language expressions

R Hu, M Rohrbach, T Darrell - … , The Netherlands, October 11–14, 2016 …, 2016 - Springer

In this paper we approach the novel problem of segmenting an image based on a natural
language expression. This is different from traditional semantic segmentation over a …

Cited by 413 Related articles All 8 versions

[PDF] arxiv.org

Imagenet pre-trained models with batch normalization

M Simon, E Rodner, J Denzler - arXiv preprint arXiv:1612.01452, 2016 - arxiv.org

Convolutional neural networks (CNN) pre-trained on ImageNet are the backbone of most
state-of-the-art approaches. In this paper, we present a new set of pre-trained models with …

Cited by 195 Related articles All 5 versions

[PDF] arxiv.org

Reasoning about pragmatics with neural listeners and speakers

J Andreas, D Klein - arXiv preprint arXiv:1604.00562, 2016 - arxiv.org

We present a model for pragmatically describing scenes, in which contrastive behavior
results from a combination of inference-driven pragmatics and learned semantics. Like …

Cited by 181 Related articles All 11 versions

[PDF] arxiv.org

Title generation for user generated videos

KH Zeng, TH Chen, JC Niebles, M Sun - … 11-14, 2016, Proceedings, Part II …, 2016 - Springer

A great video title describes the most salient event compactly and captures the viewer's
attention. In contrast, video captioning tends to generate sentences that describe the video …

Cited by 102 Related articles All 5 versions

[PDF] aclanthology.org

[PDF] Easy things first: Installments improve referring expression generation for objects in photographs

S Zarrieß, D Schlangen - Proceedings of the 54th Annual Meeting …, 2016 - aclanthology.org

Research on generating referring expressions has so far mostly focussed on “one-shot
reference”, where the aim is to generate a single, discriminating expression. In interactive …

Cited by 25 Related articles All 6 versions
