Generation and comprehension of unambiguous object descriptions

J Mao, J Huang, A Toshev… - Proceedings of the …, 2016 - openaccess.thecvf.com

We propose a method that can generate an unambiguous description (known as a referring
expression) of a specific object or region in an image, and which can also comprehend or …

Cited by 1225 - Related articles - All 16 versions

[PDF] thecvf.com

Intention oriented image captions with guiding objects

Y Zheng, Y Li, S Wang - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com

Although existing image caption models can produce promising results using recurrent
neural networks (RNNs), it is difficult to guarantee that an object we care about is contained …

Cited by 57 - Related articles - All 5 versions

[PDF] thecvf.com

Good news, everyone! context driven entity-aware captioning for news images

AF Biten, L Gomez, M Rusinol… - Proceedings of the …, 2019 - openaccess.thecvf.com

Current image captioning systems perform at a merely descriptive level, essentially
enumerating the objects in the scene and their relations. Humans, on the contrary, interpret …

Cited by 154 - Related articles - All 11 versions

[PDF] thecvf.com

Describing like humans: on diversity in image captioning

Q Wang, AB Chan - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com

Recently, the state-of-the-art models for image captioning have overtaken human
performance based on the most popular metrics, such as BLEU, METEOR, ROUGE and …

Cited by 111 - Related articles - All 7 versions

[PDF] thecvf.com

Deep compositional captioning: Describing novel object categories without paired training data

LA Hendricks, S Venugopalan… - Proceedings of the …, 2016 - openaccess.thecvf.com

While recent deep neural network models have achieved promising results on the image
captioning task, they rely largely on the availability of corpora with paired image and …

Cited by 333 - Related articles - All 12 versions

[PDF] thecvf.com

Pointing novel objects in image captioning

Y Li, T Yao, Y Pan, H Chao… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Image captioning has received significant attention with remarkable improvements in recent
advances. Nevertheless, images in the wild encapsulate rich knowledge and cannot be …

Cited by 79 - Related articles - All 6 versions

[PDF] thecvf.com

Beyond a pre-trained object detector: Cross-modal textual and visual context for image captioning

CW Kuo, Z Kira - Proceedings of the IEEE/CVF conference …, 2022 - openaccess.thecvf.com

Significant progress has been made on visual captioning, largely relying on pre-trained
features and later fixed object detectors that serve as rich inputs to auto-regressive models …

Cited by 50 - Related articles - All 5 versions

[PDF] ieee.org

Show and tell: Lessons learned from the 2015 mscoco image captioning challenge

O Vinyals, A Toshev, S Bengio… - IEEE transactions on …, 2016 - ieeexplore.ieee.org

Automatically describing the content of an image is a fundamental problem in artificial
intelligence that connects computer vision and natural language processing. In this paper …

Cited by 1081 - Related articles - All 20 versions

[PDF] thecvf.com

Unsupervised image captioning

Y Feng, L Ma, W Liu, J Luo - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com

Deep neural networks have achieved great successes on the image captioning task.
However, most of the existing models depend heavily on paired image-sentence datasets …

Cited by 262 - Related articles - All 9 versions

[PDF] cv-foundation.org

Rich image captioning in the wild

K Tran, X He, L Zhang, J Sun, C Carapcea… - Proceedings of the …, 2016 - cv-foundation.org

We present an image caption system that addresses new challenges of automatically
describing images in the wild. The challenges include generating high quality caption with …

Cited by 163 - Related articles - All 10 versions
