Flickr30k entities: Collecting region-to-phrase correspondences for richer image-to-sentence models

BA Plummer, L Wang, CM Cervantes… - Proceedings of the …, 2015 - openaccess.thecvf.com
The Flickr30k dataset has become a standard benchmark for sentence-based image
description. This paper presents Flickr30k Entities, which augments the 158k captions from …

A hierarchical and regional deep learning architecture for image description generation

P Kinghorn, L Zhang, L Shao - Pattern Recognition Letters, 2019 - Elsevier
This research proposes a distinctive deep learning network architecture for image
captioning and description generation. Specifically, we propose a hierarchically trained …

Intention oriented image captions with guiding objects

Y Zheng, Y Li, S Wang - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
Although existing image caption models can produce promising results using recurrent
neural networks (RNNs), it is difficult to guarantee that an object we care about is contained …

CaptionNet: A tailor-made recurrent neural network for generating image descriptions

L Yang, H Wang, P Tang, Q Li - IEEE Transactions on …, 2020 - ieeexplore.ieee.org
Image captioning is a challenging visual understanding task and has drawn increasing
attention from researchers. In general, two inputs are required at each time step by the Long …

Describing like humans: on diversity in image captioning

Q Wang, AB Chan - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com
Recently, the state-of-the-art models for image captioning have overtaken human
performance based on the most popular metrics, such as BLEU, METEOR, ROUGE and …

Image captioning with semantic attention

Q You, H Jin, Z Wang, C Fang… - Proceedings of the IEEE …, 2016 - openaccess.thecvf.com
Automatically generating a natural language description of an image has attracted interest
recently both because of its importance in practical applications and because it connects two …

Deep visual-semantic alignments for generating image descriptions

A Karpathy, L Fei-Fei - Proceedings of the IEEE conference on …, 2015 - cv-foundation.org
We present a model that generates natural language descriptions of images and their
regions. Our approach leverages datasets of images and their sentence descriptions to …

Rethinking the reference-based distinctive image captioning

Y Mao, L Chen, Z Jiang, D Zhang, Z Zhang… - Proceedings of the 30th …, 2022 - dl.acm.org
Distinctive Image Captioning (DIC), which generates distinctive captions that describe the unique
details of a target image, has received considerable attention over the last few years. A …

A region-based image caption generator with refined descriptions

P Kinghorn, L Zhang, L Shao - Neurocomputing, 2018 - Elsevier
Describing the content of an image is a challenging task. To enable detailed description, it
requires the detection and recognition of objects, people, relationships and associated …

Discriminability objective for training descriptive captions

R Luo, B Price, S Cohen… - Proceedings of the …, 2018 - openaccess.thecvf.com
One property that remains lacking in image captions generated by contemporary methods is
discriminability: being able to tell two images apart given the caption for one of them. We …