Conceptual Captions: A cleaned, hypernymed, image alt-text dataset for automatic image captioning

P Sharma, N Ding, S Goodman… - Proceedings of the 56th …, 2018 - aclanthology.org
We present a new dataset of image caption annotations, Conceptual Captions, which
contains an order of magnitude more images than the MS-COCO dataset (Lin et al., 2014) …

Language models for image captioning: The quirks and what works

J Devlin, H Cheng, H Fang, S Gupta, L Deng… - arXiv preprint arXiv …, 2015 - arxiv.org
Two recent approaches have achieved state-of-the-art results in image captioning. The first
uses a pipelined process where a set of candidate words is generated by a convolutional …

Describing like humans: on diversity in image captioning

Q Wang, AB Chan - … of the IEEE/CVF Conference on …, 2019 - openaccess.thecvf.com
Recently, the state-of-the-art models for image captioning have overtaken human
performance based on the most popular metrics, such as BLEU, METEOR, ROUGE and …

Improving image captioning with better use of captions

Z Shi, X Zhou, X Qiu, X Zhu - arXiv preprint arXiv:2006.11807, 2020 - arxiv.org
Image captioning is a multimodal problem that has drawn extensive attention in both the
natural language processing and computer vision community. In this paper, we present a …

On diversity in image captioning: Metrics and methods

Q Wang, J Wan, AB Chan - IEEE Transactions on Pattern …, 2020 - ieeexplore.ieee.org
Diversity is one of the most important properties in image captioning, as it reflects various
expressions of important concepts presented in an image. However, the most popular …

Image captioning: Transforming objects into words

S Herdade, A Kappeler, K Boakye… - Advances in neural …, 2019 - proceedings.neurips.cc
Image captioning models typically follow an encoder-decoder architecture which uses
abstract image feature vectors as input to the encoder. One of the most successful …

Retrieval-augmented image captioning

R Ramos, D Elliott, B Martins - arXiv preprint arXiv:2302.08268, 2023 - arxiv.org
Inspired by retrieval-augmented language generation and pretrained Vision and Language
(V&L) encoders, we present a new approach to image captioning that generates sentences …

Fine-grained image captioning with CLIP reward

J Cho, S Yoon, A Kale, F Dernoncourt, T Bui… - arXiv preprint arXiv …, 2022 - arxiv.org
Modern image captioning models are usually trained with text similarity objectives. However,
since reference captions in public datasets often describe the most salient common objects …

Cross-lingual image caption generation

T Miyazaki, N Shimizu - Proceedings of the 54th Annual Meeting …, 2016 - aclanthology.org
Automatically generating a natural language description of an image is a fundamental
problem in artificial intelligence. This task involves both computer vision and natural …

Visuals to text: A comprehensive review on automatic image captioning

Y Ming, N Hu, C Fan, F Feng… - IEEE/CAA Journal of …, 2022 - researchportal.port.ac.uk
Image captioning refers to automatic generation of descriptive texts according to the visual
content of images. It is a technique integrating multiple disciplines including the computer …