Denoising large-scale image captioning from alt-text data using content selection models

KR Chandu, P Sharma, S Changpinyo… - arXiv preprint arXiv …, 2020 - arxiv.org
Training large-scale image captioning (IC) models demands access to a rich and diverse set
of training examples, gathered from the wild, often from noisy alt-text data. However, recent …

Guiding image captioning models toward more specific captions

S Kornblith, L Li, Z Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image captioning is conventionally formulated as the task of generating captions that match
the conditional distribution of reference image-caption pairs. However, reference captions in …

Text-only training for image captioning using noise-injected CLIP

D Nukrai, R Mokady, A Globerson - arXiv preprint arXiv:2211.00575, 2022 - arxiv.org
We consider the task of image-captioning using only the CLIP model and additional text data
at training time, and no additional captioned images. Our approach relies on the fact that …

CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation

K Basioti, MA Abdelsalam, F Fancellu… - arXiv preprint arXiv …, 2024 - arxiv.org
Controllable Image Captioning (CIC) aims at generating natural language descriptions for
an image, conditioned on information provided by end users, e.g., regions, entities or events …

VisualGPT: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

Show, edit and tell: a framework for editing image captions

F Sammani, L Melas-Kyriazi - Proceedings of the IEEE/CVF …, 2020 - openaccess.thecvf.com
Most image captioning frameworks generate captions directly from images, learning a
mapping from visual features to natural language. However, editing existing captions can be …

Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning

A Lindh, RJ Ross, A Mahalunkar, G Salton… - … Conference on Artificial …, 2018 - Springer
Image Captioning is a task that requires models to acquire a multimodal understanding of
the world and to express this understanding in natural language text. While the state-of-the …

I-tuning: Tuning frozen language models with image for lightweight image captioning

Z Luo, Z Hu, Y Xi, R Zhang, J Ma - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Image Captioning is a traditional vision-and-language task that aims to generate the
language description of an image. Recent studies focus on scaling up the model size and …

Noise-aware learning from web-crawled image-text data for image captioning

W Kang, J Mun, S Lee, B Roh - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Image captioning is one of the straightforward tasks that can take advantage of large-scale
web-crawled data which provides rich knowledge about the visual world for a captioning …

Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation

Z Li, D Liu, H Wang, C Zhang, W Cai - arXiv preprint arXiv:2307.14750, 2023 - arxiv.org
Training an image captioner without annotated image-sentence pairs has gained traction in
recent years. Previous approaches can be categorized into two strategies: crawling …