SmallCap: lightweight image captioning prompted with retrieval augmentation

R Ramos, B Martins, D Elliott… - Proceedings of the …, 2023 - openaccess.thecvf.com
… SMALLCAP, an image captioning model, prompted with captions retrieved from an external
datastore of text, based on the input image. This formulation of image captioning enables a …

Retrieval-augmented transformer for image captioning

S Sarto, M Cornia, L Baraldi, R Cucchiara - Proceedings of the 19th …, 2022 - dl.acm.org
… In this paper, we investigate the development of a retrieval component for image captioning.
… -augmented attention layer to predict tokens based on the past context and on text retrieved

Retrieval-augmented image captioning

R Ramos, D Elliott, B Martins - arXiv preprint arXiv:2302.08268, 2023 - arxiv.org
… Inspired by retrieval-augmented language generation and pretrained Vision and Language
(… to image captioning that generates sentences given the input image and a set of captions re…

Memory-augmented image captioning

Z Fei - Proceedings of the AAAI Conference on Artificial …, 2021 - ojs.aaai.org
retrieval data set, which for image captioning can be up to billions of examples. To search
over this large memory bank rapidly, we adopt FAISS (Johnson, Douze, and Jegou 2019), an …

Understanding Retrieval Robustness for Retrieval-Augmented Image Captioning

W Li, J Li, R Ramos, R Tang, D Elliott - arXiv preprint arXiv:2406.02265, 2024 - arxiv.org
… We evaluate the robustness of a single retrieval-augmented image captioning model in
this study. Given variations in training process and model structures, the observed model …

EVCap: Retrieval-Augmented Image Captioning with External Visual-Name Memory for Open-World Comprehension

J Li, DM Vo, A Sugimoto… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
… Large language models (LLMs)-based image captioning has the capability of describing …
retrieval-augmented image captioning method that prompts LLMs with object names retrieved

Text augmentation using BERT for image captioning

V Atliha, D Šešok - Applied Sciences, 2020 - mdpi.com
… visually impaired people improve human-computer interactions by introducing visual
concepts to a computer and create better features for information retrieval using images [2,3]. …

Re-ViLM: Retrieval-augmented visual language model for zero and few-shot image captioning

Z Yang, W Ping, Z Liu, V Korthikanti, W Nie… - arXiv preprint arXiv …, 2023 - arxiv.org
image captioning benchmarks. We aim to demonstrate the superiority of our retrieval
augmentation … and relevance of generated captions through retrieving relevant knowledge from …

Retrieval-enhanced adversarial training with dynamic memory-augmented attention for image paragraph captioning

C Xu, M Yang, X Ao, Y Shen, R Xu, J Tian - Knowledge-Based Systems, 2021 - Elsevier
… Concretely, RAMP treats the retrieved captions as reference captions to augment the … the
image captioning model (generator) to incorporate informative content in retrieved captions into …

Diffusion Based Augmentation for Captioning and Retrieval in Cultural Heritage

D Cioni, L Berlincioni, F Becattini… - Proceedings of the …, 2023 - openaccess.thecvf.com
… In particular, we explore the benefits of augmenting artwork datasets for image captioning.
To this end, we leverage both textual descriptions of the paintings and a diffusion model to …