Multi-modal image captioning for the visually impaired

H Ahsan, N Bhalla, D Bhatt, K Shah - arXiv preprint arXiv:2105.08106, 2021 - arxiv.org
… an image captioning model for the blind that specifically leverages text detected in the image.
… -generator mechanism when generating captions to copy the detected text when needed. …

Deep learning approaches on image captioning: A review

T Ghandi, H Pourreza, H Mahyar - ACM Computing Surveys, 2023 - dl.acm.org
… image captioning methods. In this paper, we discuss various methods of image captioning
… most common problems and challenges of image captioning. We provide a comprehensive …

Informative image captioning with external sources of information

S Zhao, P Sharma, T Levinboim, R Soricut - arXiv preprint arXiv …, 2019 - arxiv.org
… We present an image captioning model that combines image features with fine-grained
entities and object labels, and learns to produce fluent and informative image captions. …

From show to tell: A survey on deep learning-based image captioning

M Stefanini, M Cornia, L Baraldi… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
… in image captioning has not reached a conclusive answer yet. This work aims at providing
a comprehensive overview of image captioning approaches, from visual encoding and text …

Noise-aware learning from web-crawled image-text data for image captioning

W Kang, J Mun, S Lee, B Roh - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
… learning and DALL·E [40] for the text-to-image generation task. This is mainly thanks to the
… described in alt-texts of web-crawled data. Inspired by this, research on image captioning is …

Re-evaluating automatic metrics for image captioning

M Kilickaya, A Erdem, N Ikizler-Cinbis… - arXiv preprint arXiv …, 2016 - arxiv.org
… In this section, we evaluate the robustness of the automatic image captioning metrics. For
this purpose, we employ the binary (two-alternative) forced choice task introduced in (Hodosh …

WATAA: Web alternative text authoring assistant for improving web content accessibility

H Jeong, M Chun, H Lee, SY Oh, H Jung - Companion proceedings of …, 2023 - dl.acm.org
… the user enters a web page URL, and the alt text checker identifies any images without
alt text. WATAA then uses an image captioning model to generate automatic alt text for each …

The unreasonable effectiveness of CLIP features for image captioning: an experimental analysis

M Barraco, M Cornia, S Cascianelli… - Proceedings of the …, 2022 - openaccess.thecvf.com
… To assess the role of visual features extracted from CLIP-like models in image captioning, …
features in standard and more challenging image captioning settings. We use the commonly …

XGPT: Cross-modal generative pre-training for image captioning

Q Xia, H Huang, N Duan, D Zhang, L Ji, Z Sui… - … Processing and Chinese …, 2021 - Springer
… benchmark datasets, including COCO Captions and Flickr30k Captions. We also use XGPT
to generate image captions as data augmentation for the image retrieval task and achieve …

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org
… auxiliary text, such as generating or editing an image … image captioning. Note that our method
does not employ the CLIP’s textual encoder, since there is no input text, and the output text …