Denoising large-scale image captioning from alt-text data using content selection models

KR Chandu, P Sharma, S Changpinyo… - arXiv preprint arXiv …, 2020 - arxiv.org
… pieces of information consistent with the image as a skeleton. Sub… automatically extracted
from the alttext captions. We focus on language-based skeletons that are derived from captions

Evaluating the effectiveness of automatic image captioning for web accessibility

M Leotta, F Mori, M Ribaudo - Universal access in the information society, 2023 - Springer
… we associated a set of five textual descriptions: the first is the alternative text written by
the human contributors to the online encyclopedia, while the other four are the …

Visuals to text: A comprehensive review on automatic image captioning

Y Ming, N Hu, C Fan, F Feng… - IEEE/CAA Journal of …, 2022 - researchportal.port.ac.uk
… to generate image caption with impressive progress. To summarize the recent advances
in image captioning, we present a comprehensive review on image captioning, covering both …

iCap: Interactive image captioning with predictive text

Z Jia, X Li - Proceedings of the 2020 international conference on …, 2020 - dl.acm.org
… image captioning with human in the loop. Different from automated image captioning
where a given test image … , we have access to both the test image and a sequence of (incomplete) …

Scaling up vision-language pre-training for image captioning

X Hu, Z Gan, J Wang, Z Yang, Z Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
… As some alttexts are too long, we split them up by punctuation marks, such as period and
exclamation mark, and select the longest part. To filter out some rare or misspelled words, we …

CLIPScore: A reference-free evaluation metric for image captioning

J Hessel, A Holtzman, M Forbes, RL Bras… - arXiv preprint arXiv …, 2021 - arxiv.org
… We measure CLIP-S’s capacity to reconstruct a set of 2.8K human judgments of alttext …
Each alt-text was rated on a scale of 0 to 3 in terms of its probable utility as an alt-text. While the …

What's in an ALT Tag? Exploring Caption Content Priorities through Collaborative Captioning

A Muehlbradt, SK Kane - ACM Transactions on Accessible Computing …, 2022 - dl.acm.org
… In this study, we explore contextual differences in image captioning based on the domain, …
general feedback about the process of image captioning, image captions, and the study, we …

ImageExplorer: Multi-layered touch exploration to encourage skepticism towards imperfect AI-generated image captions

J Lee, J Herskovitz, YH Peng, A Guo - … of the 2022 CHI Conference on …, 2022 - dl.acm.org
… ’s Automatic AltText system originally aimed to generate image tags that describe the prominent
objects in an image [… -language image description, along with providing tags grouped by …

Transform and tell: Entity-aware news image captioning

A Tran, A Mathews, L Xie - … of the IEEE/CVF conference on …, 2020 - openaccess.thecvf.com
… set a new SOTA for news image captioning. Our model can incorporate real-world knowledge
about entities across different modalities and generate text with better linguistic diversity. …

Guiding image captioning models toward more specific captions

S Kornblith, L Li, Z Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
… These problems are further exacerbated when models are trained directly on image-alt
text pairs collected from the internet. In this work, we show that it is possible to generate more …