FuseCap: Leveraging large language models for enriched fused image captions

N Rotstein, D Bensaïd, S Brody… - Proceedings of the …, 2024 - openaccess.thecvf.com
… models for image captioning. However, these models frequently produce generic captions
and may … , we also assess the captions through image-to-text and text-to-image retrieval tasks. …

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org
… auxiliary text, such as generating or editing an image … image captioning. Note that our method
does not employ the CLIP’s textual encoder, since there is no input text, and the output text

Text to image synthesis for improved image captioning

MZ Hossain, F Sohel, MF Shiratuddin, H Laga… - IEEE …, 2021 - ieeexplore.ieee.org
images with associative captions is expensive and time-consuming. In this paper, we propose
an image captioning … ) based text to image generator to generate synthetic images. We …

avtmNet: adaptive visual-text merging network for image captioning

H Song, J Zhu, Y Jiang - Computers & Electrical Engineering, 2020 - Elsevier
… Then, we introduce our whole model for image captioning, which includes an image encoder
and an adaptive visual-text decoder in Section 3. The experimental evaluation including …

Phrase-based image captioning

R Lebret, P Pinheiro… - … conference on machine …, 2015 - proceedings.mlr.press
… work they do not leverage a multimodal metric between images and phrases. More recently,
automatic image sentence description approaches based on deep neural networks have …

Positive-augmented contrastive learning for image and video captioning evaluation

S Sarto, M Barraco, M Cornia… - Proceedings of the …, 2023 - openaccess.thecvf.com
… Finally, we test the system-level correlation of the proposed metric when considering popular
image captioning approaches, and assess the impact of employing different cross-modal …

Chinese alt text writing based on deep learning

J Xie, R Li, S Lv, Y Wang, Q Wang, YI Vorotnitsky - 2019 - elib.bsu.by
… This paper attempts to generate accurate and coherent Chinese alt texts for images. Drawing
on the classic image captioning model NIC, the author designed a novel Chinese alt text

Understanding guided image captioning performance across domains

EG Ng, B Pang, P Sharma, R Soricut - arXiv preprint arXiv:2012.02339, 2020 - arxiv.org
… guiding text, a free-form text input that is assumed to be related to some concept(s) in the
image; and we consider the Guided Image Captioning task, where the guiding text is provided …

Comprehending and ordering semantics for image captioning

Y Li, Y Pan, T Yao, T Mei - … of the IEEE/CVF conference on …, 2022 - openaccess.thecvf.com
… for image captioning. … sentence, which covers most semantics in an image that are
worthy of mention and meanwhile describes them in linguistic order. Modern image captioning

Measuring representational harms in image captioning

A Wang, S Barocas, K Laird, H Wallach - Proceedings of the 2022 ACM …, 2022 - dl.acm.org
captioning system. Our goal was not to audit this image captioning system, but rather to …
Specifically, each image’s alttext was extracted and fed through a data cleaning pipeline that…