alt text automatic image captioning - Academic Resource Search

Semi-autoregressive transformer for image captioning

Y Zhou, Y Zhang, Z Hu, M Wang - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

… captioning models. We evaluate SATIC model on the challenging MSCOCO [3] image
captioning … We present three examples of generated image captions in Figure 3. From the top …

Cited by 32 · Related articles · All 7 versions

[PDF] jaewook-lee.com

Image Explorer: Multi-layered touch exploration to make images accessible

J Lee, YH Peng, J Herskovitz, A Guo - Proceedings of the 23rd …, 2021 - dl.acm.org

… image alt-text, they may be of little use as user-written alttext is … auto-generated image
captions, we present Image Explorer, a touch-based image exploration system that divides image …

Cited by 18 · Related articles · All 8 versions

[PDF] thecvf.com

Beyond a pre-trained object detector: Cross-modal textual and visual context for image captioning

CW Kuo, Z Kira - Proceedings of the IEEE/CVF conference …, 2022 - openaccess.thecvf.com

… richness to allow the captioning model to properly ground … image, and show qualitatively
and quantitatively that this can improve grounding. We validate our method on image captioning…

Cited by 51 · Related articles · All 5 versions

Turkish Image Captioning with Vision Transformer Based Encoders and Text Decoders

S Yıldız, A Memiş, S Varlı - 2024 32nd Signal Processing and …, 2024 - ieeexplore.ieee.org

… image captions are generated via a text decoder block. To test the performance of the Turkish
image captioning … a benchmark dataset consisting of Turkish image captions, was used. In …

Multi-gru based automated image captioning for smartphones

R Keskin, ÖT Moral, V Kılıç… - 2021 29th Signal …, 2021 - ieeexplore.ieee.org

… Abstract—Image captioning is the description of an image with natural language … of many
image captioning applications. In this study, a novel automatic image captioning system based …

Cited by 12 · Related articles

[PDF] thecvf.com

Fusecap: Leveraging large language models for enriched fused image captions

N Rotstein, D Bensaïd, S Brody… - Proceedings of the …, 2024 - openaccess.thecvf.com

… models for image captioning. However, these models frequently produce generic captions
and may … , we also assess the captions through image-to-text and text-to-image retrieval tasks. …

Cited by 8 · Related articles · All 4 versions

[PDF] thecvf.com

Convolutional image captioning

J Aneja, A Deshpande… - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com

… Secondly, as we will show in our results for image captioning, RNNs tend to produce
lower … Next, we describe an alternative convolutional approach to image captioning which …

Cited by 446 · Related articles · All 12 versions

[PDF] mdpi.com

A review of transformer-based approaches for image captioning

O Ondeng, H Ouma, P Akuon - Applied Sciences, 2023 - mdpi.com

… In this paper, we review a number of transformer-based image captioning models leading
up to the current state-of-the-art. Other reviews of image captioning models have been …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Iconographic image captioning for artworks

E Cetinic - … ICPR International Workshops and Challenges: Virtual …, 2021 - Springer

… not structured primarily as an image captioning dataset, each … on the down-stream task of
image captioning [42]. Transformer-… of artwork images with the goal to generate image captions …

Cited by 25 · Related articles · All 9 versions

[PDF] arxiv.org

Entity-Aware Multimodal Alignment Framework for News Image Captioning

J Zhang, H Zhang, X Wan - arXiv preprint arXiv:2402.19404, 2024 - arxiv.org

… Therefore, we attempt to create a specialized image-text matching task within the news
image captioning task to align vision features with entity-aware textual features. Furthermore, …

Cited by 1 · Related articles · All 2 versions
