Related articles - Academic Resource Search

Denoising large-scale image captioning from alt-text data using content selection models

KR Chandu, P Sharma, S Changpinyo… - arXiv preprint arXiv …, 2020 - arxiv.org

Training large-scale image captioning (IC) models demands access to a rich and diverse set
of training examples, gathered from the wild, often from noisy alt-text data. However, recent …

Cited by 2 · Related articles · All 2 versions

[PDF] thecvf.com

Guiding image captioning models toward more specific captions

S Kornblith, L Li, Z Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Image captioning is conventionally formulated as the task of generating captions that match
the conditional distribution of reference image-caption pairs. However, reference captions in …

Cited by 9 · Related articles · All 5 versions

[PDF] arxiv.org

Text-only training for image captioning using noise-injected CLIP

D Nukrai, R Mokady, A Globerson - arXiv preprint arXiv:2211.00575, 2022 - arxiv.org

We consider the task of image-captioning using only the CLIP model and additional text data
at training time, and no additional captioned images. Our approach relies on the fact that …

Cited by 77 · Related articles · All 5 versions

[PDF] arxiv.org

CIC-BART-SSA: Controllable Image Captioning with Structured Semantic Augmentation

K Basioti, MA Abdelsalam, F Fancellu… - arXiv preprint arXiv …, 2024 - arxiv.org

Controllable Image Captioning (CIC) aims at generating natural language descriptions for
an image, conditioned on information provided by end users, e.g., regions, entities or events …

VisualGPT: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

Cited by 176 · Related articles · All 12 versions

[PDF] thecvf.com

Show, edit and tell: a framework for editing image captions

F Sammani, L Melas-Kyriazi - Proceedings of the IEEE/CVF …, 2020 - openaccess.thecvf.com

Most image captioning frameworks generate captions directly from images, learning a
mapping from visual features to natural language. However, editing existing captions can be …

Cited by 72 · Related articles · All 9 versions

[PDF] arxiv.org

Generating Diverse and Meaningful Captions: Unsupervised Specificity Optimization for Image Captioning

A Lindh, RJ Ross, A Mahalunkar, G Salton… - … Conference on Artificial …, 2018 - Springer

Image Captioning is a task that requires models to acquire a multimodal understanding of
the world and to express this understanding in natural language text. While the state-of-the …

Cited by 22 · Related articles · All 9 versions

[PDF] arxiv.org

I-Tuning: Tuning frozen language models with image for lightweight image captioning

Z Luo, Z Hu, Y Xi, R Zhang, J Ma - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Image Captioning is a traditional vision-and-language task that aims to generate the
language description of an image. Recent studies focus on scaling up the model size and …

Cited by 7 · Related articles · All 4 versions

[PDF] thecvf.com

Noise-aware learning from web-crawled image-text data for image captioning

W Kang, J Mun, S Lee, B Roh - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Image captioning is one of the straightforward tasks that can take advantage of large-scale
web-crawled data which provides rich knowledge about the visual world for a captioning …

Cited by 12 · Related articles · All 5 versions

[PDF] arxiv.org

Exploring Annotation-free Image Captioning with Retrieval-augmented Pseudo Sentence Generation

Z Li, D Liu, H Wang, C Zhang, W Cai - arXiv preprint arXiv:2307.14750, 2023 - arxiv.org

Training an image captioner without annotated image-sentence pairs has gained traction in
recent years. Previous approaches can be categorized into two strategies: crawling …