language models image captioning - Academic Resource Search

Language models for image captioning: The quirks and what works

J Devlin, H Cheng, H Fang, S Gupta, L Deng… - arXiv preprint arXiv …, 2015 - arxiv.org

… , and then a maximum entropy (ME) language model is used to arrange these words into a
… In this paper, we compare the merits of these different language modeling approaches for …

Cited by 337 · Related articles · All 15 versions

[PDF] thecvf.com

Visualgpt: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

… for image captioning by utilizing pretrained language models (… image captioning. To our
knowledge, this is the first work that … large pretrained language models for image captioning. …

Cited by 164 · Related articles · All 12 versions

[PDF] thecvf.com

An empirical study of language cnn for image captioning

J Gu, G Wang, J Cai, T Chen - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

… statistical language modeling tasks and shows competitive performance in image captioning…
In this work, we present an image captioning model with language CNN to explore both …

Cited by 163 · Related articles · All 7 versions

[HTML] sciencedirect.com

[HTML] Image captioning for effective use of language models in knowledge-based visual question answering

A Salaberria, G Azkune, OL de Lacalle, A Soroa… - Expert Systems with …, 2023 - Elsevier

… image captioning as a way to verbalize the information in the image, where the captions
are … Once the captions are generated, all the inference in our method is done using text-only …

Cited by 41 · Related articles · All 4 versions

[PDF] thecvf.com

Scaling up vision-language pre-training for image captioning

X Hu, Z Gan, J Wang, Z Yang, Z Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com

… image captioning, provide a more comprehensive study on the scaling behavior via altering
data and model … as predicting the next token with language modeling, as shown in Figure 4 …

Cited by 242 · Related articles · All 5 versions

[PDF] arxiv.org

Image captioning with deep bidirectional LSTMs

C Wang, H Yang, C Bartz, C Meinel - Proceedings of the 24th ACM …, 2016 - dl.acm.org

… for image captioning with multimodal neural language model… with structure-content neural
language model (SC-NLM). … which replaces feed-forward neural language model in [13]. …

Cited by 338 · Related articles · All 4 versions

[PDF] aaai.org

Unified vision-language pre-training for image captioning and vqa

L Zhou, H Palangi, L Zhang, H Hu, J Corso… - Proceedings of the AAAI …, 2020 - ojs.aaai.org

… We observe that compared to the two cases where we do not use any pre-trained model
or use only the pre-trained language model (i.e., BERT), using VLP significantly speedups the …

Cited by 904 · Related articles · All 7 versions

[PDF] arxiv.org

Clipcap: Clip prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org

… a language model to generate the image captions. The recently proposed CLIP model contains
rich … it best for vision-language perception. Our key idea is that together with a pre-trained …

Cited by 583 · Related articles · All 2 versions

[PDF] arxiv.org

Fusion models for improved image captioning

M Kalimuthu, A Mogadala, M Mosbach… - Pattern Recognition. ICPR …, 2021 - Springer

… Building on these developments, we propose to incorporate external language models
into visual captioning frameworks to aid and improve their capabilities both for description …

Cited by 17 · Related articles · All 5 versions

[PDF] thecvf.com

Fusecap: Leveraging large language models for enriched fused image captions

N Rotstein, D Bensaïd, S Brody… - Proceedings of the …, 2024 - openaccess.thecvf.com

… To address this challenge, we leverage existing captions and explore augmenting them with
… the original captions using a large language model (LLM), yielding comprehensive image …

Cited by 8 · Related articles · All 4 versions
