Scaling up vision-language pre-training for image captioning

X Hu, Z Gan, J Wang, Z Yang, Z Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
In recent years, we have witnessed a significant performance boost in the image captioning
task based on vision-language pre-training (VLP). Scale is believed to be an important factor …

SmallCap: Lightweight image captioning prompted with retrieval augmentation

R Ramos, B Martins, D Elliott… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent advances in image captioning have focused on scaling the data and model size,
substantially increasing the cost of pre-training and fine-tuning. As an alternative to large …
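To make the retrieval-augmentation idea named in the title concrete, here is a minimal Python sketch of retrieval-augmented prompting in general: captions of the datastore images most similar to the input image are placed in the prompt of a small decoder. The datastore, embeddings, similarity measure, and prompt wording are illustrative assumptions, not SmallCap's actual implementation.

```python
# Sketch of retrieval-augmented caption prompting (illustrative assumptions only).
import numpy as np

# Hypothetical datastore: caption text paired with a precomputed image embedding.
datastore = [
    ("a dog runs across a grassy field", np.array([0.9, 0.1, 0.0])),
    ("two children play soccer in a park", np.array([0.7, 0.6, 0.1])),
    ("a plate of pasta on a wooden table", np.array([0.0, 0.2, 0.9])),
]

def cosine(a: np.ndarray, b: np.ndarray) -> float:
    # Cosine similarity between two embedding vectors.
    return float(a @ b / (np.linalg.norm(a) * np.linalg.norm(b)))

def build_prompt(query_embedding: np.ndarray, k: int = 2) -> str:
    # Retrieve the k captions whose image embeddings are closest to the query image.
    ranked = sorted(datastore, key=lambda item: cosine(query_embedding, item[1]), reverse=True)
    retrieved = [caption for caption, _ in ranked[:k]]
    lines = "\n".join(f"- {c}" for c in retrieved)
    # The retrieved captions condition the decoder via the prompt, so only a small
    # number of new parameters would need training in a lightweight setup.
    return f"Similar images show:\n{lines}\nThis image shows:"

query = np.array([0.8, 0.3, 0.05])  # hypothetical embedding of the input image
print(build_prompt(query))
```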

Nocaps: Novel object captioning at scale

H Agrawal, K Desai, Y Wang, X Chen… - Proceedings of the …, 2019 - openaccess.thecvf.com
Image captioning models have achieved impressive results on datasets containing limited
visual concepts and large amounts of paired image-caption training data. However, if these …

VisualGPT: Data-efficient adaptation of pretrained language models for image captioning

J Chen, H Guo, K Yi, B Li… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
The limited availability of annotated data often hinders real-world applications of machine
learning. To efficiently learn from small quantities of multimodal data, we leverage the …

FuseCap: Leveraging large language models for enriched fused image captions

N Rotstein, D Bensaïd, S Brody… - Proceedings of the …, 2024 - openaccess.thecvf.com
The advent of vision-language pre-training techniques has enabled substantial progress in the
development of models for image captioning. However, these models frequently produce …

Pointing novel objects in image captioning

Y Li, T Yao, Y Pan, H Chao… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Image captioning has received significant attention, with remarkable improvements in recent
years. Nevertheless, images in the wild encapsulate rich knowledge and cannot be …

Neural baby talk

J Lu, J Yang, D Batra, D Parikh - Proceedings of the IEEE …, 2018 - openaccess.thecvf.com
We introduce a novel framework for image captioning that can produce natural language
explicitly grounded in entities that object detectors find in the image. Our approach …
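A minimal Python sketch of the general grounding idea this entry describes: the decoder emits a template with region slots, and each slot is filled with the label of a detected object. The template format, detections, and slot syntax are illustrative assumptions, not the paper's model.

```python
# Sketch of slot-filling with detector outputs (illustrative assumptions only).
import re

# Hypothetical detector output: region id -> category label.
detections = {3: "cake", 17: "puppy", 21: "table"}

# Hypothetical template produced by a captioning decoder, with <region-N> slots.
template = "A <region-17> is sitting at a <region-21> with a <region-3>."

def fill_slots(template: str, detections: dict[int, str]) -> str:
    # Replace each <region-N> slot with the detected label for region N.
    def replace(match: re.Match) -> str:
        region_id = int(match.group(1))
        return detections.get(region_id, "object")
    return re.sub(r"<region-(\d+)>", replace, template)

print(fill_slots(template, detections))
# -> "A puppy is sitting at a table with a cake."
```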

Learning to collocate neural modules for image captioning

X Yang, H Zhang, J Cai - Proceedings of the IEEE/CVF …, 2019 - openaccess.thecvf.com
We do not speak word by word from scratch; our brain quickly structures a pattern like "something
does something at someplace" and then fills in the detailed description. To render existing encoder …

Injecting semantic concepts into end-to-end image captioning

Z Fang, J Wang, X Hu, L Liang, Z Gan… - Proceedings of the …, 2022 - openaccess.thecvf.com
Tremendous progress has been made in recent years in developing better image captioning
models, yet most of them rely on a separate object detector to extract regional features …

ClipCap: CLIP prefix for image captioning

R Mokady, A Hertz, AH Bermano - arXiv preprint arXiv:2111.09734, 2021 - arxiv.org
Image captioning is a fundamental task in vision-language understanding, in which the model
predicts an informative textual caption for a given input image. In this paper, we present a …
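A minimal Python sketch of the general "prefix" idea named in the title: a CLIP image embedding is mapped to a short sequence of prefix embeddings that condition a language model. The dimensions, MLP mapper, and random stand-in tensors are illustrative assumptions, not the paper's exact configuration.

```python
# Sketch of mapping an image embedding to a language-model prefix (illustrative assumptions only).
import torch
import torch.nn as nn

CLIP_DIM = 512    # assumed size of the CLIP image embedding
LM_DIM = 768      # assumed hidden size of the language model (e.g. GPT-2 small)
PREFIX_LEN = 10   # assumed number of prefix tokens

class PrefixMapper(nn.Module):
    """Maps one image embedding to PREFIX_LEN pseudo-token embeddings."""
    def __init__(self):
        super().__init__()
        self.mlp = nn.Sequential(
            nn.Linear(CLIP_DIM, LM_DIM * PREFIX_LEN // 2),
            nn.Tanh(),
            nn.Linear(LM_DIM * PREFIX_LEN // 2, LM_DIM * PREFIX_LEN),
        )

    def forward(self, clip_embedding: torch.Tensor) -> torch.Tensor:
        # (batch, CLIP_DIM) -> (batch, PREFIX_LEN, LM_DIM)
        return self.mlp(clip_embedding).view(-1, PREFIX_LEN, LM_DIM)

# Stand-in for a real CLIP image embedding (hypothetical data).
image_embedding = torch.randn(1, CLIP_DIM)
prefix = PrefixMapper()(image_embedding)

# The prefix is concatenated with the caption's token embeddings and fed to a
# frozen or lightly tuned language model during training and decoding.
caption_token_embeddings = torch.randn(1, 12, LM_DIM)  # hypothetical
lm_input = torch.cat([prefix, caption_token_embeddings], dim=1)
print(lm_input.shape)  # torch.Size([1, 22, 768])
```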