Related articles - Academic Resource Search

Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

CE Wu, Y Tian, H Yu, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com

Vision-language models such as CLIP learn a generic text-image embedding from large-
scale training data. A vision-language model can be adapted to a new classification task …

Cited by 10 · Related articles · All 5 versions

[PDF] arxiv.org

CLIP-Adapter: Better vision-language models with feature adapters

P Gao, S Geng, R Zhang, T Ma, R Fang… - International Journal of …, 2024 - Springer

Large-scale contrastive vision-language pretraining has shown significant progress in visual
representation learning. Unlike traditional visual systems trained by a fixed set of discrete …

Cited by 758 · Related articles · All 10 versions

[PDF] arxiv.org

SVL-Adapter: Self-supervised adapter for vision-language pretrained models

O Pantazis, G Brostow, K Jones… - arXiv preprint arXiv …, 2022 - arxiv.org

Vision-language models such as CLIP are pretrained on large volumes of internet sourced
image and text pairs, and have been shown to sometimes exhibit impressive zero- and low …

Cited by 36 · Related articles · All 9 versions

[PDF] thecvf.com

Bayesian prompt learning for image-language model generalization

MM Derakhshani, E Sanchez, A Bulat… - Proceedings of the …, 2023 - openaccess.thecvf.com

Foundational image-language models have generated considerable interest due to their
efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of …

Cited by 18 · Related articles · All 6 versions

[PDF] arxiv.org

Unsupervised prompt learning for vision-language models

T Huang, J Chu, F Wei - arXiv preprint arXiv:2204.03649, 2022 - arxiv.org

Contrastive vision-language models like CLIP have shown great progress in transfer
learning. In the inference stage, the proper text description, also known as prompt, needs to …

Cited by 137 · Related articles · All 2 versions

[PDF] arxiv.org

Learning to decompose visual features with latent textual prompts

F Wang, M Li, X Lin, H Lv, AG Schwing, H Ji - arXiv preprint arXiv …, 2022 - arxiv.org

Recent advances in pre-training vision-language models like CLIP have shown great
potential in learning transferable visual representations. Nonetheless, for downstream …

Cited by 23 · Related articles · All 4 versions

[PDF] arxiv.org

Debiasing vision-language models via biased prompts

CY Chuang, V Jampani, Y Li, A Torralba… - arXiv preprint arXiv …, 2023 - arxiv.org

Machine learning models have been shown to inherit biases from their training datasets.
This can be particularly problematic for vision-language foundation models trained on …

Cited by 56 · Related articles · All 2 versions

[PDF] thecvf.com

What does CLIP know about a red circle? Visual prompt engineering for VLMs

A Shtedritski, C Rupprecht… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Large-scale Vision-Language Models, such as CLIP, learn powerful image-text
representations that have found numerous applications, from zero-shot classification to text …

Cited by 90 · Related articles · All 7 versions

[PDF] thecvf.com

LoGoPrompt: Synthetic text images can be good visual prompts for vision-language models

C Shi, S Yang - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com

Prompt engineering is a powerful tool used to enhance the performance of pre-trained
models on downstream tasks. For example, providing the prompt "Let's think step by step" …

Cited by 18 · Related articles · All 6 versions

[PDF] thecvf.com

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …

Cited by 1245 · Related articles · All 7 versions
