Why Is Prompt Tuning for Vision-Language Models Robust to Noisy Labels?

CE Wu, Y Tian, H Yu, H Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Vision-language models such as CLIP learn a generic text-image embedding from large-
scale training data. A vision-language model can be adapted to a new classification task …

CLIP-Adapter: Better vision-language models with feature adapters

P Gao, S Geng, R Zhang, T Ma, R Fang… - International Journal of …, 2024 - Springer
Large-scale contrastive vision-language pretraining has shown significant progress in visual
representation learning. Unlike traditional visual systems trained by a fixed set of discrete …

SVL-Adapter: Self-supervised adapter for vision-language pretrained models

O Pantazis, G Brostow, K Jones… - arXiv preprint arXiv …, 2022 - arxiv.org
Vision-language models such as CLIP are pretrained on large volumes of internet sourced
image and text pairs, and have been shown to sometimes exhibit impressive zero-and low …

Bayesian prompt learning for image-language model generalization

MM Derakhshani, E Sanchez, A Bulat… - Proceedings of the …, 2023 - openaccess.thecvf.com
Foundational image-language models have generated considerable interest due to their
efficient adaptation to downstream tasks by prompt learning. Prompt learning treats part of …

Unsupervised prompt learning for vision-language models

T Huang, J Chu, F Wei - arXiv preprint arXiv:2204.03649, 2022 - arxiv.org
Contrastive vision-language models like CLIP have shown great progress in transfer
learning. In the inference stage, the proper text description, also known as prompt, needs to …

Learning to decompose visual features with latent textual prompts

F Wang, M Li, X Lin, H Lv, AG Schwing, H Ji - arXiv preprint arXiv …, 2022 - arxiv.org
Recent advances in pre-training vision-language models like CLIP have shown great
potential in learning transferable visual representations. Nonetheless, for downstream …

Debiasing vision-language models via biased prompts

CY Chuang, V Jampani, Y Li, A Torralba… - arXiv preprint arXiv …, 2023 - arxiv.org
Machine learning models have been shown to inherit biases from their training datasets.
This can be particularly problematic for vision-language foundation models trained on …

What does CLIP know about a red circle? Visual prompt engineering for VLMs

A Shtedritski, C Rupprecht… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Large-scale Vision-Language Models, such as CLIP, learn powerful image-text
representations that have found numerous applications, from zero-shot classification to text …

LogoPrompt: Synthetic text images can be good visual prompts for vision-language models

C Shi, S Yang - Proceedings of the IEEE/CVF International …, 2023 - openaccess.thecvf.com
Prompt engineering is a powerful tool used to enhance the performance of pre-trained
models on downstream tasks. For example, providing the prompt "Let's think step by step" …

Conditional prompt learning for vision-language models

K Zhou, J Yang, CC Loy, Z Liu - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
With the rise of powerful pre-trained vision-language models like CLIP, it becomes essential
to investigate ways to adapt these models to downstream datasets. A recently proposed …