Embracing Diversity: Interpretable Zero-shot classification beyond one vector per class

M Moayeri, M Rabbat, M Ibrahim… - The 2024 ACM …, 2024 - dl.acm.org
Vision-language models enable open-world classification of objects without the need for any
retraining. While this zero-shot paradigm marks a significant advance, even today's best …
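
For context, the "one vector per class" baseline the title refers to is standard zero-shot CLIP-style classification: encode each class name once into a single text vector, then assign each image to the class with the highest cosine similarity. A minimal sketch of that baseline; the random arrays are placeholders for real encoder outputs, and no actual model is loaded:

```python
# Minimal sketch of the standard zero-shot paradigm this paper extends:
# one text vector per class, classification by cosine similarity.
# Embeddings are random placeholders standing in for a real VLM encoder.
import numpy as np

rng = np.random.default_rng(0)
dim, n_classes, n_images = 512, 3, 5

# Stand-ins for encoder outputs (e.g., CLIP text/image features).
class_vecs = rng.normal(size=(n_classes, dim))   # one vector per class
image_vecs = rng.normal(size=(n_images, dim))

def normalize(x):
    return x / np.linalg.norm(x, axis=-1, keepdims=True)

class_vecs, image_vecs = normalize(class_vecs), normalize(image_vecs)

# Cosine similarity -> predicted class index per image.
logits = image_vecs @ class_vecs.T
preds = logits.argmax(axis=1)
print(preds)
```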

FairerCLIP: Debiasing CLIP's Zero-Shot Predictions using Functions in RKHSs

S Dehdashtian, L Wang, VN Boddeti - arXiv preprint arXiv:2403.15593, 2024 - arxiv.org
Large pre-trained vision-language models such as CLIP provide compact and general-
purpose representations of text and images that are demonstrably effective across multiple …

Learning to Prompt with Text Only Supervision for Vision-Language Models

MU Khattak, MF Naeem, M Naseer, L Van Gool… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundational vision-language models such as CLIP are becoming a new paradigm in
vision, due to their excellent generalization abilities. However, adapting these models for …
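
Prompt-based adaptation methods in this line of work typically learn a few "soft" context vectors that are prepended to class-name embeddings and optimized while the backbone stays frozen. The sketch below is a generic CoOp-style toy, not this paper's text-only method; the mean-pooling "encoder" and random target features are stand-ins:

```python
# Generic soft-prompt-tuning sketch (CoOp-style), not this paper's exact
# method: a few learnable context vectors are prepended to each class-name
# embedding and optimized against target text features (random stand-ins).
import torch

torch.manual_seed(0)
dim, n_ctx, n_classes = 512, 4, 10

ctx = torch.randn(n_ctx, dim, requires_grad=True)   # learnable context
name_embeds = torch.randn(n_classes, 1, dim)        # frozen class-name tokens
target_feats = torch.randn(n_classes, dim)          # e.g., text-derived targets

opt = torch.optim.Adam([ctx], lr=1e-2)
for step in range(100):
    # Prepend the shared context to every class, pool to one feature each.
    prompts = torch.cat([ctx.unsqueeze(0).expand(n_classes, -1, -1),
                         name_embeds], dim=1)
    feats = prompts.mean(dim=1)                     # stand-in for a text encoder
    loss = (1 - torch.cosine_similarity(feats, target_feats, dim=-1)).mean()
    opt.zero_grad(); loss.backward(); opt.step()
print(float(loss))
```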

Invariant Test-Time Adaptation for Vision-Language Model Generalization

H Ma, Y Zhu, C Zhang, P Zhao, B Wu, LK Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language foundation models have exhibited remarkable success across a multitude
of downstream tasks due to their scalability on extensive image-text paired datasets …

BendVLM: Test-Time Debiasing of Vision-Language Embeddings

W Gerych, H Zhang, K Hamidieh, E Pan… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language model (VLM) embeddings have been shown to encode biases present in
their training data, such as societal biases that prescribe negative characteristics to …
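
A common baseline for this kind of embedding debiasing (not BendVLM's own test-time approach) is to estimate a bias direction in the embedding space and project it out of every embedding. A sketch, with random placeholders for the real image and attribute-prompt embeddings:

```python
# Baseline illustration (not BendVLM's method): remove a bias direction
# from VLM embeddings by linear projection. The bias direction is
# estimated from the difference of two attribute prompts' embeddings.
import numpy as np

rng = np.random.default_rng(0)
dim = 512
emb = rng.normal(size=(100, dim))          # stand-in image embeddings

# Stand-ins for text embeddings of two contrasting attribute prompts.
attr_a, attr_b = rng.normal(size=dim), rng.normal(size=dim)
bias_dir = attr_a - attr_b
bias_dir /= np.linalg.norm(bias_dir)

# Project each embedding onto the orthogonal complement of bias_dir.
debiased = emb - np.outer(emb @ bias_dir, bias_dir)
assert np.allclose(debiased @ bias_dir, 0.0, atol=1e-8)
```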

OTTER: Effortless Label Distribution Adaptation of Zero-shot Models

C Shin, J Zhao, S Cromp, H Vishwakarma… - The Thirty-eighth …, 2024 - openreview.net
Popular zero-shot models suffer due to artifacts inherited from pretraining. One particularly
detrimental issue, caused by unbalanced web-scale pretraining data, is mismatched label …
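
The core idea, as the abstract frames it, is to rebalance zero-shot predictions via optimal transport so that the overall predicted label distribution matches a known or estimated prior. A minimal entropic-OT (Sinkhorn) sketch of that idea; the uniform image marginal and negative-log-probability cost are assumptions, not necessarily the paper's exact formulation:

```python
# OT-based label rebalancing in the spirit of OTTER: compute a transport
# plan between images (uniform marginal) and classes (target label prior)
# whose cost favors the model's own scores, then relabel by plan row.
import numpy as np

def sinkhorn(cost, r, c, eps=0.1, iters=200):
    """Entropic OT: plan with row sums r and column sums c."""
    K = np.exp(-cost / eps)
    u = np.ones_like(r)
    for _ in range(iters):
        v = c / (K.T @ u)
        u = r / (K @ v)
    return u[:, None] * K * v[None, :]

rng = np.random.default_rng(0)
n, k = 6, 3
probs = rng.dirichlet(np.ones(k), size=n)   # zero-shot class probabilities
prior = np.array([0.5, 0.3, 0.2])           # known/estimated label distribution

cost = -np.log(probs + 1e-9)                # cheap to assign likely classes
plan = sinkhorn(cost, np.full(n, 1 / n), prior)
preds = plan.argmax(axis=1)                 # rebalanced predictions
print(preds, plan.sum(axis=0))              # column sums ~ prior
```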

DPA: Dual Prototypes Alignment for Unsupervised Adaptation of Vision-Language Models

E Ali, S Silva, MH Khan - arXiv preprint arXiv:2408.08855, 2024 - arxiv.org
Vision-language models (VLMs), e.g., CLIP, have shown remarkable potential in zero-shot
image classification. However, adapting these models to new domains remains challenging …

CoAPT: Context Attribute words for Prompt Tuning

G Lee, S An, S Baik, S Lee - arXiv preprint arXiv:2407.13808, 2024 - arxiv.org
We propose a novel prompt tuning method called CoAPT (Context Attribute words in Prompt
Tuning) for few/zero-shot image classification. The core motivation is that attributes are …
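
The general recipe suggested by the title is to enrich each class prompt with attribute words and pool the resulting text embeddings into one classifier vector per class. A sketch of that idea only, with a placeholder encoder and made-up attribute lists:

```python
# Sketch of attribute-augmented prompting: build several prompts per class
# from attribute words, embed each, and average into one classifier vector
# per class. embed() is a placeholder for a real text encoder; the
# attribute lists are invented for illustration.
import numpy as np

dim = 512

def embed(text: str) -> np.ndarray:          # stand-in for a text encoder
    local = np.random.default_rng(abs(hash(text)) % 2**32)
    v = local.normal(size=dim)
    return v / np.linalg.norm(v)

attributes = {"cat": ["furry", "whiskered"], "plane": ["winged", "metallic"]}
classifier = {
    name: np.mean([embed(f"a photo of a {w} {name}") for w in words], axis=0)
    for name, words in attributes.items()
}
print({k: v.shape for k, v in classifier.items()})
```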
