The Neglected Tails in Vision-Language Models

S Parashar, Z Lin, T Liu, X Dong, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language models (VLMs) excel in zero-shot recognition, but their performance varies
greatly across different visual concepts. For example, although CLIP achieves impressive …

On catastrophic inheritance of large foundation models

H Chen, B Raj, X Xie, J Wang - arXiv preprint arXiv:2402.01909, 2024 - arxiv.org
Large foundation models (LFMs) are claiming incredible performance, yet great concerns
have been raised about their mythic and poorly understood potential, not only in machine …

Selective Vision-Language Subspace Projection for Few-shot CLIP

X Zhu, B Zhu, Y Tan, S Wang, Y Hao… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-language models such as CLIP can map data from different modalities
into a unified feature space, enabling zero/few-shot inference by measuring the similarity of …

Generalization Beyond Data Imbalance: A Controlled Study on CLIP for Transferable Insights

X Wen, B Zhao, Y Chen, J Pang, X Qi - arXiv preprint arXiv:2405.21070, 2024 - arxiv.org
Severe data imbalance naturally exists in web-scale vision-language datasets. Despite
this, we find that CLIP pre-trained on such data exhibits notable robustness to the data imbalance …

Bias-Aware Low-Rank Adaptation: Mitigating Catastrophic Inheritance of Large Language Models

Y Chang, Y Chang, Y Wu - arXiv preprint arXiv:2408.04556, 2024 - arxiv.org
Large language models (LLMs) have exhibited remarkable proficiency across a diverse
array of natural language processing (NLP) tasks. However, adapting LLMs to downstream …