GraphAdapter: Tuning vision-language models with dual knowledge graph

X Li, D Lian, Z Lu, J Bai, Z Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning
of vision-language models (VLMs) under the low-data regime, where only a few additional …

Improving zero-shot generalization for CLIP with synthesized prompts

Z Wang, J Liang, R He, N Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com
With the growing interest in pretrained vision-language models like CLIP, recent research
has focused on adapting these models to downstream tasks. Despite achieving promising …

Generalized out-of-distribution detection and beyond in vision language model era: A survey

A Miyai, J Yang, J Zhang, Y Ming, Y Lin, Q Yu… - arXiv preprint arXiv …, 2024 - arxiv.org
Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine
learning systems and has shaped the field of OOD detection. Meanwhile, several other …

When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system

H Zhang, JJ Xu, HW Cui, L Li, Y Yang… - … and Remote Sensing …, 2024 - ieeexplore.ieee.org
Artificial intelligence (AI) has significantly advanced Earth sciences, yet its full potential in
comprehensively modeling Earth's complex dynamics remains unrealized. Geoscience …

A closer look at the few-shot adaptation of large vision-language models

J Silva-Rodriguez, S Hajimiri… - Proceedings of the …, 2024 - openaccess.thecvf.com
Efficient transfer learning (ETL) is receiving increasing attention to adapt large pre-trained
language-vision models on downstream tasks with a few labeled samples. While significant …

Few-shot adaptation of multi-modal foundation models: A survey

F Liu, T Zhang, W Dai, C Zhang, W Cai, X Zhou… - Artificial Intelligence …, 2024 - Springer
Multi-modal (vision-language) models, such as CLIP, are replacing traditional
supervised pre-training models (e.g., ImageNet-based pre-training) as the new generation of …

Uncertainty-aware sign language video retrieval with probability distribution modeling

X Wu, H Li, Y Luo, X Cheng, X Zhuang, M Cao… - European Conference on …, 2025 - Springer
Sign language video retrieval plays a key role in facilitating information access for the deaf
community. Despite significant advances in video-text retrieval, the complexity and inherent …

LICO: explainable models with language-image consistency

Y Lei, Z Li, Y Li, J Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Interpreting the decisions of deep learning models has been actively studied since the
explosion of deep neural networks. One of the most convincing interpretation approaches is …

On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?

M Zanella, I Ben Ayed - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
The development of large vision-language models, notably CLIP, has catalyzed research into
effective adaptation techniques, with a particular focus on soft prompt tuning. Conjointly test …

Low-Rank Few-Shot Adaptation of Vision-Language Models

M Zanella, I Ben Ayed - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further
pushed their generalization capabilities at the expense of just a few labeled samples within …