Plot: Prompt learning with optimal transport for vision-language models

X Li, D Lian, Z Lu, J Bai, Z Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc

Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning
of vision-language models (VLMs) under the low-data regime, where only a few additional …

Cited by 52 Related articles All 5 versions

[PDF] thecvf.com

Improving zero-shot generalization for clip with synthesized prompts

Z Wang, J Liang, R He, N Xu… - Proceedings of the …, 2023 - openaccess.thecvf.com

With the growing interest in pretrained vision-language models like CLIP, recent research
has focused on adapting these models to downstream tasks. Despite achieving promising …

Cited by 43 Related articles All 3 versions

[PDF] arxiv.org

Generalized out-of-distribution detection and beyond in vision language model era: A survey

A Miyai, J Yang, J Zhang, Y Ming, Y Lin, Q Yu… - arXiv preprint arXiv …, 2024 - arxiv.org

Detecting out-of-distribution (OOD) samples is crucial for ensuring the safety of machine
learning systems and has shaped the field of OOD detection. Meanwhile, several other …

Cited by 7 Related articles All 4 versions

[PDF] arxiv.org

When Geoscience Meets Foundation Models: Toward a general geoscience artificial intelligence system

H Zhang, JJ Xu, HW Cui, L Li, Y Yang… - … and Remote Sensing …, 2024 - ieeexplore.ieee.org

Artificial intelligence (AI) has significantly advanced Earth sciences, yet its full potential in
comprehensively modeling Earth's complex dynamics remains unrealized. Geoscience …

Cited by 7 Related articles All 2 versions

[PDF] thecvf.com

A closer look at the few-shot adaptation of large vision-language models

J Silva-Rodriguez, S Hajimiri… - Proceedings of the …, 2024 - openaccess.thecvf.com

Efficient transfer learning (ETL) is receiving increasing attention to adapt large pre-trained
language-vision models on downstream tasks with a few labeled samples. While significant …

Cited by 17 Related articles All 4 versions

[PDF] springer.com

Few-shot adaptation of multi-modal foundation models: A survey

F Liu, T Zhang, W Dai, C Zhang, W Cai, X Zhou… - Artificial Intelligence …, 2024 - Springer

Multi-modal (vision-language) models, such as CLIP, are replacing traditional
supervised pre-training models (e.g., ImageNet-based pre-training) as the new generation of …

Cited by 16 Related articles All 4 versions

[PDF] arxiv.org

Uncertainty-aware sign language video retrieval with probability distribution modeling

X Wu, H Li, Y Luo, X Cheng, X Zhuang, M Cao… - European Conference on …, 2025 - Springer

Sign language video retrieval plays a key role in facilitating information access for the deaf
community. Despite significant advances in video-text retrieval, the complexity and inherent …

Cited by 8 Related articles All 2 versions

[PDF] neurips.cc

LICO: explainable models with language-image consistency

Y Lei, Z Li, Y Li, J Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc

Interpreting the decisions of deep learning models has been actively studied since the
explosion of deep neural networks. One of the most convincing interpretation approaches is …

Cited by 7 Related articles All 5 versions

[PDF] thecvf.com

On the test-time zero-shot generalization of vision-language models: Do we really need prompt learning?

M Zanella, I Ben Ayed - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

The development of large vision-language models, notably CLIP, has catalyzed research into
effective adaptation techniques with a particular focus on soft prompt tuning. Conjointly test …

Cited by 13 Related articles All 3 versions

[PDF] thecvf.com

Low-Rank Few-Shot Adaptation of Vision-Language Models

M Zanella, I Ben Ayed - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Recent progress in the few-shot adaptation of Vision-Language Models (VLMs) has further
pushed their generalization capabilities at the expense of just a few labeled samples within …

Cited by 10 Related articles All 3 versions
