GraphAdapter: Tuning vision-language models with dual knowledge graph

X Li, D Lian, Z Lu, J Bai, Z Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Adapter-style efficient transfer learning (ETL) has shown excellent performance in the tuning
of vision-language models (VLMs) under the low-data regime, where only a few additional …

Dual memory networks: A versatile adaptation approach for vision-language models

Y Zhang, W Zhu, H Tang, Z Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the emergence of pre-trained vision-language models like CLIP, how to adapt them to
various downstream classification tasks has garnered significant attention in recent …

MoPE-CLIP: Structured pruning for efficient vision-language models with module-wise pruning error metric

H Lin, H Bai, Z Liu, L Hou, M Sun… - Proceedings of the …, 2024 - openaccess.thecvf.com
Vision-language pre-trained models have achieved impressive performance on various
downstream tasks. However, their large model sizes hinder their utilization on platforms with …

Gradient-based Parameter Selection for Efficient Fine-Tuning

Z Zhang, Q Zhang, Z Gao, R Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
With the growing size of pre-trained models, full fine-tuning and storing all the parameters for
various downstream tasks is costly and infeasible. In this paper we propose a new …

Ta-Adapter: Enhancing few-shot CLIP with task-aware encoders

W Zhang, Y Zhang, Y Deng, W Zhang, J Lin, B Huang… - Pattern Recognition, 2024 - Elsevier
Abstract Contrastive Language-Image Pre-training (CLIP) has shown impressive zero-shot
transfer capabilities, but its potential for specific downstream tasks is not fully utilized. To …

Referred by multi-modality: A unified temporal transformer for video object segmentation

S Yan, R Zhang, Z Guo, W Chen, W Zhang… - Proceedings of the …, 2024 - ojs.aaai.org
Recently, video object segmentation (VOS) referred by multi-modal signals, e.g., language
and audio, has evoked increasing attention in both industry and academia. It is challenging …

Label Propagation for Zero-shot Classification with Vision-Language Models

Y Kalantidis, G Tolias - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Abstract Vision-Language Models (VLMs) have demonstrated impressive performance on
zero-shot classification, i.e., classification when provided merely with a list of class names. In …

Meta-Adapter: An Online Few-shot Learner for Vision-Language Model

L Song, R Xue, H Wang, H Sun… - Advances in Neural …, 2024 - proceedings.neurips.cc
The contrastive vision-language pre-training, known as CLIP, demonstrates remarkable
potential in perceiving open-world visual concepts, enabling effective zero-shot image …

DeIL: Direct-and-Inverse CLIP for Open-World Few-Shot Learning

S Shao, Y Bai, Y Wang, B Liu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Abstract Open-World Few-Shot Learning (OFSL) is a critical field of research concentrating
on the precise identification of target samples in environments with scarce data and …

FD-Align: Feature Discrimination Alignment for Fine-tuning Pre-Trained Models in Few-Shot Learning

K Song, H Ma, B Zou, H Zhang… - Advances in Neural …, 2024 - proceedings.neurips.cc
Due to the limited availability of data, existing few-shot learning methods trained from
scratch fail to achieve satisfactory performance. In contrast, large-scale pre-trained models …