Visual tuning

BXB Yu, J Chang, H Wang, L Liu, S Wang… - ACM Computing …, 2024 - dl.acm.org
Fine-tuning visual models has been widely shown to yield promising performance on many
downstream visual tasks. With the surprising development of pre-trained visual foundation …

A systematic survey of prompt engineering on vision-language foundation models

J Gu, Z Han, S Chen, A Beirami, B He, G Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Prompt engineering is a technique that involves augmenting a large pre-trained model with
task-specific hints, known as prompts, to adapt the model to new tasks. Prompts can be …
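
The snippet frames a prompt as a task-specific hint attached to a frozen pre-trained model. A minimal sketch of that idea for zero-shot CLIP classification follows, assuming the Hugging Face transformers CLIP checkpoint; the label set, prompt template, and image path are illustrative, not from the survey.

```python
# Minimal sketch: hand-crafted text prompts for zero-shot CLIP classification.
# Assumptions: Hugging Face `transformers` is installed; the label set and the
# image path "example.jpg" are illustrative placeholders.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

classes = ["cat", "dog", "car"]                   # illustrative label set
prompts = [f"a photo of a {c}" for c in classes]  # the task-specific hint

image = Image.open("example.jpg")                 # placeholder input image
inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
with torch.no_grad():
    logits = model(**inputs).logits_per_image     # image-to-text similarity
print(classes[logits.softmax(dim=-1).argmax().item()])
```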

Waffling around for performance: Visual classification with random words and broad concepts

K Roth, JM Kim, A Koepke, O Vinyals… - Proceedings of the …, 2023 - openaccess.thecvf.com
The visual classification performance of vision-language models such as CLIP has been
shown to benefit from additional semantic knowledge from large language models (LLMs) …
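
The title's "random words" idea can be sketched as prompt ensembling over random character sequences in place of curated LLM descriptors. The checkpoint, suffix scheme, and prompt count below are assumptions, not the paper's exact recipe.

```python
# Hedged sketch: ensemble CLIP text embeddings over prompts padded with random
# character strings rather than LLM-generated descriptions. The suffix length
# and number of prompts per class are illustrative choices.
import random
import string
import torch
from transformers import CLIPModel, CLIPTokenizer

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
tokenizer = CLIPTokenizer.from_pretrained("openai/clip-vit-base-patch32")

def random_word(n=5):
    return "".join(random.choices(string.ascii_lowercase, k=n))

def class_embedding(name, n_prompts=8):
    # Average normalized text embeddings over randomly padded prompts.
    texts = [f"a photo of a {name}, {random_word()} {random_word()}."
             for _ in range(n_prompts)]
    tok = tokenizer(texts, padding=True, return_tensors="pt")
    with torch.no_grad():
        emb = model.get_text_features(**tok)
    emb = emb / emb.norm(dim=-1, keepdim=True)
    return emb.mean(dim=0)                        # one ensemble vector per class

weights = torch.stack([class_embedding(c) for c in ["cat", "dog"]])
```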

Understanding and improving visual prompting: A label-mapping perspective

A Chen, Y Yao, PY Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
We revisit and advance visual prompting (VP), an input prompting technique for vision tasks.
VP can reprogram a fixed, pre-trained source model to accomplish downstream tasks in the …
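
Per the abstract, VP perturbs the input of a frozen source model and maps source labels onto target labels. A minimal sketch under assumed choices (a frozen torchvision ResNet-18 as the source model, a 16-pixel learnable border, and a toy one-to-one label map) follows.

```python
# Minimal sketch of visual prompting (VP) with a fixed label mapping.
# Assumptions: a frozen torchvision ResNet-18 source model, a 16-pixel
# learnable border, and an illustrative target-to-source label map.
import torch
import torch.nn as nn
import torchvision.models as models

source = models.resnet18(weights="IMAGENET1K_V1").eval()
for p in source.parameters():
    p.requires_grad_(False)                      # the source model stays fixed

pad = 16
prompt = nn.Parameter(torch.zeros(1, 3, 224, 224))   # learnable pixel frame
mask = torch.ones(1, 3, 224, 224)
mask[:, :, pad:-pad, pad:-pad] = 0                   # train only the border

label_map = {0: 951, 1: 285}     # target class -> source class (illustrative)

def forward(x_small):
    # x_small: target images resized to fit inside the prompt frame
    x = torch.zeros(x_small.size(0), 3, 224, 224)
    x[:, :, pad:-pad, pad:-pad] = x_small
    logits = source(x + prompt * mask)
    idx = torch.tensor(list(label_map.values()))
    return logits[:, idx]        # read off only the mapped source logits

out = forward(torch.randn(4, 3, 192, 192))   # backprop would update `prompt` only
```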

Universal prompt tuning for graph neural networks

T Fang, Y Zhang, Y Yang, C Wang… - Advances in Neural …, 2024 - proceedings.neurips.cc
In recent years, prompt tuning has sparked a research surge in adapting pre-trained models.
Unlike the unified pre-training strategy employed in the language field, the graph field …
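
A minimal sketch of feature-space graph prompting in the spirit of the abstract: a single shared prompt vector is added to every node feature and is the only trainable parameter against a frozen encoder. The one-layer stand-in GNN and toy graph below are assumptions, not the paper's architecture.

```python
# Minimal sketch of feature-space graph prompting: one shared learnable vector
# is added to every node feature, and it is the only trainable parameter.
# FrozenGNN and the toy graph are stand-ins, not the paper's setup.
import torch
import torch.nn as nn

class FrozenGNN(nn.Module):
    # Placeholder for a pre-trained GNN encoder with frozen weights.
    def __init__(self, dim, n_classes):
        super().__init__()
        self.lin = nn.Linear(dim, n_classes)
    def forward(self, x, adj):
        return self.lin(adj @ x)                 # one propagation step

dim, n_classes, n_nodes = 64, 5, 100
gnn = FrozenGNN(dim, n_classes)
for p in gnn.parameters():
    p.requires_grad_(False)

prompt = nn.Parameter(torch.zeros(1, dim))       # the only trainable tensor
opt = torch.optim.Adam([prompt], lr=1e-2)

x = torch.randn(n_nodes, dim)                    # toy node features
adj = torch.eye(n_nodes)                         # toy adjacency (self-loops)
y = torch.randint(0, n_classes, (n_nodes,))

opt.zero_grad()
loss = nn.functional.cross_entropy(gnn(x + prompt, adj), y)
loss.backward()
opt.step()
```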

Few-shot adaptation of multi-modal foundation models: A survey

F Liu, T Zhang, W Dai, C Zhang, W Cai, X Zhou… - Artificial Intelligence …, 2024 - Springer
Multi-modal (vision-language) models, such as CLIP, are replacing traditional
supervised pre-training models (e.g., ImageNet-based pre-training) as the new generation of …

Parameter-efficient fine-tuning for pre-trained vision models: A survey

Y Xin, S Luo, H Zhou, J Du, X Liu, Y Fan, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability
across various downstream vision tasks. However, with state-of-the-art PVMs growing to …
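
As one concrete instance of the parameter-efficient tuning the survey covers, the sketch below adds a trainable low-rank update to a frozen linear layer (a LoRA-style adapter); the rank, scaling, and layer size are illustrative choices, not the survey's recommendation.

```python
# Hedged sketch of a LoRA-style parameter-efficient adapter: the pre-trained
# weights are frozen and only a low-rank update B @ A is trained.
import torch
import torch.nn as nn

class LoRALinear(nn.Module):
    def __init__(self, base: nn.Linear, rank: int = 4, alpha: float = 8.0):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad_(False)          # pre-trained weights stay frozen
        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
        self.scale = alpha / rank

    def forward(self, x):
        # frozen path plus the trainable low-rank update
        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)

layer = LoRALinear(nn.Linear(768, 768))
out = layer(torch.randn(2, 768))             # only A and B receive gradients
```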

Not all prompts are secure: A switchable backdoor attack against pre-trained vision transformers

S Yang, J Bai, K Gao, Y Yang, Y Li… - Proceedings of the …, 2024 - openaccess.thecvf.com
Given the power of vision transformers, a new learning paradigm, pre-training and then
prompting, makes it more efficient and effective to address downstream visual recognition …

Prompt guided transformer for multi-task dense prediction

Y Lu, S Sirejiding, Y Ding, C Wang… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Task-conditional architecture offers an advantage in parameter efficiency but falls short in
performance compared to state-of-the-art multi-decoder methods. How to trade off …
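
The task-conditional idea the abstract describes can be sketched as per-task learnable prompt tokens prepended to shared image tokens, so one backbone serves several dense-prediction tasks. The encoder depth, dimensions, and toy per-token heads below are assumptions, not the paper's design.

```python
# Minimal sketch of task-conditional prompting: per-task prompt tokens are
# prepended to shared image tokens so one encoder serves several dense tasks.
import torch
import torch.nn as nn

dim, n_tokens, n_tasks, prompt_len = 256, 196, 3, 8
encoder = nn.TransformerEncoder(
    nn.TransformerEncoderLayer(d_model=dim, nhead=8, batch_first=True),
    num_layers=2,
)
task_prompts = nn.Parameter(torch.randn(n_tasks, prompt_len, dim) * 0.02)
heads = nn.ModuleList(nn.Linear(dim, 1) for _ in range(n_tasks))  # toy heads

def predict(patches, task_id):
    # patches: (B, n_tokens, dim) image tokens from a shared backbone
    p = task_prompts[task_id].unsqueeze(0).expand(patches.size(0), -1, -1)
    h = encoder(torch.cat([p, patches], dim=1))  # prompts steer the encoder
    return heads[task_id](h[:, prompt_len:])     # dense per-token outputs

out = predict(torch.randn(2, n_tokens, dim), task_id=0)
```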

Exploring the Transferability of Visual Prompting for Multimodal Large Language Models

Y Zhang, Y Dong, S Zhang, T Min… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although Multimodal Large Language Models (MLLMs) have demonstrated
promising versatile capabilities, their performance is still inferior to specialized models on …
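
The transfer setting the title points at can be sketched as learning one additive visual prompt against a source model and reusing it, unchanged, on a different frozen model. Both stand-in classifiers below are toys, not MLLMs.

```python
# Hedged sketch of visual-prompt transfer: tune an additive input prompt on
# frozen model A, then apply the same prompt to frozen model B.
import torch
import torch.nn as nn

def frozen_model(seed):
    torch.manual_seed(seed)
    m = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 10))
    for p in m.parameters():
        p.requires_grad_(False)
    return m

model_a, model_b = frozen_model(0), frozen_model(1)
prompt = nn.Parameter(torch.zeros(1, 3, 32, 32))     # shared visual prompt
opt = torch.optim.Adam([prompt], lr=1e-2)

x = torch.randn(8, 3, 32, 32)                        # toy images and labels
y = torch.randint(0, 10, (8,))
for _ in range(10):                                  # tune prompt on model A
    opt.zero_grad()
    loss = nn.functional.cross_entropy(model_a(x + prompt), y)
    loss.backward()
    opt.step()

logits_b = model_b(x + prompt.detach())              # transfer, no retraining
```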