VMT-Adapter: Parameter-efficient transfer learning for multi-task dense scene understanding

Y Xin, J Du, Q Wang, Z Lin, K Yan - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
Large-scale pre-trained models have achieved remarkable success in various computer
vision tasks. A standard approach to leverage these models is to fine-tune all model …

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org
Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Z Wei, L Chen, Y Jin, X Ma, T Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we first assess and harness various Vision Foundation Models (VFMs) in the
context of Domain Generalized Semantic Segmentation (DGSS). Driven by the motivation …

Sensitivity-aware visual parameter-efficient fine-tuning

H He, J Cai, J Zhang, D Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative
for full fine-tuning so as to adapt pre-trained vision models to downstream tasks, which only …

FAME-ViL: Multi-tasking vision-language model for heterogeneous fashion tasks

X Han, X Zhu, L Yu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …

Towards efficient visual adaption via structural re-parameterization

G Luo, M Huang, Y Zhou, X Sun, G Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org
Parameter-efficient transfer learning (PETL) is an emerging research spot aimed at
inexpensively adapting large-scale pre-trained models to downstream tasks. Recent …

Dual-path adaptation from image to video transformers

J Park, J Lee, K Sohn - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
In this paper, we efficiently transfer the surpassing representation power of the vision
foundation models, such as ViT and Swin, for video understanding with only a few trainable …

VL-PET: Vision-and-language parameter-efficient tuning via granularity control

ZY Hu, Y Li, MR Lyu, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning
becomes prohibitively expensive for model training and storage. In vision-and-language …

Parameter-efficient fine-tuning for pre-trained vision models: A survey

Y Xin, S Luo, H Zhou, J Du, X Liu, Y Fan, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability
across various downstream vision tasks. However, with state-of-the-art PVMs growing to …

Forgery-aware adaptive transformer for generalizable synthetic image detection

H Liu, Z Tan, C Tan, Y Wei, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
In this paper, we study the problem of generalizable synthetic image detection, aiming to
detect forgery images from diverse generative methods, e.g., GANs and diffusion models …