Adaptformer: Adapting vision transformers for scalable visual recognition

Y Xin, J Du, Q Wang, Z Lin, K Yan - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

Large-scale pre-trained models have achieved remarkable success in various computer
vision tasks. A standard approach to leverage these models is to fine-tune all model …

Cited by 19 - Related articles - All 3 versions

Parameter-efficient fine-tuning for large models: A comprehensive survey

Z Han, C Gao, J Liu, SQ Zhang - arXiv preprint arXiv:2403.14608, 2024 - arxiv.org

Large models represent a groundbreaking advancement in multiple application fields,
enabling remarkable achievements across various tasks. However, their unprecedented …

Cited by 33 - Related articles - All 2 versions

Stronger, Fewer, & Superior: Harnessing Vision Foundation Models for Domain Generalized Semantic Segmentation

Z Wei, L Chen, Y Jin, X Ma, T Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this paper we first assess and harness various Vision Foundation Models (VFMs) in the
context of Domain Generalized Semantic Segmentation (DGSS). Driven by the motivation …

Cited by 6 - Related articles - All 3 versions

Sensitivity-aware visual parameter-efficient fine-tuning

H He, J Cai, J Zhang, D Tao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Visual Parameter-Efficient Fine-Tuning (PEFT) has become a powerful alternative
for full fine-tuning so as to adapt pre-trained vision models to downstream tasks, which only …

Cited by 27 - Related articles - All 7 versions

Fame-vil: Multi-tasking vision-language model for heterogeneous fashion tasks

X Han, X Zhu, L Yu, L Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com

In the fashion domain, there exists a variety of vision-and-language (V+L) tasks, including
cross-modal retrieval, text-guided image retrieval, multi-modal classification, and image …

Cited by 18 - Related articles - All 8 versions

Towards efficient visual adaption via structural re-parameterization

G Luo, M Huang, Y Zhou, X Sun, G Jiang… - arXiv preprint arXiv …, 2023 - arxiv.org

Parameter-efficient transfer learning (PETL) is an emerging research spot aimed at
inexpensively adapting large-scale pre-trained models to downstream tasks. Recent …

Cited by 37 - Related articles - All 2 versions

Dual-path adaptation from image to video transformers

J Park, J Lee, K Sohn - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

In this paper, we efficiently transfer the surpassing representation power of the vision
foundation models, such as ViT and Swin, for video understanding with only a few trainable …

Cited by 17 - Related articles - All 6 versions

Vl-pet: Vision-and-language parameter-efficient tuning via granularity control

ZY Hu, Y Li, MR Lyu, L Wang - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

As the model size of pre-trained language models (PLMs) grows rapidly, full fine-tuning
becomes prohibitively expensive for model training and storage. In vision-and-language …

Cited by 9 - Related articles - All 5 versions

Parameter-efficient fine-tuning for pre-trained vision models: A survey

Y Xin, S Luo, H Zhou, J Du, X Liu, Y Fan, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Large-scale pre-trained vision models (PVMs) have shown great potential for adaptability
across various downstream vision tasks. However, with state-of-the-art PVMs growing to …

Cited by 21 - Related articles - All 2 versions

Forgery-aware adaptive transformer for generalizable synthetic image detection

H Liu, Z Tan, C Tan, Y Wei, J Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

In this paper, we study the problem of generalizable synthetic image detection, aiming to
detect forgery images from diverse generative methods, e.g., GANs and diffusion models …

Cited by 4 - Related articles - All 3 versions
