Review of large vision models and visual prompt engineering

J Wang, Z Liu, L Zhao, Z Wu, C Ma, S Yu, H Dai… - Meta-Radiology, 2023 - Elsevier
Visual prompt engineering is a fundamental methodology in the field of visual and image
artificial general intelligence. As the development of large vision models progresses, the …
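
As a rough illustration of what "visual prompting" covers, the sketch below implements one common variant (VPT-style prompt tuning): learnable prompt tokens are prepended to a frozen ViT's patch sequence, and only the prompts are trained. The `backbone`, `embed_dim`, and `num_prompts` names are assumptions for this example, not an interface from the survey.

```python
import torch
import torch.nn as nn

class VisualPromptedViT(nn.Module):
    """Minimal prompt-tuning sketch: prepend learnable prompt tokens
    to the patch sequence of a frozen ViT encoder."""

    def __init__(self, backbone: nn.Module, embed_dim: int = 768, num_prompts: int = 8):
        super().__init__()
        self.backbone = backbone              # assumed frozen ViT encoder
        for p in self.backbone.parameters():
            p.requires_grad = False
        self.prompts = nn.Parameter(torch.zeros(1, num_prompts, embed_dim))
        nn.init.trunc_normal_(self.prompts, std=0.02)

    def forward(self, patch_tokens: torch.Tensor) -> torch.Tensor:
        # patch_tokens: (B, N, D) embeddings from the frozen patch-embed layer
        B = patch_tokens.size(0)
        x = torch.cat([self.prompts.expand(B, -1, -1), patch_tokens], dim=1)
        return self.backbone(x)               # (B, num_prompts + N, D)
```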

MM-LLMs: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org
In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …
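
A recurring recipe behind these systems is to freeze both the LLM and the vision encoder and train only a small connector that maps image features into the LLM's token-embedding space, so images enter as soft tokens. A minimal sketch with assumed dimensions (`vision_dim`, `llm_dim`), not code from the survey:

```python
import torch
import torch.nn as nn

class VisionToLLMConnector(nn.Module):
    """Sketch of an MM-LLM connector: project frozen vision features
    into the LLM embedding space so they can be fed as soft tokens."""

    def __init__(self, vision_dim: int = 1024, llm_dim: int = 4096):
        super().__init__()
        self.proj = nn.Sequential(
            nn.Linear(vision_dim, llm_dim),
            nn.GELU(),
            nn.Linear(llm_dim, llm_dim),
        )

    def forward(self, vision_feats: torch.Tensor) -> torch.Tensor:
        # vision_feats: (B, N, vision_dim) from a frozen image encoder
        return self.proj(vision_feats)        # (B, N, llm_dim) visual tokens
```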

Vision transformer adapter for dense predictions

Z Chen, Y Duan, W Wang, J He, T Lu, J Dai… - arXiv preprint arXiv …, 2022 - arxiv.org
This work investigates a simple yet powerful adapter for Vision Transformer (ViT). Unlike
recent visual transformers that introduce vision-specific inductive biases into their …
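
For context, a generic residual bottleneck adapter, the building block this family of methods starts from, looks like the sketch below. ViT-Adapter's actual design additionally injects spatial priors via cross-attention for dense prediction, which this minimal version omits.

```python
import torch
import torch.nn as nn

class BottleneckAdapter(nn.Module):
    """Generic residual bottleneck adapter inserted into frozen
    transformer blocks; zero-initialized so it starts as an identity."""

    def __init__(self, dim: int = 768, bottleneck: int = 64):
        super().__init__()
        self.down = nn.Linear(dim, bottleneck)
        self.act = nn.GELU()
        self.up = nn.Linear(bottleneck, dim)
        nn.init.zeros_(self.up.weight)
        nn.init.zeros_(self.up.bias)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return x + self.up(self.act(self.down(x)))
```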

Medical SAM adapter: Adapting Segment Anything Model for medical image segmentation

J Wu, W Ji, Y Liu, H Fu, M Xu, Y Xu, Y Jin - arXiv preprint arXiv:2304.12620, 2023 - arxiv.org
The Segment Anything Model (SAM) has recently gained popularity in the field of image
segmentation due to its impressive capabilities in various segmentation tasks and its prompt …
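
The training setup this line of work relies on is to keep the pretrained SAM weights frozen and update only the inserted adapter modules. A minimal sketch, assuming (our convention, not the paper's) that adapter parameters carry an "adapter" substring in their names:

```python
import torch.nn as nn

def freeze_except_adapters(model: nn.Module) -> None:
    """Freeze every parameter except those belonging to adapter modules,
    identified here by an 'adapter' substring in the parameter name."""
    for name, param in model.named_parameters():
        param.requires_grad = "adapter" in name
```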

Visual prompt multi-modal tracking

J Zhu, S Lai, X Chen, D Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Visible-modal object tracking gives rise to a series of downstream multi-modal tracking
tributaries. To inherit the powerful representations of the foundation model, a natural modus …

Scaling & shifting your features: A new baseline for efficient model tuning

D Lian, D Zhou, J Feng, X Wang - Advances in Neural …, 2022 - proceedings.neurips.cc
Existing fine-tuning methods either tune all parameters of the pre-trained model (full fine-
tuning), which is not efficient, or only tune the last linear layer (linear probing), which suffers …
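
The SSF idea itself is compact: after each operation of the frozen network, apply a learnable per-channel scale and shift, y = γ ⊙ x + β, and train only γ and β. A minimal sketch:

```python
import torch
import torch.nn as nn

class SSF(nn.Module):
    """Scaling & Shifting Features: a learnable per-channel affine
    transform applied after a frozen operation's output; only the
    scale (gamma) and shift (beta) vectors are trained."""

    def __init__(self, dim: int):
        super().__init__()
        self.gamma = nn.Parameter(torch.ones(dim))
        self.beta = nn.Parameter(torch.zeros(dim))

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        # x: (..., dim); broadcast the affine over the channel dimension
        return x * self.gamma + self.beta
```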

Deep class-incremental learning: A survey

DW Zhou, QW Wang, ZH Qi, HJ Ye, DC Zhan… - arXiv preprint arXiv …, 2023 - arxiv.org
Deep models, e.g., CNNs and Vision Transformers, have achieved impressive results in
many vision tasks in the closed world. However, novel classes emerge from time to time in …
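
One concrete ingredient common to the incremental pipelines such surveys cover is growing the classifier head when new classes arrive while keeping old-class weights intact; a minimal sketch (the helper name is ours):

```python
import torch
import torch.nn as nn

def expand_classifier(head: nn.Linear, num_new: int) -> nn.Linear:
    """Grow a linear classifier for newly arriving classes, copying
    the existing class weights so old-class predictions are preserved."""
    new_head = nn.Linear(head.in_features, head.out_features + num_new)
    with torch.no_grad():
        new_head.weight[: head.out_features] = head.weight
        new_head.bias[: head.out_features] = head.bias
    return new_head
```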

SimDA: Simple diffusion adapter for efficient video generation

Z Xing, Q Dai, H Hu, Z Wu… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
The recent wave of AI-generated content has seen the rapid development and success of
Text-to-Image (T2I) technologies. By contrast, Text-to-Video (T2V) still falls short of …

Explicit visual prompting for low-level structure segmentations

W Liu, X Shen, CM Pun, X Cun - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
We consider the generic problem of detecting low-level structures in images, which includes
segmenting the manipulated parts, identifying out-of-focus pixels, separating shadow …
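
Methods in this vein derive the prompt from the image itself, for instance from its high-frequency components. A sketch of extracting such a component by masking low frequencies in the FFT (the `mask_ratio` knob is our assumption):

```python
import torch

def high_frequency_component(img: torch.Tensor, mask_ratio: float = 0.25) -> torch.Tensor:
    """Zero out a centered low-frequency block in the 2-D FFT and invert,
    leaving the high-frequency content often used as an explicit prompt."""
    _, _, H, W = img.shape                    # img: (B, C, H, W)
    freq = torch.fft.fftshift(torch.fft.fft2(img), dim=(-2, -1))
    mh, mw = int(H * mask_ratio), int(W * mask_ratio)
    cy, cx = H // 2, W // 2
    freq[..., cy - mh // 2 : cy + mh // 2, cx - mw // 2 : cx + mw // 2] = 0
    return torch.fft.ifft2(torch.fft.ifftshift(freq, dim=(-2, -1))).real
```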

Cheap and quick: Efficient vision-language instruction tuning for large language models

G Luo, Y Zhou, T Ren, S Chen… - Advances in Neural …, 2024 - proceedings.neurips.cc
Recently, growing interest has arisen in extending the multimodal capability of large
language models (LLMs), e.g., vision-language (VL) learning, which is regarded as the next …