Unified 3D segmenter as prototypical classifiers

Z Qin, C Han, Q Wang, X Nie, Y Yin… - Advances in Neural …, 2023 - proceedings.neurips.cc
The task of point cloud segmentation, comprising semantic, instance, and panoptic
segmentation, has been mainly tackled by designing task-specific network architectures …

PromptKD: Unsupervised prompt distillation for vision-language models

Z Li, X Li, X Fu, X Zhang, W Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Prompt learning has emerged as a valuable technique in enhancing vision-language
models (VLMs) such as CLIP for downstream tasks in specific domains. Existing work mainly …

APrompt: Attention prompt tuning for efficient adaptation of pre-trained language models

Q Wang, Y Mao, J Wang, H Yu, S Nie… - Proceedings of the …, 2023 - aclanthology.org
With the continuous growth of large language models, the process of fine-tuning these
models for new tasks has become increasingly parameter-intensive. Prompt tuning, a …

Image translation as diffusion visual programmers

C Han, JC Liang, Q Wang, M Rabbani, S Dianat… - arXiv preprint arXiv …, 2024 - arxiv.org
We introduce the novel Diffusion Visual Programmer (DVP), a neuro-symbolic image
translation framework. Our proposed DVP seamlessly embeds a condition-flexible diffusion …

OV-VG: A benchmark for open-vocabulary visual grounding

C Wang, W Feng, X Li, G Cheng, S Lyu, B Liu, L Chen… - Neurocomputing, 2024 - Elsevier
Open-vocabulary learning has emerged as a cutting-edge research area, particularly in light
of the widespread adoption of vision-based foundational models. Its primary objective is to …

Facing the Elephant in the Room: Visual Prompt Tuning or Full Finetuning?

C Han, Q Wang, Y Cui, W Wang, L Huang, S Qi… - arXiv preprint arXiv …, 2024 - arxiv.org
As the scale of vision models continues to grow, the emergence of Visual Prompt Tuning
(VPT) as a parameter-efficient transfer learning technique has gained attention due to its …

The improved YOLOv8 algorithm based on EMSPConv and SPE-head modules

G Wen, M Li, Y Luo, C Shi, Y Tan - Multimedia Tools and Applications, 2024 - Springer
Addressing the challenges of high model complexity, low generalization capability, and
suboptimal detection performance in most algorithms for crop leaf disease detection, the …

Efficient multimodal semantic segmentation via dual-prompt learning

S Dong, Y Feng, Q Yang, Y Huang, D Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Multimodal (e.g., RGB-Depth/RGB-Thermal) fusion has shown great potential for improving
semantic segmentation in complex scenes (e.g., indoor/low-light conditions). Existing …

ViLa-MIL: Dual-scale Vision-Language Multiple Instance Learning for Whole Slide Image Classification

J Shi, C Li, T Gong, Y Zheng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Multiple instance learning (MIL)-based framework has become the mainstream for
processing the whole slide image (WSI) with giga-pixel size and hierarchical image context …

Unsupervised Domain Adaption Harnessing Vision-Language Pre-training

W Zhou, Z Zhou - IEEE Transactions on Circuits and Systems …, 2024 - ieeexplore.ieee.org
This paper addresses two vital challenges in Unsupervised Domain Adaptation (UDA) with a
focus on harnessing the power of Vision-Language Pre-training (VLP) models. Firstly, UDA …