Enhancing CLIP with GPT-4: Harnessing visual descriptions as prompts

O Saha, G Van Horn, S Maji - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

The zero-shot performance of existing vision-language models (VLMs) such as CLIP is
limited by the availability of large-scale aligned image and text datasets in specific domains …

Cited by 7 · Related articles · All 3 versions

Filo: Zero-shot anomaly detection by fine-grained description and high-quality localization

Z Gu, B Zhu, G Zhu, Y Chen, H Li, M Tang… - Proceedings of the 32nd …, 2024 - dl.acm.org

Zero-shot anomaly detection (ZSAD) methods detect anomalies without prior access to
known normal or abnormal samples within target categories. Existing methods typically rely …

Cited by 4 · Related articles · All 2 versions

Multi-modal attribute prompting for vision-language models

X Liu, J Wu, W Yang, X Zhou… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

Pre-trained Vision-Language Models (VLMs), like CLIP, exhibit strong generalization ability
to downstream tasks but struggle in few-shot scenarios. Existing prompting techniques …

Cited by 3 · Related articles · All 2 versions

Multimodal foundation models for zero-shot animal species recognition in camera trap images

Z Fabian, Z Miao, C Li, Y Zhang, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

Due to deteriorating environmental conditions and increasing human activity, conservation
efforts directed towards wildlife are crucial. Motion-activated camera traps constitute an …

Cited by 8 · Related articles · All 2 versions

Zero-shot ECG classification with multimodal learning and test-time clinical knowledge enhancement

C Liu, Z Wan, C Ouyang, A Shah, W Bai… - arXiv preprint arXiv …, 2024 - arxiv.org

Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac
arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) …

Cited by 16 · Related articles · All 3 versions

Prompting language-informed distribution for compositional zero-shot learning

W Bao, L Chen, H Huang, Y Kong - arXiv preprint arXiv:2305.14428, 2023 - arxiv.org

Compositional zero-shot learning (CZSL) task aims to recognize unseen compositional
visual concepts, e.g., sliced tomatoes, where the model is learned only from the seen …

Cited by 11 · Related articles · All 3 versions

Zero-Shot Robustification of Zero-Shot Models

D Adila, C Shin, L Cai, F Sala - The Twelfth International …, 2024 - openreview.net

Zero-shot inference is a powerful paradigm that enables the use of large pretrained models
for downstream classification tasks without further training. However, these models are …

Cited by 4 · Related articles

MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code

K Ning, J Chen, Q Zhong, T Zhang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

With the advent of large language models (LLMs), numerous software service providers
(SSPs) are dedicated to developing LLMs customized for code generation tasks, such as …

Cited by 2 · Related articles · All 2 versions

Scene Graph Generation with Role-Playing Large Language Models

G Chen, J Li, W Wang - arXiv preprint arXiv:2410.15364, 2024 - arxiv.org

Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-language
models such as CLIP and follow a standard zero-shot pipeline--computing …

Cited by 1 · Related articles · All 3 versions

Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation

S Aleem, F Wang, M Maniparambil… - Proceedings of the …, 2024 - openaccess.thecvf.com

The Segment Anything Model (SAM) and CLIP are remarkable vision foundation
models (VFMs). SAM, a prompt-driven segmentation model, excels in segmentation tasks …

Cited by 2 · Related articles · All 4 versions
