Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

O Saha, G Van Horn, S Maji - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The zero-shot performance of existing vision-language models (VLMs) such as CLIP is
limited by the availability of large-scale aligned image and text datasets in specific domains …
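
As context for this and several of the entries below, here is a minimal sketch of the generic CLIP-style zero-shot classification pipeline these papers build on: each class is represented by one or more natural-language descriptions, their text embeddings are averaged, and the image embedding is scored against each class by cosine similarity. It uses the Hugging Face transformers CLIP API; the class names, descriptions, and image path are illustrative assumptions, not details taken from the cited paper, whose actual method may differ.

# Minimal sketch (illustrative, not the cited paper's method) of
# description-augmented zero-shot classification with CLIP.
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

# Hypothetical classes, each paired with a few natural-language descriptions.
class_descriptions = {
    "song sparrow": [
        "a photo of a song sparrow",
        "a small brown bird with dark streaks on its chest",
    ],
    "house finch": [
        "a photo of a house finch",
        "a small bird with a red head and breast and a streaked belly",
    ],
}

image = Image.open("bird.jpg")  # placeholder image path

with torch.no_grad():
    # Encode and average the descriptions for each class.
    class_embeds = []
    for descs in class_descriptions.values():
        text_inputs = processor(text=descs, return_tensors="pt", padding=True)
        embeds = model.get_text_features(**text_inputs)
        embeds = embeds / embeds.norm(dim=-1, keepdim=True)
        class_embeds.append(embeds.mean(dim=0))
    class_embeds = torch.stack(class_embeds)
    class_embeds = class_embeds / class_embeds.norm(dim=-1, keepdim=True)

    # Encode the image and score it against each class by cosine similarity.
    image_inputs = processor(images=image, return_tensors="pt")
    image_embed = model.get_image_features(**image_inputs)
    image_embed = image_embed / image_embed.norm(dim=-1, keepdim=True)

    scores = image_embed @ class_embeds.T
    pred = list(class_descriptions)[scores.argmax().item()]
    print(pred, scores.softmax(dim=-1))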

FiLo: Zero-shot anomaly detection by fine-grained description and high-quality localization

Z Gu, B Zhu, G Zhu, Y Chen, H Li, M Tang… - Proceedings of the 32nd …, 2024 - dl.acm.org
Zero-shot anomaly detection (ZSAD) methods detect anomalies without prior access to
known normal or abnormal samples within target categories. Existing methods typically rely …
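
For reference on the ZSAD setting (a generic CLIP-based baseline, not FiLo's specific mechanism), a common approach scores an image against paired "normal" and "anomalous" text prompts and uses the softmax over the two similarities as an anomaly score. The object name, prompts, and image path below are assumptions for illustration.

# Generic CLIP-based zero-shot anomaly scoring sketch (illustrative only).
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

object_name = "metal nut"  # hypothetical target category
prompts = [
    f"a photo of a flawless {object_name}",  # "normal" prompt
    f"a photo of a damaged {object_name}",   # "anomalous" prompt
]

image = Image.open("sample.png")  # placeholder image path

with torch.no_grad():
    inputs = processor(text=prompts, images=image, return_tensors="pt", padding=True)
    outputs = model(**inputs)
    # logits_per_image holds the image's similarity to each prompt, scaled by
    # CLIP's learned temperature; softmax gives a normal-vs-anomalous probability.
    probs = outputs.logits_per_image.softmax(dim=-1)
    anomaly_score = probs[0, 1].item()
    print(f"anomaly score: {anomaly_score:.3f}")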

Multi-modal attribute prompting for vision-language models

X Liu, J Wu, W Yang, X Zhou… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Pre-trained Vision-Language Models (VLMs), like CLIP, exhibit strong generalization ability
to downstream tasks but struggle in few-shot scenarios. Existing prompting techniques …

Multimodal foundation models for zero-shot animal species recognition in camera trap images

Z Fabian, Z Miao, C Li, Y Zhang, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Due to deteriorating environmental conditions and increasing human activity, conservation
efforts directed towards wildlife are crucial. Motion-activated camera traps constitute an …

Zero-shot ecg classification with multimodal learning and test-time clinical knowledge enhancement

C Liu, Z Wan, C Ouyang, A Shah, W Bai… - arXiv preprint arXiv …, 2024 - arxiv.org
Electrocardiograms (ECGs) are non-invasive diagnostic tools crucial for detecting cardiac
arrhythmic diseases in clinical practice. While ECG Self-supervised Learning (eSSL) …

Prompting language-informed distribution for compositional zero-shot learning

W Bao, L Chen, H Huang, Y Kong - arXiv preprint arXiv:2305.14428, 2023 - arxiv.org
The compositional zero-shot learning (CZSL) task aims to recognize unseen compositional
visual concepts, e.g., sliced tomatoes, where the model is trained only on the seen …

Zero-Shot Robustification of Zero-Shot Models

D Adila, C Shin, L Cai, F Sala - The Twelfth International …, 2024 - openreview.net
Zero-shot inference is a powerful paradigm that enables the use of large pretrained models
for downstream classification tasks without further training. However, these models are …

MCGMark: An Encodable and Robust Online Watermark for LLM-Generated Malicious Code

K Ning, J Chen, Q Zhong, T Zhang, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
With the advent of large language models (LLMs), numerous software service providers
(SSPs) are dedicated to developing LLMs customized for code generation tasks, such as …

Scene Graph Generation with Role-Playing Large Language Models

G Chen, J Li, W Wang - arXiv preprint arXiv:2410.15364, 2024 - arxiv.org
Current approaches for open-vocabulary scene graph generation (OVSGG) use vision-
language models such as CLIP and follow a standard zero-shot pipeline--computing …

Test-Time Adaptation with SaLIP: A Cascade of SAM and CLIP for Zero-shot Medical Image Segmentation

S Aleem, F Wang, M Maniparambil… - Proceedings of the …, 2024 - openaccess.thecvf.com
The Segment Anything Model (SAM) and CLIP are remarkable vision foundation
models (VFMs). SAM, a prompt-driven segmentation model, excels in segmentation tasks …