What does a platypus look like? Generating customized prompts for zero-shot image classification

S Pratt, I Covert, R Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …

Improving multimodal datasets with image captioning

T Nguyen, SY Gadre, G Ilharco… - Advances in Neural …, 2024 - proceedings.neurips.cc
Massive web datasets play a key role in the success of large vision-language models like
CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to …

DE-FAKE: Detection and attribution of fake images generated by text-to-image generation models

Z Sha, Z Li, N Yu, Y Zhang - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org
Text-to-image generation models that generate images based on prompt descriptions have
attracted an increasing amount of attention during the past few months. Despite their …

Verbs in action: Improving verb understanding in video-language models

L Momeni, M Caron, A Nagrani… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding verbs is crucial to modelling how people and objects interact with each other
and the environment through space and time. Recently, state-of-the-art video-language …

SuS-X: Training-free name-only transfer of vision-language models

V Udandarao, A Gupta… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …

Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models

Y Qu, X Shen, X He, M Backes, S Zannettou… - Proceedings of the 2023 …, 2023 - dl.acm.org
State-of-the-art Text-to-Image models like Stable Diffusion and DALL·E 2 are
revolutionizing how people generate visual content. At the same time, society has serious …

TRAK: Attributing model behavior at scale

SM Park, K Georgiev, A Ilyas, G Leclerc… - arXiv preprint arXiv …, 2023 - arxiv.org
The goal of data attribution is to trace model predictions back to training data. Despite a long
line of work towards this goal, existing approaches to data attribution tend to force users to …

Dense and aligned captions (DAC) promote compositional reasoning in VL models

S Doveh, A Arbelle, S Harary… - Advances in …, 2023 - proceedings.neurips.cc
Vision and Language (VL) models offer an effective method for aligning representation
spaces of images and text allowing for numerous applications such as cross-modal retrieval …

FLIP: Cross-domain face anti-spoofing with language guidance

K Srivatsan, M Naseer… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Face anti-spoofing (FAS) or presentation attack detection is an essential component of face
recognition systems deployed in security-critical applications. Existing FAS methods have …

BioCLIP: A vision foundation model for the tree of life

S Stevens, J Wu, MJ Thompson… - Proceedings of the …, 2024 - openaccess.thecvf.com
Images of the natural world collected by a variety of cameras from drones to individual
phones are increasingly abundant sources of biological information. There is an explosion …