- Academic resource search

What does a platypus look like? Generating customized prompts for zero-shot image classification

S Pratt, I Covert, R Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Open-vocabulary models are a promising new paradigm for image classification. Unlike
traditional classification models, open-vocabulary models classify among any arbitrary set of …

Cited by 225 · Related articles · All 7 versions

Improving multimodal datasets with image captioning

T Nguyen, SY Gadre, G Ilharco… - Advances in Neural …, 2024 - proceedings.neurips.cc

Massive web datasets play a key role in the success of large vision-language models like
CLIP and Flamingo. However, the raw web data is noisy, and existing filtering methods to …

Cited by 61 · Related articles · All 6 versions

DE-FAKE: Detection and attribution of fake images generated by text-to-image generation models

Z Sha, Z Li, N Yu, Y Zhang - Proceedings of the 2023 ACM SIGSAC …, 2023 - dl.acm.org

Text-to-image generation models that generate images based on prompt descriptions have
attracted an increasing amount of attention during the past few months. Despite their …

Cited by 145 · Related articles · All 6 versions

Verbs in action: Improving verb understanding in video-language models

L Momeni, M Caron, A Nagrani… - Proceedings of the …, 2023 - openaccess.thecvf.com

Understanding verbs is crucial to modelling how people and objects interact with each other
and the environment through space and time. Recently, state-of-the-art video-language …

Cited by 65 · Related articles · All 6 versions

SuS-X: Training-free name-only transfer of vision-language models

V Udandarao, A Gupta… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …

Cited by 93 · Related articles · All 5 versions

Unsafe diffusion: On the generation of unsafe images and hateful memes from text-to-image models

Y Qu, X Shen, X He, M Backes, S Zannettou… - Proceedings of the 2023 …, 2023 - dl.acm.org

State-of-the-art Text-to-Image models like Stable Diffusion and DALL·E 2 are
revolutionizing how people generate visual content. At the same time, society has serious …

Cited by 90 · Related articles · All 8 versions

TRAK: Attributing model behavior at scale

SM Park, K Georgiev, A Ilyas, G Leclerc… - arXiv preprint arXiv …, 2023 - arxiv.org

The goal of data attribution is to trace model predictions back to training data. Despite a long
line of work towards this goal, existing approaches to data attribution tend to force users to …

Cited by 117 · Related articles · All 5 versions

Dense and aligned captions (DAC) promote compositional reasoning in VL models

S Doveh, A Arbelle, S Harary… - Advances in …, 2023 - proceedings.neurips.cc

Vision and Language (VL) models offer an effective method for aligning representation
spaces of images and text allowing for numerous applications such as cross-modal retrieval …

Cited by 40 · Related articles · All 7 versions

FLIP: Cross-domain face anti-spoofing with language guidance

K Srivatsan, M Naseer… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

Face anti-spoofing (FAS) or presentation attack detection is an essential component of face
recognition systems deployed in security-critical applications. Existing FAS methods have …

Cited by 39 · Related articles · All 7 versions

BioCLIP: A vision foundation model for the tree of life

S Stevens, J Wu, MJ Thompson… - Proceedings of the …, 2024 - openaccess.thecvf.com

Images of the natural world collected by a variety of cameras from drones to individual
phones are increasingly abundant sources of biological information. There is an explosion …

Cited by 46 · Related articles · All 4 versions
