Recent works have shown that unstructured text (documents) from online sources can serve as useful auxiliary information for zero-shot image classification. However, these methods …
S Wang, L Song, R Shimizu, M Goto - Synthetic Data for Computer Vision … - openreview.net
Zero-shot image classification is a challenging task aiming to classify real images without real training examples. Recent research has employed synthetic training images generated …
O Saha, G Van Horn, S Maji - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The zero-shot performance of existing vision-language models (VLMs) such as CLIP is limited by the availability of large-scale aligned image and text datasets in specific domains …
Natural language supervision in the form of image captions was recently shown to be an effective way of training zero-shot image classification models. In this work, we focus on …
Y Kalantidis, G Tolias - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Vision-Language Models (VLMs) have demonstrated impressive performance on zero-shot classification, i.e., classification when provided merely with a list of class names. In …
The application of zero-shot learning in computer vision has been revolutionized by the use of image-text matching models. The most notable example, CLIP, has been widely used for …
Using natural language as supervision for training visual recognition models holds great promise. Recent works have shown that if such supervision is used in the form of alignment …
The synergy of language and vision models has given rise to Large Language and Vision Assistant models (LLVAs), designed to engage users in rich conversational experiences …
Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing methods still rely on human-annotated attributes, which are difficult to annotate and scale …