What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

A Abdelhamed, M Afifi, A Go - arXiv preprint arXiv:2405.15668, 2024 - arxiv.org
Large language models (LLMs) have been effectively used for many computer vision tasks,
including image classification. In this paper, we present a simple yet effective approach for …

I2MVFormer: Large language model generated multi-view document supervision for zero-shot image classification

MF Naeem, MGZA Khan, Y Xian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent works have shown that unstructured text (documents) from online sources can serve
as useful auxiliary information for zero-shot image classification. However, these methods …

Attributed Synthetic Data Generation for Zero-shot Image Classification

S Wang, L Song, R Shimizu, M Goto - Synthetic Data for Computer Vision … - openreview.net
Zero-shot image classification is a challenging task aiming to classify real images without
real training examples. Recent research has employed synthetic training images generated …

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

O Saha, G Van Horn, S Maji - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com
The zero-shot performance of existing vision-language models (VLMs) such as CLIP is
limited by the availability of large-scale aligned image and text datasets in specific domains …

Can we train vision and language zero-shot classification models without syntax?

A Tejankar, M Sanjabi, B Wu, M Khabsa, S Xie… - … 2022 Workshop: Self …, 2022 - par.nsf.gov
Natural language supervision in the form of image captions was recently shown to be an
effective way of training zero-shot image classification models. In this work, we focus on …

Label Propagation for Zero-shot Classification with Vision-Language Models

Y Kalantidis, G Tolias - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Vision-Language Models (VLMs) have demonstrated impressive performance on
zero-shot classification, i.e., classification when provided merely with a list of class names. In …

No token left behind: Explainability-aided image classification and generation

R Paiss, H Chefer, L Wolf - European Conference on Computer Vision, 2022 - Springer
The application of zero-shot learning in computer vision has been revolutionized by the use
of image-text matching models. The most notable example, CLIP, has been widely used for …

A fistful of words: Learning transferable visual models from bag-of-words supervision

A Tejankar, M Sanjabi, B Wu, S Xie, M Khabsa… - arXiv preprint arXiv …, 2021 - arxiv.org
Using natural language as a supervision for training visual recognition models holds great
promise. Recent works have shown that if such supervision is used in the form of alignment …

Pushing boundaries: Exploring zero shot object classification with large multimodal models

A Islam, MR Biswas, W Zaghouani… - … on Social Networks …, 2023 - ieeexplore.ieee.org
The synergy of language and vision models has given rise to Large Language and Vision
Assistant models (LLVAs), designed to engage users in rich conversational experiences …

I2DFormer: Learning image to document attention for zero-shot image classification

MF Naeem, Y Xian, LV Gool… - Advances in Neural …, 2022 - proceedings.neurips.cc
Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing
methods still rely on human-annotated attributes, which are difficult to annotate and scale …