Related articles - Academic resource search

What Do You See? Enhancing Zero-Shot Image Classification with Multimodal Large Language Models

A Abdelhamed, M Afifi, A Go - arXiv preprint arXiv:2405.15668, 2024 - arxiv.org

Large language models (LLMs) have been effectively used for many computer vision tasks,
including image classification. In this paper, we present a simple yet effective approach for …

I2mvformer: Large language model generated multi-view document supervision for zero-shot image classification

MF Naeem, MGZA Khan, Y Xian… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recent works have shown that unstructured text (documents) from online sources can serve
as useful auxiliary information for zero-shot image classification. However, these methods …

Cited by 40 - Related articles - All 7 versions

[PDF] openreview.net

Attributed Synthetic Data Generation for Zero-shot Image Classification

S Wang, L Song, R Shimizu, M Goto - Synthetic Data for Computer Vision … - openreview.net

Zero-shot image classification is a challenging task aiming to classify real images without
real training examples. Recent research has employed synthetic training images generated …

[PDF] thecvf.com

Improved Zero-Shot Classification by Adapting VLMs with Text Descriptions

O Saha, G Van Horn, S Maji - Proceedings of the IEEE/CVF …, 2024 - openaccess.thecvf.com

The zero-shot performance of existing vision-language models (VLMs) such as CLIP is
limited by the availability of large-scale aligned image and text datasets in specific domains …

Cited by 1 - Related articles - All 3 versions

[PDF] nsf.gov

[PDF] Can we train vision and language zero-shot classification models without syntax?

A Tejankar, M Sanjabi, B Wu, M Khabsa, S Xie… - … 2022 Workshop: Self …, 2022 - par.nsf.gov

Natural language supervision in the form of image captions was recently shown to be an
effective way of training zero-shot image classification models. In this work, we focus on …

Label Propagation for Zero-shot Classification with Vision-Language Models

Y Kalantidis, G Tolias - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Vision-Language Models (VLMs) have demonstrated impressive performance on
zero-shot classification, i.e., classification when provided merely with a list of class names. In …

[PDF] arxiv.org

No token left behind: Explainability-aided image classification and generation

R Paiss, H Chefer, L Wolf - European Conference on Computer Vision, 2022 - Springer

The application of zero-shot learning in computer vision has been revolutionized by the use
of image-text matching models. The most notable example, CLIP, has been widely used for …

Cited by 21 - Related articles - All 5 versions

[PDF] arxiv.org

A fistful of words: Learning transferable visual models from bag-of-words supervision

A Tejankar, M Sanjabi, B Wu, S Xie, M Khabsa… - arXiv preprint arXiv …, 2021 - arxiv.org

Using natural language as a supervision for training visual recognition models holds great
promise. Recent works have shown that if such supervision is used in the form of alignment …

Cited by 21 - Related articles - All 3 versions

[PDF] arxiv.org

Pushing boundaries: Exploring zero shot object classification with large multimodal models

A Islam, MR Biswas, W Zaghouani… - … on Social Networks …, 2023 - ieeexplore.ieee.org

The synergy of language and vision models has given rise to Large Language and Vision
Assistant models (LLVAs), designed to engage users in rich conversational experiences …

Cited by 2 - Related articles - All 3 versions

[PDF] neurips.cc

I2dformer: Learning image to document attention for zero-shot image classification

MF Naeem, Y Xian, LV Gool… - Advances in Neural …, 2022 - proceedings.neurips.cc

Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing
methods still rely on human-annotated attributes, which are difficult to annotate and scale …

Cited by 30 - Related articles - All 7 versions
