Unified contrastive learning in image-text-label space

J Yang, C Li, P Zhang, B Xiao, C Liu… - Proceedings of the …, 2022 - openaccess.thecvf.com
Visual recognition has recently been learned via either supervised learning on human-annotated
image-label data or language-image contrastive learning with webly-crawled image-text …

Data efficient language-supervised zero-shot recognition with optimal transport distillation

B Wu, R Cheng, P Zhang, T Gao, P Vajda… - arXiv preprint arXiv …, 2021 - arxiv.org
Traditional computer vision models are trained to predict a fixed set of predefined
categories. Recently, natural language has been shown to be a broader and richer source of …

RA-CLIP: Retrieval augmented contrastive language-image pre-training

CW Xie, S Sun, X Xiong, Y Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) is attracting increasing attention
for its impressive zero-shot recognition performance on different downstream tasks …

I2MVFormer: Large language model generated multi-view document supervision for zero-shot image classification

MF Naeem, MGZA Khan, Y Xian… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recent works have shown that unstructured text (documents) from online sources can serve
as useful auxiliary information for zero-shot image classification. However, these methods …

I2DFormer: Learning image to document attention for zero-shot image classification

MF Naeem, Y Xian, L Van Gool… - Advances in Neural …, 2022 - proceedings.neurips.cc
Despite the tremendous progress in zero-shot learning (ZSL), the majority of existing
methods still rely on human-annotated attributes, which are difficult to annotate and scale …

SuS-X: Training-free name-only transfer of vision-language models

V Udandarao, A Gupta… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Contrastive Language-Image Pre-training (CLIP) has emerged as a simple yet
effective way to train large-scale vision-language models. CLIP demonstrates impressive …

Domain-aware visual bias eliminating for generalized zero-shot learning

S Min, H Yao, H Xie, C Wang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Generalized zero-shot learning aims to recognize images from seen and unseen domains.
Recent methods focus on learning a unified semantic-aligned visual representation to …

Non-contrastive learning meets language-image pre-training

J Zhou, L Dong, Z Gan, L Wang… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Contrastive language-image pre-training (CLIP) serves as a de facto standard to align
images and texts. Nonetheless, the loose correlation between images and texts of web …

Data-efficient language-supervised zero-shot learning with self-distillation

R Cheng, B Wu, P Zhang, P Vajda… - Proceedings of the …, 2021 - openaccess.thecvf.com
Traditional computer vision models are trained to predict a fixed set of predefined
categories. Recently, natural language has been shown to be a broader and richer source of …

Progressive ensemble networks for zero-shot recognition

M Ye, Y Guo - Proceedings of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com
Despite the advancement of supervised image recognition algorithms, their dependence on
the availability of labeled data and the rapid expansion of image categories raise the …