Learning vision from models rivals learning vision from data

Y Tian, L Fan, K Chen, D Katabi… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce SynCLR, a novel approach for learning visual representations exclusively from
synthetic images without any real data. We synthesize a large dataset of image captions …

VirTex: Learning visual representations from textual annotations

K Desai, J Johnson - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com
The de-facto approach to many vision tasks is to start from pretrained visual representations,
typically learned via supervised training on ImageNet. Recent methods have explored …

Learning visual representations with caption annotations

MB Sariyildiz, J Perez, D Larlus - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
Pretraining general-purpose visual features has become a crucial part of tackling many
computer vision tasks. While one can learn such features on the extensively-annotated …

Scaling up visual and vision-language representation learning with noisy text supervision

C Jia, Y Yang, Y Xia, YT Chen… - International …, 2021 - proceedings.mlr.press
Pre-trained representations are becoming crucial for many NLP and perception tasks. While
representation learning in NLP has transitioned to training on raw text without human …

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

S Gu, C Clark, A Kembhavi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
Many high-level skills that are required for computer vision tasks, such as parsing questions,
comparing and contrasting semantics, and writing descriptions, are also required in other …

Learning representations by predicting bags of visual words

S Gidaris, A Bursuc, N Komodakis… - Proceedings of the …, 2020 - openaccess.thecvf.com
Self-supervised representation learning targets to learn convnet-based image
representations from unlabeled data. Inspired by the success of NLP methods in this area, in …

Self-supervised visual representations learning by contrastive mask prediction

Y Zhao, G Wang, C Luo, W Zeng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Advanced self-supervised visual representation learning methods rely on the instance
discrimination (ID) pretext task. We point out that the ID task has an implicit semantic …

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

StableRep: Synthetic images from text-to-image models make strong visual representation learners

Y Tian, L Fan, P Isola, H Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc
We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …

Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy… - International …, 2021 - proceedings.mlr.press
State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …