Related articles - Academic resource search

Learning vision from models rivals learning vision from data

Y Tian, L Fan, K Chen, D Katabi… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce SynCLR, a novel approach for learning visual representations exclusively from
synthetic images without any real data. We synthesize a large dataset of image captions …

Cited by 17 Related articles All 3 versions

[PDF] thecvf.com

VirTex: Learning visual representations from textual annotations

K Desai, J Johnson - … of the IEEE/CVF conference on …, 2021 - openaccess.thecvf.com

The de-facto approach to many vision tasks is to start from pretrained visual representations,
typically learned via supervised training on ImageNet. Recent methods have explored …

Cited by 419 Related articles All 8 versions

[PDF] arxiv.org

Learning visual representations with caption annotations

MB Sariyildiz, J Perez, D Larlus - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer

Pretraining general-purpose visual features has become a crucial part of tackling many
computer vision tasks. While one can learn such features on the extensively-annotated …

Cited by 160 Related articles All 4 versions

[PDF] mlr.press

Scaling up visual and vision-language representation learning with noisy text supervision

C Jia, Y Yang, Y Xia, YT Chen… - International …, 2021 - proceedings.mlr.press

Pre-trained representations are becoming crucial for many NLP and perception tasks. While
representation learning in NLP has transitioned to training on raw text without human …

Cited by 2835 Related articles All 6 versions

[PDF] thecvf.com

I Can't Believe There's No Images! Learning Visual Tasks Using only Language Supervision

S Gu, C Clark, A Kembhavi - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

Many high-level skills that are required for computer vision tasks, such as parsing questions,
comparing and contrasting semantics, and writing descriptions, are also required in other …

Cited by 10 Related articles All 3 versions

[PDF] thecvf.com

Learning representations by predicting bags of visual words

S Gidaris, A Bursuc, N Komodakis… - Proceedings of the …, 2020 - openaccess.thecvf.com

Self-supervised representation learning targets to learn convnet-based image
representations from unlabeled data. Inspired by the success of NLP methods in this area, in …

Cited by 110 Related articles All 11 versions

[PDF] thecvf.com

Self-supervised visual representations learning by contrastive mask prediction

Y Zhao, G Wang, C Luo, W Zeng… - Proceedings of the …, 2021 - openaccess.thecvf.com

Advanced self-supervised visual representation learning methods rely on the instance
discrimination (ID) pretext task. We point out that the ID task has an implicit semantic …

Cited by 43 Related articles All 5 versions

[PDF] thecvf.com

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

We launch EVA, a vision-centric foundation model to explore the limits of visual
representation at scale using only publicly accessible data. EVA is a vanilla ViT pre-trained …

Cited by 422 Related articles All 5 versions

[PDF] neurips.cc

StableRep: Synthetic images from text-to-image models make strong visual representation learners

Y Tian, L Fan, P Isola, H Chang… - Advances in Neural …, 2024 - proceedings.neurips.cc

We investigate the potential of learning visual representations using synthetic images
generated by text-to-image models. This is a natural question in the light of the excellent …

Cited by 65 Related articles All 5 versions

[PDF] mlr.press

Learning transferable visual models from natural language supervision

A Radford, JW Kim, C Hallacy… - International …, 2021 - proceedings.mlr.press

State-of-the-art computer vision systems are trained to predict a fixed set of predetermined
object categories. This restricted form of supervision limits their generality and usability since …

Cited by 18382 Related articles All 20 versions
