Masked autoencoders are scalable vision learners

K He, X Chen, S Xie, Y Li, P Dollár… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …
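Below is a minimal sketch of the random patch masking this snippet describes: select a random subset of patches to hide and keep only the visible ones for the encoder. The 75% mask ratio and the 14x14 patch grid are assumptions for illustration, not taken from the snippet.

```python
# Hypothetical sketch of MAE-style random patch masking (ratio and grid size assumed).
import numpy as np

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return boolean arrays marking visible and masked patch indices."""
    rng = np.random.default_rng(seed)
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)      # random shuffle of patch indices
    masked = np.zeros(num_patches, dtype=bool)
    masked[perm[:num_masked]] = True         # first part of the shuffle is masked
    return ~masked, masked                   # visible patches, masked patches

# Example: a 14x14 patch grid (e.g. a 224x224 image split into 16x16 patches).
visible, masked = random_patch_mask(14 * 14, mask_ratio=0.75, seed=0)
print(visible.sum(), "visible patches,", masked.sum(), "masked patches")
```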

Adversarial masking for self-supervised learning

Y Shi, N Siddharth, P Torr… - … Conference on Machine …, 2022 - proceedings.mlr.press
We propose ADIOS, a masked image modeling (MIM) framework for self-supervised learning,
which simultaneously learns a masking function and an image encoder using an adversarial …
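A minimal sketch of the adversarial setup this snippet mentions: a masking network is updated to maximize the encoder's self-supervised loss while the encoder is updated to minimize it. The toy networks and the embedding-comparison loss below are placeholders, not the paper's actual architecture or objective.

```python
# Hypothetical sketch of adversarial masking: masker does gradient ascent on the
# same loss the encoder descends. All shapes and modules here are assumptions.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))                    # toy encoder
masker = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 32 * 32), nn.Sigmoid())  # toy masker

opt_enc = torch.optim.SGD(encoder.parameters(), lr=1e-2)
opt_msk = torch.optim.SGD(masker.parameters(), lr=1e-2)

def ssl_loss(x):
    """Toy stand-in for an SSL objective: compare embeddings of masked vs. full image."""
    m = masker(x).view(-1, 1, 32, 32)        # per-pixel soft mask in [0, 1]
    masked_x = x * (1.0 - m)                 # occlude the selected regions
    return nn.functional.mse_loss(encoder(masked_x), encoder(x))

x = torch.randn(8, 3, 32, 32)                # dummy batch

# Encoder step: minimize the loss (only the encoder optimizer steps).
opt_enc.zero_grad()
ssl_loss(x).backward()
opt_enc.step()

# Masker step: maximize the same loss via negation (only the masker optimizer steps).
opt_msk.zero_grad()
(-ssl_loss(x)).backward()
opt_msk.step()
```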

Corrupted image modeling for self-supervised visual pre-training

Y Fang, L Dong, H Bao, X Wang, F Wei - arXiv preprint arXiv:2202.03382, 2022 - arxiv.org
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training. CIM
uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of …

Are large-scale datasets necessary for self-supervised pre-training?

A El-Nouby, G Izacard, H Touvron, I Laptev… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-training models on large-scale datasets, like ImageNet, is a standard practice in
computer vision. This paradigm is especially effective for tasks with small training sets, for …

Contrastive masked autoencoders are stronger vision learners

Z Huang, X Jin, C Lu, Q Hou, MM Cheng, D Fu… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked image modeling (MIM) has achieved promising results on various vision tasks.
However, the limited discriminability of the learned representations suggests there is still plenty …

Uniform masking: Enabling MAE pre-training for pyramid-based vision transformers with locality

X Li, W Wang, L Yang, J Yang - arXiv preprint arXiv:2205.10063, 2022 - arxiv.org
Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an
elegant asymmetric encoder-decoder design, which significantly optimizes both the pre …

Real-world robot learning with masked visual pre-training

I Radosavovic, T Xiao, S James… - … on Robot Learning, 2023 - proceedings.mlr.press
In this work, we explore self-supervised visual pre-training on images from diverse, in-the-
wild videos for real-world robotic tasks. Like prior work, our visual representations are pre …

Dense contrastive learning for self-supervised visual pre-training

X Wang, R Zhang, C Shen… - Proceedings of the …, 2021 - openaccess.thecvf.com
To date, most existing self-supervised learning methods are designed and optimized for
image classification. These pre-trained models can be sub-optimal for dense prediction …

SEED: Self-supervised distillation for visual representation

Z Fang, J Wang, L Wang, L Zhang, Y Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper is concerned with self-supervised learning for small models. The problem is
motivated by our empirical studies showing that, while the widely used contrastive self-supervised …

Casting your model: Learning to localize improves self-supervised representations

RR Selvaraju, K Desai, J Johnson… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent advances in self-supervised learning (SSL) have largely closed the gap with
supervised ImageNet pretraining. Despite their success, these methods have been primarily …