Masked autoencoders are scalable vision learners

K He, X Chen, S Xie, Y Li, P Dollár… - Proceedings of the …, 2022 - openaccess.thecvf.com
This paper shows that masked autoencoders (MAE) are scalable self-supervised learners
for computer vision. Our MAE approach is simple: we mask random patches of the input …
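Below is a minimal sketch of the random patch masking this snippet describes: select a random subset of patches to hide and keep only the visible ones for the encoder. The 75% mask ratio and the 14x14 patch grid are assumptions for illustration, not taken from the snippet.

```python
# Hypothetical sketch of MAE-style random patch masking (ratio and grid size assumed).
import numpy as np

def random_patch_mask(num_patches, mask_ratio=0.75, seed=None):
    """Return boolean arrays marking visible and masked patch indices."""
    rng = np.random.default_rng(seed)
    num_masked = int(num_patches * mask_ratio)
    perm = rng.permutation(num_patches)      # random shuffle of patch indices
    masked = np.zeros(num_patches, dtype=bool)
    masked[perm[:num_masked]] = True         # first part of the shuffle is masked
    return ~masked, masked                   # visible patches, masked patches

# Example: a 14x14 patch grid (e.g. a 224x224 image split into 16x16 patches).
visible, masked = random_patch_mask(14 * 14, mask_ratio=0.75, seed=0)
print(visible.sum(), "visible patches,", masked.sum(), "masked patches")
```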

Adversarial masking for self-supervised learning

Y Shi, N Siddharth, P Torr… - … Conference on Machine …, 2022 - proceedings.mlr.press
We propose ADIOS, a masked image modeling (MIM) framework for self-supervised learning,
which simultaneously learns a masking function and an image encoder using an adversarial …
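A minimal sketch of the adversarial setup this snippet mentions: a masking network is updated to maximize the encoder's self-supervised loss while the encoder is updated to minimize it. The toy networks and the embedding-comparison loss below are placeholders, not the paper's actual architecture or objective.

```python
# Hypothetical sketch of adversarial masking: masker does gradient ascent on the
# same loss the encoder descends. All shapes and modules here are assumptions.
import torch
import torch.nn as nn

encoder = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 64))                    # toy encoder
masker = nn.Sequential(nn.Flatten(), nn.Linear(3 * 32 * 32, 32 * 32), nn.Sigmoid())  # toy masker

opt_enc = torch.optim.SGD(encoder.parameters(), lr=1e-2)
opt_msk = torch.optim.SGD(masker.parameters(), lr=1e-2)

def ssl_loss(x):
    """Toy stand-in for an SSL objective: compare embeddings of masked vs. full image."""
    m = masker(x).view(-1, 1, 32, 32)        # per-pixel soft mask in [0, 1]
    masked_x = x * (1.0 - m)                 # occlude the selected regions
    return nn.functional.mse_loss(encoder(masked_x), encoder(x))

x = torch.randn(8, 3, 32, 32)                # dummy batch

# Encoder step: minimize the loss (only the encoder optimizer steps).
opt_enc.zero_grad()
ssl_loss(x).backward()
opt_enc.step()

# Masker step: maximize the same loss via negation (only the masker optimizer steps).
opt_msk.zero_grad()
(-ssl_loss(x)).backward()
opt_msk.step()
```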

Corrupted image modeling for self-supervised visual pre-training

Y Fang, L Dong, H Bao, X Wang, F Wei - arXiv preprint arXiv:2202.03382, 2022 - arxiv.org
We introduce Corrupted Image Modeling (CIM) for self-supervised visual pre-training. CIM
uses an auxiliary generator with a small trainable BEiT to corrupt the input image instead of …

Are large-scale datasets necessary for self-supervised pre-training?

A El-Nouby, G Izacard, H Touvron, I Laptev… - arXiv preprint arXiv …, 2021 - arxiv.org
Pre-training models on large-scale datasets, like ImageNet, is a standard practice in
computer vision. This paradigm is especially effective for tasks with small training sets, for …

Contrastive masked autoencoders are stronger vision learners

Z Huang, X Jin, C Lu, Q Hou, MM Cheng, D Fu… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked image modeling (MIM) has achieved promising results on various vision tasks.
However, the limited discriminability of the learned representations suggests there is still plenty …

Uniform masking: Enabling MAE pre-training for pyramid-based vision transformers with locality

X Li, W Wang, L Yang, J Yang - arXiv preprint arXiv:2205.10063, 2022 - arxiv.org
Masked AutoEncoder (MAE) has recently led the trend in visual self-supervision with an
elegant asymmetric encoder-decoder design, which significantly optimizes both the pre …

Real-world robot learning with masked visual pre-training

I Radosavovic, T Xiao, S James… - … on Robot Learning, 2023 - proceedings.mlr.press
In this work, we explore self-supervised visual pre-training on images from diverse, in-the-
wild videos for real-world robotic tasks. Like prior work, our visual representations are pre …

Dense contrastive learning for self-supervised visual pre-training

X Wang, R Zhang, C Shen… - Proceedings of the …, 2021 - openaccess.thecvf.com
To date, most existing self-supervised learning methods are designed and optimized for
image classification. These pre-trained models can be sub-optimal for dense prediction …

SEED: Self-supervised distillation for visual representation

Z Fang, J Wang, L Wang, L Zhang, Y Yang… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper is concerned with self-supervised learning for small models. The problem is
motivated by our empirical studies showing that, while the widely used contrastive self-supervised …

Casting your model: Learning to localize improves self-supervised representations

RR Selvaraju, K Desai, J Johnson… - Proceedings of the …, 2021 - openaccess.thecvf.com
Recent advances in self-supervised learning (SSL) have largely closed the gap with
supervised ImageNet pretraining. Despite their success, these methods have been primarily …