Scaling language-image pre-training via masking

Y Li, H Fan, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …

Contrastive masked autoencoders are stronger vision learners

Z Huang, X Jin, C Lu, Q Hou, MM Cheng… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Masked image modeling (MIM) has achieved promising results on various vision tasks.
However, the limited discriminability of the learned representation manifests there is still plenty …

What to hide from your students: Attention-guided masked image modeling

I Kakogeorgiou, S Gidaris, B Psomas, Y Avrithis… - … on Computer Vision, 2022 - Springer
Transformers and masked language modeling are quickly being adopted and explored in
computer vision as vision transformers and masked image modeling (MIM). In this work, we …

Hard patches mining for masked image modeling

H Wang, K Song, J Fan, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked image modeling (MIM) has attracted much research attention due to its promising
potential for learning scalable visual representations. In typical approaches, models usually …

SSL4EO-S12: A large-scale multimodal, multitemporal dataset for self-supervised learning in Earth observation [Software and Data Sets]

Y Wang, NAA Braham, Z Xiong, C Liu… - … and Remote Sensing …, 2023 - ieeexplore.ieee.org
Self-supervised pretraining bears the potential to generate expressive representations from
large-scale Earth observation (EO) data without human annotation. However, most existing …

Masked scene contrast: A scalable framework for unsupervised 3D representation learning

X Wu, X Wen, X Liu, H Zhao - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
As a pioneering work, PointContrast conducts unsupervised 3D representation learning via
leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on …

Masked modeling for self-supervised representation learning on vision and beyond

S Li, L Zhang, Z Wang, D Wu, L Wu, Z Liu, J Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …

Understanding masked autoencoders via hierarchical latent variable models

L Kong, MQ Ma, G Chen, EP Xing… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked autoencoder (MAE), a simple and effective self-supervised learning framework
based on the reconstruction of masked image regions, has recently achieved prominent …

CAE v2: Context autoencoder with CLIP target

X Zhang, J Chen, J Yuan, Q Chen, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked image modeling (MIM) learns visual representation by masking and reconstructing
image patches. Applying the reconstruction supervision on the CLIP representation has …

Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning

H Wu, C Lei, X Sun, PS Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-supervised representation learning follows a paradigm of withholding some part of the
data and tasking the network to predict it from the remaining part. Among many techniques …