Scaling language-image pre-training via masking

Y Li, H Fan, R Hu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Abstract We present Fast Language-Image Pre-training (FLIP), a simple and more efficient
method for training CLIP. Our method randomly masks out and removes a large portion of …

Contrastive masked autoencoders are stronger vision learners

Z Huang, X Jin, C Lu, Q Hou, MM Cheng… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Masked image modeling (MIM) has achieved promising results on various vision tasks.
However, the limited discriminability of the learned representation manifests there is still plenty …

What to hide from your students: Attention-guided masked image modeling

I Kakogeorgiou, S Gidaris, B Psomas, Y Avrithis… - … on Computer Vision, 2022 - Springer
Transformers and masked language modeling are quickly being adopted and explored in
computer vision as vision transformers and masked image modeling (MIM). In this work, we …

Hard patches mining for masked image modeling

H Wang, K Song, J Fan, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked image modeling (MIM) has attracted much research attention due to its promising
potential for learning scalable visual representations. In typical approaches, models usually …

SSL4EO-S12: A large-scale multimodal, multitemporal dataset for self-supervised learning in Earth observation [Software and Data Sets]

Y Wang, NAA Braham, Z Xiong, C Liu… - … and Remote Sensing …, 2023 - ieeexplore.ieee.org
Self-supervised pretraining bears the potential to generate expressive representations from
large-scale Earth observation (EO) data without human annotation. However, most existing …

Masked scene contrast: A scalable framework for unsupervised 3D representation learning

X Wu, X Wen, X Liu, H Zhao - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
As a pioneering work, PointContrast conducts unsupervised 3D representation learning via
leveraging contrastive learning over raw RGB-D frames and proves its effectiveness on …

Masked modeling for self-supervised representation learning on vision and beyond

S Li, L Zhang, Z Wang, D Wu, L Wu, Z Liu, J Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …

Understanding masked autoencoders via hierarchical latent variable models

L Kong, MQ Ma, G Chen, EP Xing… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked autoencoder (MAE), a simple and effective self-supervised learning framework
based on the reconstruction of masked image regions, has recently achieved prominent …

CAE v2: Context autoencoder with CLIP target

X Zhang, J Chen, J Yuan, Q Chen, J Wang… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked image modeling (MIM) learns visual representation by masking and reconstructing
image patches. Applying the reconstruction supervision on the CLIP representation has …

Randomized Quantization: A Generic Augmentation for Data Agnostic Self-supervised Learning

H Wu, C Lei, X Sun, PS Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Self-supervised representation learning follows a paradigm of withholding some part of the
data and tasking the network to predict it from the remaining part. Among many techniques …