Self-supervised visual representations learning by contrastive mask prediction

Y Zhao, G Wang, C Luo, W Zeng… - Proceedings of the …, 2021 - openaccess.thecvf.com
… representation learning and design a mask contrast (MaskCo) framework to implement the
… gap between masked and unmasked features, we design a dedicated mask prediction head …

Masked feature prediction for self-supervised visual pre-training

C Wei, H Fan, S Xie, CY Wu, A Yuille… - Proceedings of the …, 2022 - openaccess.thecvf.com
visual knowledge about the visible structures. In this work, we show that predicting certain
masked … A simple framework for contrastive learning of visual representations. In ICML, 2020. …

MaskViT: Masked visual pre-training for video prediction

A Gupta, S Tian, Y Zhang, J Wu, R Martín-Martín… - arXiv preprint arXiv …, 2022 - arxiv.org
… to learn good representations for action recognition [49… , we apply masked visual modeling
for video prediction, and … line of work is leveraging good visual representations learnt via self …

Learning audio-visual speech representation by masked multimodal cluster prediction

B Shi, WN Hsu, K Lakhotia, A Mohamed - arXiv preprint arXiv:2201.02184, 2022 - arxiv.org
… speech, which masks multi-stream video input and predicts … AV-HuBERT learns powerful
audio-visual speech representation … Using our audio-visual representation on the same …

MST: Masked self-supervised transformer for visual representation

Z Li, Z Chen, F Yang, W Li, Y Zhu… - Advances in …, 2021 - proceedings.neurips.cc
… are randomly masked, and the objective is to predict the original information of the masked
… In order to avoid masking the tokens of crucial region, we propose a masked token strategy …

Masked world models for visual control

Y Seo, D Hafner, H Liu, F Liu, S James… - … on Robot Learning, 2023 - proceedings.mlr.press
prediction task for the autoencoder. Specifically, we separately update visual representations
masking and reward prediction, and (ii) learning the latent dynamics model that predicts …

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com
… limits of large scale MIM pre-training via masked image-text aligned feature prediction [43…
masked visual representation learning. We show simple masked feature modeling as a visual

Learning visual representations with caption annotations

MB Sariyildiz, J Perez, D Larlus - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer
… Compared to MLM, we propose to predict masked tokens in a caption by using the visual
information computed by \(\phi\). This way, we learn visual representations that should be …

Masked trajectory models for prediction, representation, and control

P Wu, A Majumdar, K Stone, Y Lin… - International …, 2023 - proceedings.mlr.press
… of masked prediction, also known as masked autoencoding, … This task of masked prediction
not only forces the model to … the combination of masked prediction and transformer sequence …

SimMIM: A simple framework for masked image modeling

Z Xie, Z Zhang, Y Cao, Y Lin, J Bao… - Proceedings of the …, 2022 - openaccess.thecvf.com
… language modeling as a pretext task for self-supervised visual representation learning. …
representations learnt by a masked prediction task (our approach), and a joint masked prediction