visual representations mask prediction - Academic resource search

Self-supervised visual representations learning by contrastive mask prediction

Y Zhao, G Wang, C Luo, W Zeng… - Proceedings of the …, 2021 - openaccess.thecvf.com

… representation learning and design a mask contrast (MaskCo) framework to implement the
… gap between masked and unmasked features, we design a dedicated mask prediction head …

Cited by 44 - Related articles - All 5 versions

[PDF] thecvf.com

Masked feature prediction for self-supervised visual pre-training

C Wei, H Fan, S Xie, CY Wu, A Yuille… - Proceedings of the …, 2022 - openaccess.thecvf.com

… visual knowledge about the visible structures. In this work, we show that predicting certain
masked … A simple framework for contrastive learning of visual representations. In ICML, 2020. …

Cited by 605 - Related articles - All 6 versions

[PDF] arxiv.org

MaskViT: Masked visual pre-training for video prediction

A Gupta, S Tian, Y Zhang, J Wu, R Martín-Martín… - arXiv preprint arXiv …, 2022 - arxiv.org

… to learn good representations for action recognition [49… , we apply masked visual modeling
for video prediction, and … line of work is leveraging good visual representations learnt via self …

Cited by 103 - Related articles - All 4 versions

[PDF] arxiv.org

Learning audio-visual speech representation by masked multimodal cluster prediction

B Shi, WN Hsu, K Lakhotia, A Mohamed - arXiv preprint arXiv:2201.02184, 2022 - arxiv.org

… speech, which masks multi-stream video input and predicts … AV-HuBERT learns powerful
audio-visual speech representation … Using our audio-visual representation on the same …

Cited by 243 - Related articles - All 3 versions

[PDF] neurips.cc

MST: Masked self-supervised transformer for visual representation

Z Li, Z Chen, F Yang, W Li, Y Zhu… - Advances in …, 2021 - proceedings.neurips.cc

… are randomly masked, and the objective is to predict the original information of the masked
… In order to avoid masking the tokens of crucial region, we propose a masked token strategy …

Cited by 136 - Related articles - All 6 versions

[PDF] mlr.press

Masked world models for visual control

Y Seo, D Hafner, H Liu, F Liu, S James… - … on Robot Learning, 2023 - proceedings.mlr.press

… prediction task for the autoencoder. Specifically, we separately update visual representations
… masking and reward prediction, and (ii) learning the latent dynamics model that predicts …

Cited by 93 - Related articles - All 6 versions

[PDF] thecvf.com

EVA: Exploring the limits of masked visual representation learning at scale

Y Fang, W Wang, B Xie, Q Sun, L Wu… - Proceedings of the …, 2023 - openaccess.thecvf.com

… limits of large scale MIM pre-training via masked image-text aligned feature prediction [43…
masked visual representation learning. We show simple masked feature modeling as a visual …

Cited by 473 - Related articles - All 5 versions

[PDF] arxiv.org

Learning visual representations with caption annotations

MB Sariyildiz, J Perez, D Larlus - … Conference, Glasgow, UK, August 23–28 …, 2020 - Springer

… Compared to MLM, we propose to predict masked tokens in a caption by using the visual
information computed by \(\phi\). This way, we learn visual representations that should be …

Cited by 164 - Related articles - All 4 versions

[PDF] mlr.press

Masked trajectory models for prediction, representation, and control

P Wu, A Majumdar, K Stone, Y Lin… - International …, 2023 - proceedings.mlr.press

… of masked prediction, also known as masked autoencoding, … This task of masked prediction
not only forces the model to … the combination of masked prediction and transformer sequence …

Cited by 32 - Related articles - All 9 versions

[PDF] thecvf.com

SimMIM: A simple framework for masked image modeling

Z Xie, Z Zhang, Y Cao, Y Lin, J Bao… - Proceedings of the …, 2022 - openaccess.thecvf.com

… language modeling as a pretext task for self-supervised visual representation learning. …
representations learnt by a masked prediction task (our approach), and a joint masked prediction …

Cited by 1149 - Related articles - All 6 versions
