Masked modeling for self-supervised representation learning on vision and beyond

S Li, L Zhang, Z Wang, D Wu, L Wu, Z Liu, J Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …

Obj2seq: Formatting objects as sequences with class prompt for visual tasks

Z Chen, Y Zhu, Z Li, F Yang, W Li… - Advances in …, 2022 - proceedings.neurips.cc
Visual tasks vary widely in their output formats and the contents they concern; therefore it is hard to
process them with an identical structure. One main obstacle lies in the high-dimensional …

Rejuvenating image-GPT as strong visual representation learners

S Ren, Z Wang, H Zhu, J Xiao, A Yuille… - Forty-first International …, 2023 - openreview.net
This paper enhances image-GPT (iGPT), one of the pioneering works that introduced
autoregressive pretraining to predict the next pixels for visual representation learning. Two …

Exploring stochastic autoregressive image modeling for visual representation

Y Qi, F Yang, Y Zhu, Y Liu, L Wu, R Zhao… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Autoregressive language modeling (ALM) has been successfully used in self-supervised pre-
training in natural language processing (NLP). However, this paradigm has not achieved …

Self-Supervised Representation Learning from Arbitrary Scenarios

Z Li, Y Zhu, Z Chen, Z Gao, R Zhao… - Proceedings of the …, 2024 - openaccess.thecvf.com
Current self-supervised methods can primarily be categorized into contrastive learning and
masked image modeling. Extensive studies have demonstrated that combining these two …

Efficient masked autoencoders with self-consistency

Z Li, Y Zhu, Z Chen, W Li, R Zhao… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org
Inspired by the masked language modeling (MLM) in natural language processing tasks, the
masked image modeling (MIM) has been recognized as a strong self-supervised pre …

Semantic-Aware Autoregressive Image Modeling for Visual Representation Learning

K Song, S Zhang, T Wang - Proceedings of the AAAI Conference on …, 2024 - ojs.aaai.org
The development of autoregressive modeling (AM) in computer vision lags behind natural
language processing (NLP) in self-supervised pre-training. This is mainly caused by the …

Look ahead or look around? A theoretical comparison between autoregressive and masked pretraining

Q Zhang, T Du, H Huang, Y Wang, Y Wang - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, the rise of generative self-supervised learning (SSL) paradigms has
exhibited impressive performance across visual, language, and multi-modal domains. While …

Autoregressive Pretraining with Mamba in Vision

S Ren, X Li, H Tu, F Wang, F Shu, L Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
The vision community has started to build on the recently developed state space model,
Mamba, as a new backbone for a range of tasks. This paper shows that Mamba's visual …

ARVideo: Autoregressive Pretraining for Self-Supervised Video Representation Learning

S Ren, H Zhu, C Wei, Y Li, A Yuille, C Xie - arXiv preprint arXiv …, 2024 - arxiv.org
This paper presents a new self-supervised video representation learning framework,
ARVideo, which autoregressively predicts the next video token in a tailored sequence order …