A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

InternImage: Exploring large-scale vision foundation models with deformable convolutions

W Wang, J Dai, Z Chen, Z Huang, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Compared to the great progress of large-scale vision transformers (ViTs) in recent years,
large-scale models based on convolutional neural networks (CNNs) are still in an early …

Learning 3D representations from 2D pre-trained models via image-to-point masked autoencoders

R Zhang, L Wang, Y Qiao, P Gao… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Pre-training by numerous image data has become de-facto for robust 2D representations. In
contrast, due to the expensive data processing, a paucity of 3D datasets severely hinders …

Group DETR: Fast DETR training with group-wise one-to-many assignment

Q Chen, X Chen, J Wang, S Zhang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Detection transformer (DETR) relies on one-to-one assignment, assigning one ground-truth
object to one prediction, for end-to-end detection without NMS post-processing. It is known …

Hard patches mining for masked image modeling

H Wang, K Song, J Fan, Y Wang… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked image modeling (MIM) has attracted much research attention due to its promising
potential for learning scalable visual representations. In typical approaches, models usually …

Mixed autoencoder for self-supervised visual representation learning

K Chen, Z Liu, L Hong, H Xu, Z Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Masked Autoencoder (MAE) has demonstrated superior performance on various vision tasks
via randomly masking image patches and reconstruction. However, effective data …

Improving pixel-based MIM by reducing wasted modeling capability

Y Liu, S Zhang, J Chen, Z Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
There has been significant progress in Masked Image Modeling (MIM). Existing MIM
methods can be broadly categorized into two groups based on the reconstruction target …

A survey on masked autoencoder for self-supervised learning in vision and beyond

C Zhang, C Zhang, J Song, JSK Yi, K Zhang… - arXiv preprint arXiv …, 2022 - arxiv.org
Masked autoencoders are scalable vision learners, as the title of MAE (He et al., 2022),
which suggests that self-supervised learning (SSL) in vision might undertake a similar …

Masked modeling for self-supervised representation learning on vision and beyond

S Li, L Zhang, Z Wang, D Wu, L Wu, Z Liu, J Xia… - arXiv preprint arXiv …, 2023 - arxiv.org
As the deep learning revolution marches on, self-supervised learning has garnered
increasing attention in recent years thanks to its remarkable representation learning ability …

MixReorg: Cross-modal mixed patch reorganization is a good mask learner for open-world semantic segmentation

K Cai, P Ren, Y Zhu, H Xu, J Liu, C Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Recently, semantic segmentation models trained with image-level text supervision have
shown promising results in challenging open-world scenarios. However, these models still …