A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

Unsupervised point cloud representation learning with deep neural networks: A survey

A Xiao, J Huang, D Guan, X Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Point cloud data have been widely explored due to their superior accuracy and robustness
under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved …

VideoMAE V2: Scaling video masked autoencoders with dual masking

L Wang, B Huang, Z Zhao, Z Tong… - Proceedings of the …, 2023 - openaccess.thecvf.com
Scale is the primary factor for building a powerful foundation model that could well
generalize to a variety of downstream tasks. However, it is still challenging to train video …

VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training

Z Tong, Y Song, J Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …

Masked video distillation: Rethinking masked feature modeling for self-supervised video representation learning

R Wang, D Chen, Z Wu, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Benefiting from masked visual modeling, self-supervised video representation learning has
achieved remarkable progress. However, existing methods focus on learning …

TCGL: Temporal contrastive graph for self-supervised video representation learning

Y Liu, K Wang, L Liu, H Lan, L Lin - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Video self-supervised learning is a challenging task, which requires significant expressive
power from the model to leverage rich spatial-temporal knowledge and generate effective …

Self-supervised video transformer

K Ranasinghe, M Naseer, S Khan… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we propose self-supervised training for video transformers using unlabeled
video data. From a given video, we create local and global spatiotemporal views with …

Improving pixel-based MIM by reducing wasted modeling capability

Y Liu, S Zhang, J Chen, Z Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
There has been significant progress in Masked Image Modeling (MIM). Existing MIM
methods can be broadly categorized into two groups based on the reconstruction target …

DirecFormer: A directed attention in transformer approach to robust action recognition

TD Truong, QH Bui, CN Duong… - Proceedings of the …, 2022 - openaccess.thecvf.com
Human action recognition has recently become one of the popular research topics in the
computer vision community. Various 3D-CNN based methods have been presented to tackle …

Probabilistic representations for video contrastive learning

J Park, J Lee, IJ Kim, K Sohn - Proceedings of the IEEE/CVF …, 2022 - openaccess.thecvf.com
This paper presents Probabilistic Video Contrastive Learning, a self-supervised
representation learning method that bridges contrastive learning with probabilistic …