A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends

J Gui, T Chen, J Zhang, Q Cao, Z Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …

CrossPoint: Self-supervised cross-modal contrastive learning for 3D point cloud understanding

M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com
Manual annotation of large-scale point cloud dataset for varying tasks such as 3D object
classification, segmentation and detection is often laborious owing to the irregular structure …

Self-supervised video transformer

K Ranasinghe, M Naseer, S Khan… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we propose self-supervised training for video transformers using unlabeled
video data. From a given video, we create local and global spatiotemporal views with …

Learning from temporal gradient for semi-supervised action recognition

J Xiao, L Jing, L Zhang, J He, Q She… - Proceedings of the …, 2022 - openaccess.thecvf.com
Semi-supervised video action recognition tends to enable deep neural networks to achieve
remarkable performance even with very limited labeled data. However, existing methods are …

Video contrastive learning with global context

H Kuang, Y Zhu, Z Zhang, X Li… - Proceedings of the …, 2021 - openaccess.thecvf.com
Contrastive learning has revolutionized the self-supervised image representation learning
field and recently been adapted to the video domain. One of the greatest advantages of …

Language-based action concept spaces improve video self-supervised learning

K Ranasinghe, MS Ryoo - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Recent contrastive language image pre-training has led to learning highly transferable and
robust image representations. However, adapting these models to video domain with …

Learning to refactor action and co-occurrence features for temporal action localization

K Xia, L Wang, S Zhou, N Zheng… - Proceedings of the …, 2022 - openaccess.thecvf.com
The main challenge of Temporal Action Localization is to retrieve subtle human actions from
various co-occurring ingredients, e.g., context and background, in an untrimmed video. While …

Accurate and fast compressed video captioning

Y Shen, X Gu, K Xu, H Fan, L Wen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Existing video captioning approaches typically require to first sample video frames from a
decoded video and then conduct a subsequent process (e.g., feature extraction and/or …

Motion-aware contrastive video representation learning via foreground-background merging

S Ding, M Li, T Yang, R Qian, H Xu… - Proceedings of the …, 2022 - openaccess.thecvf.com
In light of the success of contrastive learning in the image domain, current self-supervised
video representation learning methods usually employ contrastive loss to facilitate video …

Static and dynamic concepts for self-supervised video representation learning

R Qian, S Ding, X Liu, D Lin - European Conference on Computer Vision, 2022 - Springer
In this paper, we propose a novel learning scheme for self-supervised video representation
learning. Motivated by how humans understand videos, we propose to first learn general …