A Survey on Self-supervised Learning: Algorithms, Applications, and Future Trends
Deep supervised learning algorithms typically require a large volume of labeled data to
achieve satisfactory performance. However, the process of collecting and labeling such data …
CrossPoint: Self-supervised cross-modal contrastive learning for 3D point cloud understanding
M Afham, I Dissanayake… - Proceedings of the …, 2022 - openaccess.thecvf.com
Manual annotation of large-scale point cloud dataset for varying tasks such as 3D object
classification, segmentation and detection is often laborious owing to the irregular structure …
Self-supervised video transformer
In this paper, we propose self-supervised training for video transformers using unlabeled
video data. From a given video, we create local and global spatiotemporal views with …
Learning from temporal gradient for semi-supervised action recognition
Semi-supervised video action recognition tends to enable deep neural networks to achieve
remarkable performance even with very limited labeled data. However, existing methods are …
Video contrastive learning with global context
Contrastive learning has revolutionized the self-supervised image representation learning
field and recently been adapted to the video domain. One of the greatest advantages of …
Language-based action concept spaces improve video self-supervised learning
K Ranasinghe, MS Ryoo - Advances in Neural Information …, 2023 - proceedings.neurips.cc
Recent contrastive language image pre-training has led to learning highly transferable and
robust image representations. However, adapting these models to video domain with …
Learning to refactor action and co-occurrence features for temporal action localization
The main challenge of Temporal Action Localization is to retrieve subtle human actions from
various co-occurring ingredients, e.g., context and background, in an untrimmed video. While …
Accurate and fast compressed video captioning
Existing video captioning approaches typically require to first sample video frames from a
decoded video and then conduct a subsequent process (e.g., feature extraction and/or …
Motion-aware contrastive video representation learning via foreground-background merging
In light of the success of contrastive learning in the image domain, current self-supervised
video representation learning methods usually employ contrastive loss to facilitate video …
Static and dynamic concepts for self-supervised video representation learning
In this paper, we propose a novel learning scheme for self-supervised video representation
learning. Motivated by how humans understand videos, we propose to first learn general …