VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training

Z Tong, Y Song, J Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …

Masked video distillation: Rethinking masked feature modeling for self-supervised video representation learning

R Wang, D Chen, Z Wu, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com
Benefiting from masked visual modeling, self-supervised video representation learning has
achieved remarkable progress. However, existing methods focus on learning …

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

Self-supervised object detection from egocentric videos

P Akiva, J Huang, KJ Liang, R Kovvuri… - Proceedings of the …, 2023 - openaccess.thecvf.com
Understanding the visual world from the perspective of humans (egocentric) has been a
long-standing challenge in computer vision. Egocentric videos exhibit high scene complexity …

Universal time-series representation learning: A survey

P Trirat, Y Shin, J Kang, Y Nam, J Na, M Bae… - arXiv preprint arXiv …, 2024 - arxiv.org
Time-series data exists in every corner of real-world systems and services, ranging from
satellites in the sky to wearable devices on human bodies. Learning representations by …

Continuous frame motion sensitive self-supervised collaborative network for video representation learning

S Bi, Z Hu, M Zhao, H Zhang, J Di, Z Sun - Advanced Engineering …, 2023 - Elsevier
Motion, as a feature of video that changes in temporal sequences, is crucial to visual
understanding. The powerful video representation and extraction models are typically able …

Self-supervised video representation learning by serial restoration with elastic complexity

Z Chen, H Wang, CW Chen - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org
Self-supervised video representation learning leaves out heavy manual annotation by
automatically excavating supervisory signals. Although contrastive learning based …

MOFO: MOtion FOcused Self-Supervision for Video Understanding

M Ahmadian, F Guerin, A Gilbert - arXiv preprint arXiv:2308.12447, 2023 - arxiv.org
Self-supervised learning (SSL) techniques have recently produced outstanding results in
learning visual representations from unlabeled videos. Despite the importance of motion in …

Fine-grained spatiotemporal motion alignment for contrastive video representation learning

M Zhu, X Lin, R Dang, C Liu, Q Chen - Proceedings of the 31st ACM …, 2023 - dl.acm.org
As the most essential property in a video, motion information is critical to a robust and
generalized video representation. To inject motion dynamics, recent works have adopted …

Motion-guided spatiotemporal multitask feature discrimination for self-supervised video representation learning

S Bi, Z Hu, H Zhang, J Di, Z Sun - Pattern Recognition, 2024 - Elsevier
Powerful self-supervised representation models are able to step out of the traditional
supervised paradigm and rely merely on unlabeled data to achieve a deep understanding of …