Cross-architecture self-supervised video representation learning

VideoMAE: Masked autoencoders are data-efficient learners for self-supervised video pre-training

Z Tong, Y Song, J Wang… - Advances in neural …, 2022 - proceedings.neurips.cc

Pre-training video transformers on extra large-scale datasets is generally required to
achieve premier performance on relatively small datasets. In this paper, we show that video …

Cited by 930 · Related articles · All 6 versions

[PDF] thecvf.com

Masked video distillation: Rethinking masked feature modeling for self-supervised video representation learning

R Wang, D Chen, Z Wu, Y Chen… - Proceedings of the …, 2023 - openaccess.thecvf.com

Benefiting from masked visual modeling, self-supervised video representation learning has
achieved remarkable progress. However, existing methods focus on learning …

Cited by 76 · Related articles · All 7 versions

[PDF] arxiv.org

Video transformers: A survey

J Selva, AS Johansen, S Escalera… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

Transformer models have shown great success handling long-range interactions, making
them a promising tool for modeling video. However, they lack inductive biases and scale …

Cited by 110 · Related articles · All 8 versions

[PDF] thecvf.com

Self-supervised object detection from egocentric videos

P Akiva, J Huang, KJ Liang, R Kovvuri… - Proceedings of the …, 2023 - openaccess.thecvf.com

Understanding the visual world from the perspective of humans (egocentric) has been a
long-standing challenge in computer vision. Egocentric videos exhibit high scene complexity …

Cited by 4 · Related articles · All 4 versions

[PDF] arxiv.org

Universal time-series representation learning: A survey

P Trirat, Y Shin, J Kang, Y Nam, J Na, M Bae… - arXiv preprint arXiv …, 2024 - arxiv.org

Time-series data exists in every corner of real-world systems and services, ranging from
satellites in the sky to wearable devices on human bodies. Learning representations by …

Cited by 4 · Related articles · All 2 versions

Continuous frame motion sensitive self-supervised collaborative network for video representation learning

S Bi, Z Hu, M Zhao, H Zhang, J Di, Z Sun - Advanced Engineering …, 2023 - Elsevier

Motion, as a feature of video that changes in temporal sequences, is crucial to visual
understanding. The powerful video representation and extraction models are typically able …

Cited by 6 · Related articles · All 2 versions

Self-supervised video representation learning by serial restoration with elastic complexity

Z Chen, H Wang, CW Chen - IEEE Transactions on Multimedia, 2023 - ieeexplore.ieee.org

Self-supervised video representation learning leaves out heavy manual annotation by
automatically excavating supervisory signals. Although contrastive learning based …

Cited by 4 · Related articles · All 2 versions

[PDF] arxiv.org

MOFO: MOtion FOcused Self-Supervision for Video Understanding

M Ahmadian, F Guerin, A Gilbert - arXiv preprint arXiv:2308.12447, 2023 - arxiv.org

Self-supervised learning (SSL) techniques have recently produced outstanding results in
learning visual representations from unlabeled videos. Despite the importance of motion in …

Cited by 2 · Related articles · All 4 versions

[PDF] arxiv.org

Fine-grained spatiotemporal motion alignment for contrastive video representation learning

M Zhu, X Lin, R Dang, C Liu, Q Chen - Proceedings of the 31st ACM …, 2023 - dl.acm.org

As the most essential property in a video, motion information is critical to a robust and
generalized video representation. To inject motion dynamics, recent works have adopted …

Cited by 3 · Related articles · All 3 versions

Motion-guided spatiotemporal multitask feature discrimination for self-supervised video representation learning

S Bi, Z Hu, H Zhang, J Di, Z Sun - Pattern Recognition, 2024 - Elsevier

Powerful self-supervised representation models are able to step out of the traditional
supervised paradigm and rely merely on unlabeled data to achieve a deep understanding of …
