A Xiao, J Huang, D Guan, X Zhang… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Point cloud data have been widely explored due to their superior accuracy and robustness under various adverse situations. Meanwhile, deep neural networks (DNNs) have achieved …
Scale is the primary factor for building a powerful foundation model that could well generalize to a variety of downstream tasks. However, it is still challenging to train video …
Z Tong, Y Song, J Wang… - Advances in neural …, 2022 - proceedings.neurips.cc
Pre-training video transformers on extra large-scale datasets is generally required to achieve premier performance on relatively small datasets. In this paper, we show that video …
Benefiting from masked visual modeling, self-supervised video representation learning has achieved remarkable progress. However, existing methods focus on learning …
Y Liu, K Wang, L Liu, H Lan, L Lin - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Video self-supervised learning is a challenging task, which requires significant expressive power from the model to leverage rich spatial-temporal knowledge and generate effective …
In this paper, we propose self-supervised training for video transformers using unlabeled video data. From a given video, we create local and global spatiotemporal views with …
Y Liu, S Zhang, J Chen, Z Yu… - Proceedings of the …, 2023 - openaccess.thecvf.com
There has been significant progress in Masked Image Modeling (MIM). Existing MIM methods can be broadly categorized into two groups based on the reconstruction target …
TD Truong, QH Bui, CN Duong… - Proceedings of the …, 2022 - openaccess.thecvf.com
Human action recognition has recently become one of the popular research topics in the computer vision community. Various 3D-CNN based methods have been presented to tackle …
This paper presents Probabilistic Video Contrastive Learning, a self-supervised representation learning method that bridges contrastive learning with probabilistic …