PyTorchVideo: A deep learning library for video understanding

C Wei, H Fan, S Xie, CY Wu, A Yuille… - Proceedings of the …, 2022 - openaccess.thecvf.com

We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training
of video models. Our approach first randomly masks out a portion of the input sequence and …

Cited by 594 · Related articles · All 6 versions

[PDF] thecvf.com

MViTv2: Improved multiscale vision transformers for classification and detection

Y Li, CY Wu, H Fan, K Mangalam… - Proceedings of the …, 2022 - openaccess.thecvf.com

In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for
image and video classification, as well as object detection. We present an improved version …

Cited by 637 · Related articles · All 6 versions

[PDF] thecvf.com

Multiscale vision transformers

H Fan, B Xiong, K Mangalam, Y Li… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …

Cited by 1250 · Related articles · All 5 versions

[PDF] thecvf.com

Recurring the transformer for video action recognition

J Yang, X Dong, L Liu, C Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com

Existing video understanding approaches, such as 3D convolutional neural networks and
Transformer-Based methods, usually process the videos in a clip-wise manner. Hence huge …

Cited by 86 · Related articles · All 4 versions

Transformer-based deep learning model and video dataset for unsafe action identification in construction projects

M Yang, C Wu, Y Guo, R Jiang, F Zhou, J Zhang… - Automation in …, 2023 - Elsevier

A large proportion of construction accidents are caused by unintentional and unsafe actions
and behaviors. It is of significant difficulties and ineffectiveness to monitor unsafe behaviors …

Cited by 33 · Related articles

[PDF] arxiv.org

A content-driven micro-video recommendation dataset at scale

Y Ni, Y Cheng, X Liu, J Fu, Y Li, X He, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org

Micro-videos have recently gained immense popularity, sparking critical research in micro-
video recommendation with significant implications for the entertainment, advertising, and e …

Cited by 17 · Related articles · All 2 versions

[PDF] arxiv.org

AugLy: Data augmentations for robustness

Z Papakipos, J Bitton - arXiv preprint arXiv:2201.06494, 2022 - arxiv.org

We introduce AugLy, a data augmentation library with a focus on adversarial robustness.
AugLy provides a wide array of augmentations for multiple modalities (audio, image, text, & …

Cited by 52 · Related articles · All 2 versions

[PDF] arxiv.org

Spotting temporally precise, fine-grained events in video

J Hong, H Zhang, M Gharbi, M Fisher… - European Conference on …, 2022 - Springer

We introduce the task of spotting temporally precise, fine-grained events in video (detecting
the precise moment in time events occur). Precise spotting requires models to reason …

Cited by 27 · Related articles · All 5 versions

[PDF] arxiv.org

WOODS: Benchmarks for out-of-distribution generalization in time series

JC Gagnon-Audet, K Ahuja, MJ Darvishi-Bayazi… - arXiv preprint arXiv …, 2022 - arxiv.org

Machine learning models often fail to generalize well under distributional shifts.
Understanding and overcoming these failures have led to a research field of Out-of …

Cited by 31 · Related articles · All 3 versions

[PDF] thecvf.com

Action-slot: Visual action-centric representations for multi-label atomic activity recognition in traffic scenes

CH Kung, SW Lu, YH Tsai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

In this paper we study multi-label atomic activity recognition. Despite the notable progress in
action recognition it is still challenging to recognize atomic activities due to a deficiency in …

Cited by 2 · Related articles · All 4 versions
