Masked feature prediction for self-supervised visual pre-training

C Wei, H Fan, S Xie, CY Wu, A Yuille… - Proceedings of the …, 2022 - openaccess.thecvf.com
We present Masked Feature Prediction (MaskFeat) for self-supervised pre-training
of video models. Our approach first randomly masks out a portion of the input sequence and …

MViTv2: Improved multiscale vision transformers for classification and detection

Y Li, CY Wu, H Fan, K Mangalam… - Proceedings of the …, 2022 - openaccess.thecvf.com
In this paper, we study Multiscale Vision Transformers (MViTv2) as a unified architecture for
image and video classification, as well as object detection. We present an improved version …

Multiscale vision transformers

H Fan, B Xiong, K Mangalam, Y Li… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present Multiscale Vision Transformers (MViT) for video and image recognition,
by connecting the seminal idea of multiscale feature hierarchies with transformer models …

Recurring the transformer for video action recognition

J Yang, X Dong, L Liu, C Zhang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Existing video understanding approaches, such as 3D convolutional neural networks and
Transformer-based methods, usually process the videos in a clip-wise manner. Hence huge …

Transformer-based deep learning model and video dataset for unsafe action identification in construction projects

M Yang, C Wu, Y Guo, R Jiang, F Zhou, J Zhang… - Automation in …, 2023 - Elsevier
A large proportion of construction accidents are caused by unintentional and unsafe actions
and behaviors. It is of significant difficulties and ineffectiveness to monitor unsafe behaviors …

A content-driven micro-video recommendation dataset at scale

Y Ni, Y Cheng, X Liu, J Fu, Y Li, X He, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Micro-videos have recently gained immense popularity, sparking critical research in micro-
video recommendation with significant implications for the entertainment, advertising, and e …

AugLy: Data augmentations for robustness

Z Papakipos, J Bitton - arXiv preprint arXiv:2201.06494, 2022 - arxiv.org
We introduce AugLy, a data augmentation library with a focus on adversarial robustness.
AugLy provides a wide array of augmentations for multiple modalities (audio, image, text, & …

Spotting temporally precise, fine-grained events in video

J Hong, H Zhang, M Gharbi, M Fisher… - European Conference on …, 2022 - Springer
We introduce the task of spotting temporally precise, fine-grained events in video (detecting
the precise moment in time events occur). Precise spotting requires models to reason …

WOODS: Benchmarks for out-of-distribution generalization in time series

JC Gagnon-Audet, K Ahuja, MJ Darvishi-Bayazi… - arXiv preprint arXiv …, 2022 - arxiv.org
Machine learning models often fail to generalize well under distributional shifts.
Understanding and overcoming these failures have led to a research field of Out-of …

Action-slot: Visual action-centric representations for multi-label atomic activity recognition in traffic scenes

CH Kung, SW Lu, YH Tsai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
In this paper we study multi-label atomic activity recognition. Despite the notable progress in
action recognition it is still challenging to recognize atomic activities due to a deficiency in …