Knowing what, where and when to look: Efficient video action modeling with attention

JM Perez-Rua, B Martinez, X Zhu, A Toisoul… - arXiv preprint arXiv …, 2020 - arxiv.org
Attentive video modeling is essential for action recognition in unconstrained videos due to
their rich yet redundant information over space and time. However, introducing attention in a …

Self-supervised video representation learning with meta-contrastive network

Y Lin, X Guo, Y Lu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Self-supervised learning has been successfully applied to pre-train video representations,
which aims at efficient adaptation from pre-training domain to downstream tasks. Existing …

Complete video-level representations for action recognition

M Li, R Bai, B Meng, J Ren, M Jiang, Y Yang, L Li… - IEEE …, 2021 - ieeexplore.ieee.org
In most of the existing work for activity recognition, 3D ConvNets show promising
performance for learning spatiotemporal features of videos. However, most methods sample …

Video transformer network

D Neimark, O Bar, M Zohar… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents VTN, a transformer-based framework for video recognition. Inspired by
recent developments in vision transformers, we ditch the standard approach in video action …

Temporal interaction and excitation for action recognition

C Wang, L Yang, Z Zhu, P Wang… - Journal of Electronic …, 2023 - spiedigitallibrary.org
Two-stream networks have been widely used in action recognition by integrating the
appearance information from RGB frames with the motion-rich optical flow data, resulting in …

Representation learning for compressed video action recognition via attentive cross-modal interaction with motion enhancement

B Li, J Chen, D Zhang, X Bao, D Huang - arXiv preprint arXiv:2205.03569, 2022 - arxiv.org
Compressed video action recognition has recently drawn growing attention, since it
remarkably reduces the storage and computational cost via replacing raw videos by …

Lightweight network architecture for real-time action recognition

A Kozlov, V Andronov, Y Gritsenko - Proceedings of the 35th Annual …, 2020 - dl.acm.org
In this work we present a new efficient approach to Human Action Recognition called Video
Transformer Network (VTN). It leverages the latest advances in Computer Vision and Natural …

MeMViT: Memory-augmented multiscale vision transformer for efficient long-term video recognition

CY Wu, Y Li, K Mangalam, H Fan… - Proceedings of the …, 2022 - openaccess.thecvf.com
While today's video recognition systems parse snapshots or short clips accurately, they
cannot connect the dots and reason across a longer range of time yet. Most existing video …

Quo vadis, action recognition? A new model and the Kinetics dataset

J Carreira, A Zisserman - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com
The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has
made it difficult to identify good video architectures, as most methods obtain similar …

Action Behavior Learning Based on a New Multi-Scale Interactive Perception Network

C Zheng, J Gu, S Xu - IEEE Access, 2023 - ieeexplore.ieee.org
Action recognition is a fundamental research topic in the field of video understanding, but
classical action recognition relies on a large amount of manually annotated video data …