Related articles - Academic resource search

Knowing what, where and when to look: Efficient video action modeling with attention

JM Perez-Rua, B Martinez, X Zhu, A Toisoul… - arXiv preprint arXiv …, 2020 - arxiv.org

Attentive video modeling is essential for action recognition in unconstrained videos due to
their rich yet redundant information over space and time. However, introducing attention in a …

Cited by: 23 | Related articles | All 3 versions

Self-supervised video representation learning with meta-contrastive network

Y Lin, X Guo, Y Lu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Self-supervised learning has been successfully applied to pre-train video representations,
which aims at efficient adaptation from pre-training domain to downstream tasks. Existing …

Cited by: 45 | Related articles | All 5 versions

Complete video-level representations for action recognition

M Li, R Bai, B Meng, J Ren, M Jiang, Y Yang, L Li… - IEEE …, 2021 - ieeexplore.ieee.org

In most of the existing work for activity recognition, 3D ConvNets show promising
performance for learning spatiotemporal features of videos. However, most methods sample …

Cited by: 6 | Related articles | All 3 versions

Video transformer network

D Neimark, O Bar, M Zohar… - Proceedings of the …, 2021 - openaccess.thecvf.com

This paper presents VTN, a transformer-based framework for video recognition. Inspired by
recent developments in vision transformers, we ditch the standard approach in video action …

Cited by: 488 | Related articles | All 9 versions

Temporal interaction and excitation for action recognition

C Wang, L Yang, Z Zhu, P Wang… - Journal of Electronic …, 2023 - spiedigitallibrary.org

Two-stream networks have been widely used in action recognition by integrating the
appearance information from RGB frames with the motion-rich optical flow data, resulting in …

Representation learning for compressed video action recognition via attentive cross-modal interaction with motion enhancement

B Li, J Chen, D Zhang, X Bao, D Huang - arXiv preprint arXiv:2205.03569, 2022 - arxiv.org

Compressed video action recognition has recently drawn growing attention, since it
remarkably reduces the storage and computational cost via replacing raw videos by …

Cited by: 16 | Related articles | All 5 versions

Lightweight network architecture for real-time action recognition

A Kozlov, V Andronov, Y Gritsenko - Proceedings of the 35th Annual …, 2020 - dl.acm.org

In this work we present a new efficient approach to Human Action Recognition called Video
Transformer Network (VTN). It leverages the latest advances in Computer Vision and Natural …

Cited by: 37 | Related articles | All 5 versions

MeMViT: Memory-augmented multiscale vision transformer for efficient long-term video recognition

CY Wu, Y Li, K Mangalam, H Fan… - Proceedings of the …, 2022 - openaccess.thecvf.com

While today's video recognition systems parse snapshots or short clips accurately, they
cannot connect the dots and reason across a longer range of time yet. Most existing video …

Cited by: 173 | Related articles | All 5 versions

Quo vadis, action recognition? A new model and the Kinetics dataset

J Carreira, A Zisserman - Proceedings of the IEEE …, 2017 - openaccess.thecvf.com

The paucity of videos in current action classification datasets (UCF-101 and HMDB-51) has
made it difficult to identify good video architectures, as most methods obtain similar …

Cited by: 9237 | Related articles | All 13 versions

Action Behavior Learning Based on a New Multi-Scale Interactive Perception Network

C Zheng, J Gu, S Xu - IEEE Access, 2023 - ieeexplore.ieee.org

Action recognition is a fundamental research topic in the field of video understanding, but
classical action recognition relies on a large amount of manually annotated video data …
