NUTA: Non-uniform temporal aggregation for action recognition

X Li, C Liu, B Shuai, Y Zhu, H Chen… - Proceedings of the …, 2022 - openaccess.thecvf.com
In the world of action recognition research, one primary focus has been on how to construct
and train networks to model the spatial-temporal volume of an input video. These methods …

TEA: Temporal excitation and aggregation for action recognition

Y Li, B Ji, X Shi, J Zhang, B Kang… - Proceedings of the …, 2020 - openaccess.thecvf.com
Temporal modeling is key for action recognition in videos. It normally considers both
short-range motions and long-range aggregations. In this paper, we propose a Temporal …

Temporal segment networks for action recognition in videos

L Wang, Y Xiong, Z Wang, Y Qiao, D Lin… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
We present a general and flexible video-level framework for learning action models in
videos. This method, called temporal segment network (TSN), aims to model long-range …

Unified spatio-temporal attention networks for action recognition in videos

D Li, T Yao, LY Duan, T Mei… - IEEE Transactions on …, 2018 - ieeexplore.ieee.org
Recognizing actions in videos is not a trivial task because video is an information-intensive
media and includes multiple modalities. Moreover, on each modality, an action may only …

SAST: Learning semantic action-aware spatial-temporal features for efficient action recognition

F Wang, G Wang, Y Huang, H Chu - IEEE Access, 2019 - ieeexplore.ieee.org
The state-of-the-arts in action recognition are suffering from three challenges: (1) How to
model spatial transformations of action since it is always geometric variation over time in …

Look more but care less in video recognition

Y Zhang, Y Bai, H Wang, Y Xu… - Advances in Neural …, 2022 - proceedings.neurips.cc
Existing action recognition methods typically sample a few frames to represent each video to
avoid the enormous computation, which often limits the recognition performance. To tackle …

Spatial–temporal pooling for action recognition in videos

J Wang, Z Shao, X Huang, T Lu, R Zhang, X Lv - Neurocomputing, 2021 - Elsevier
Recently, deep convolutional neural networks have demonstrated great effectiveness in
action recognition with both RGB and optical flow in the past decade. However, existing …

Spatio-temporal collaborative module for efficient action recognition

Y Hao, S Wang, Y Tan, X He, Z Liu… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org
Efficient action recognition aims to classify a video clip into a specific action category with a
low computational cost. It is challenging since the integrated spatial-temporal calculation …

AdaFuse: Adaptive temporal fusion network for efficient action recognition

Y Meng, R Panda, CC Lin, P Sattigeri… - arXiv preprint arXiv …, 2021 - arxiv.org
Temporal modelling is the key for efficient video action recognition. While understanding
temporal information can improve recognition accuracy for dynamic actions, removing …

StNet: Local and global spatial-temporal modeling for action recognition

D He, Z Zhou, C Gan, F Li, X Liu, Y Li, L Wang… - Proceedings of the …, 2019 - ojs.aaai.org
Despite the success of deep learning for static image understanding, it remains unclear what
are the most effective network architectures for spatial-temporal modeling in videos. In this …