Video action understanding

MS Hutchinson, VN Gadepally - IEEE Access, 2021 - ieeexplore.ieee.org
Many believe that the successes of deep learning on image understanding problems can be
replicated in the realm of video understanding. However, due to the scale and temporal …

Knowing what, where and when to look: Efficient video action modeling with attention

JM Perez-Rua, B Martinez, X Zhu, A Toisoul… - arXiv preprint arXiv …, 2020 - arxiv.org
Attentive video modeling is essential for action recognition in unconstrained videos due to
their rich yet redundant information over space and time. However, introducing attention in a …

Videomamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

Language Model Guided Interpretable Video Action Reasoning

N Wang, G Zhu, HS Li, L Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although neural networks excel in video action recognition tasks their" black-box" nature
makes it challenging to understand the rationale behind their decisions. Recent approaches …

End-to-end learning of motion representation for video understanding

L Fan, W Huang, C Gan, S Ermon… - Proceedings of the …, 2018 - openaccess.thecvf.com
Despite the recent success of end-to-end learned representations, hand-crafted optical flow
features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a …

Motionsqueeze: Neural motion feature learning for video understanding

H Kwon, M Kim, S Kwak, M Cho - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer
Motion plays a crucial role in understanding videos and most state-of-the-art neural models
for video classification incorporate motion information typically using optical flows extracted …

Unified graph structured models for video understanding

A Arnab, C Sun, C Schmid - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Accurate video understanding involves reasoning about the relationships between actors,
objects and their environment, often over long temporal intervals. In this paper, we propose …

[PDF][PDF] Rethinking spatiotemporal feature learning for video understanding

S Xie, C Sun, J Huang, Z Tu, K Murphy - arXiv preprint arXiv …, 2017 - chensun.me
In this paper we study 3D convolutional networks for video understanding tasks. Our starting
point is the stateof-the-art I3D model of [3], which “inflates” all the 2D filters of the Inception …

Further understanding videos through adverbs: A new video task

B Pang, K Zha, Y Zhang, C Lu - … of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org
Video understanding is a research hotspot of computer vision and significant progress has
been made on video action recognition recently. However, the semantics information …

Tada! temporally-adaptive convolutions for video understanding

Z Huang, S Zhang, L Pan, Z Qing, M Tang, Z Liu… - arXiv preprint arXiv …, 2021 - arxiv.org
Spatial convolutions are widely used in numerous deep video models. It fundamentally
assumes spatio-temporal invariance, ie, using shared weights for every location in different …