Attentive video modeling is essential for action recognition in unconstrained videos due to their rich yet redundant information over space and time. However, introducing attention in a …
Addressing the dual challenges of local redundancy and global dependencies in video understanding, this work innovatively adapts the Mamba to the video domain. The proposed …
N Wang, G Zhu, HS Li, L Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although neural networks excel in video action recognition tasks their" black-box" nature makes it challenging to understand the rationale behind their decisions. Recent approaches …
Despite the recent success of end-to-end learned representations, hand-crafted optical flow features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a …
Motion plays a crucial role in understanding videos and most state-of-the-art neural models for video classification incorporate motion information typically using optical flows extracted …
A Arnab, C Sun, C Schmid - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Accurate video understanding involves reasoning about the relationships between actors, objects and their environment, often over long temporal intervals. In this paper, we propose …
In this paper we study 3D convolutional networks for video understanding tasks. Our starting point is the stateof-the-art I3D model of [3], which “inflates” all the 2D filters of the Inception …
B Pang, K Zha, Y Zhang, C Lu - … of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org
Video understanding is a research hotspot of computer vision and significant progress has been made on video action recognition recently. However, the semantics information …
Spatial convolutions are widely used in numerous deep video models. It fundamentally assumes spatio-temporal invariance, ie, using shared weights for every location in different …