相关文章- 学术资源搜索

Video action understanding

MS Hutchinson, VN Gadepally - IEEE Access, 2021 - ieeexplore.ieee.org

Many believe that the successes of deep learning on image understanding problems can be
replicated in the realm of video understanding. However, due to the scale and temporal …

被引用次数：35 相关文章所有 7 个版本

[PDF] arxiv.org

Knowing what, where and when to look: Efficient video action modeling with attention

JM Perez-Rua, B Martinez, X Zhu, A Toisoul… - arXiv preprint arXiv …, 2020 - arxiv.org

Attentive video modeling is essential for action recognition in unconstrained videos due to
their rich yet redundant information over space and time. However, introducing attention in a …

被引用次数：23 相关文章所有 3 个版本

[PDF] arxiv.org

Videomamba: State space model for efficient video understanding

K Li, X Li, Y Wang, Y He, Y Wang, L Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Addressing the dual challenges of local redundancy and global dependencies in video
understanding, this work innovatively adapts the Mamba to the video domain. The proposed …

被引用次数：50 相关文章所有 2 个版本

[PDF] thecvf.com

Language Model Guided Interpretable Video Action Reasoning

N Wang, G Zhu, HS Li, L Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Although neural networks excel in video action recognition tasks their" black-box" nature
makes it challenging to understand the rationale behind their decisions. Recent approaches …

End-to-end learning of motion representation for video understanding

L Fan, W Huang, C Gan, S Ermon… - Proceedings of the …, 2018 - openaccess.thecvf.com

Despite the recent success of end-to-end learned representations, hand-crafted optical flow
features are still widely used in video analysis tasks. To fill this gap, we propose TVNet, a …

被引用次数：247 相关文章所有 10 个版本

[PDF] arxiv.org

Motionsqueeze: Neural motion feature learning for video understanding

H Kwon, M Kim, S Kwak, M Cho - … , Glasgow, UK, August 23–28, 2020 …, 2020 - Springer

Motion plays a crucial role in understanding videos and most state-of-the-art neural models
for video classification incorporate motion information typically using optical flows extracted …

被引用次数：146 相关文章所有 6 个版本

[PDF] thecvf.com

Unified graph structured models for video understanding

A Arnab, C Sun, C Schmid - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Accurate video understanding involves reasoning about the relationships between actors,
objects and their environment, often over long temporal intervals. In this paper, we propose …

被引用次数：43 相关文章所有 6 个版本

[PDF] chensun.me

[PDF][PDF] Rethinking spatiotemporal feature learning for video understanding

S Xie, C Sun, J Huang, Z Tu, K Murphy - arXiv preprint arXiv …, 2017 - chensun.me

In this paper we study 3D convolutional networks for video understanding tasks. Our starting
point is the stateof-the-art I3D model of [3], which “inflates” all the 2D filters of the Inception …

被引用次数：290 相关文章

[PDF] aaai.org

Further understanding videos through adverbs: A new video task

B Pang, K Zha, Y Zhang, C Lu - … of the AAAI Conference on Artificial …, 2020 - ojs.aaai.org

Video understanding is a research hotspot of computer vision and significant progress has
been made on video action recognition recently. However, the semantics information …

被引用次数：16 相关文章所有 5 个版本

[PDF] arxiv.org

Tada! temporally-adaptive convolutions for video understanding

Z Huang, S Zhang, L Pan, Z Qing, M Tang, Z Liu… - arXiv preprint arXiv …, 2021 - arxiv.org

Spatial convolutions are widely used in numerous deep video models. It fundamentally
assumes spatio-temporal invariance, ie, using shared weights for every location in different …

被引用次数：50 相关文章所有 4 个版本

高级搜索

QQ 群

Video action understanding

Knowing what, where and when to look: Efficient video action modeling with attention

Videomamba: State space model for efficient video understanding

Language Model Guided Interpretable Video Action Reasoning

End-to-end learning of motion representation for video understanding

Motionsqueeze: Neural motion feature learning for video understanding

Unified graph structured models for video understanding

[PDF][PDF] Rethinking spatiotemporal feature learning for video understanding

Further understanding videos through adverbs: A new video task

Tada! temporally-adaptive convolutions for video understanding

相关搜索

引用