SmallBigNet: Integrating core and contextual views for video classification

N Mumtaz, N Ejaz, S Habib, SM Mohsin… - Artificial intelligence …, 2023 - Springer

The Big Video Data generated in today's smart cities has raised concerns from its
purposeful usage perspective, where surveillance cameras, among many others are the …

Cited by 39 Related articles All 9 versions

[PDF] arxiv.org

UniFormer: Unifying convolution and self-attention for visual recognition

K Li, Y Wang, J Zhang, P Gao, G Song… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org

It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …

Cited by 405 Related articles All 6 versions

[PDF] arxiv.org

ActionCLIP: A new paradigm for video action recognition

M Wang, J Xing, Y Liu - arXiv preprint arXiv:2109.08472, 2021 - arxiv.org

The canonical approach to video action recognition dictates a neural model to do a classic
and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined …

Cited by 450 Related articles All 2 versions

[PDF] thecvf.com

TDN: Temporal difference networks for efficient action recognition

L Wang, Z Tong, B Ji, G Wu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com

Temporal modeling still remains challenging for action recognition in videos. To mitigate this
issue, this paper presents a new video architecture, termed as Temporal Difference Network …

Cited by 500 Related articles All 8 versions

[PDF] thecvf.com

MoViNets: Mobile video networks for efficient video recognition

D Kondratyuk, L Yuan, Y Li, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com

We present Mobile Video Networks (MoViNets), a family of computation and
memory efficient video networks that can operate on streaming video for online inference …

Cited by 297 Related articles All 8 versions

[PDF] arxiv.org

Long movie clip classification with state-space video models

MM Islam, G Bertasius - European Conference on Computer Vision, 2022 - Springer

Most modern video recognition models are designed to operate on short video clips (e.g.,
5–10 s in length). Thus, it is challenging to apply such models to long movie understanding …

Cited by 96 Related articles All 3 versions

[PDF] thecvf.com

Stand-alone inter-frame attention in video models

F Long, Z Qiu, Y Pan, T Yao, J Luo… - Proceedings of the …, 2022 - openaccess.thecvf.com

Motion, as the uniqueness of a video, has been critical to the development of video
understanding models. Modern deep learning models leverage motion by either executing …

Cited by 62 Related articles All 5 versions

[PDF] arxiv.org

Motion-driven visual tempo learning for video-based action recognition

Y Liu, J Yuan, Z Tu - IEEE Transactions on Image Processing, 2022 - ieeexplore.ieee.org

Action visual tempo characterizes the dynamics and the temporal scale of an action, which is
helpful to distinguish human actions that share high similarities in visual dynamics and …

Cited by 69 Related articles All 7 versions

[PDF] arxiv.org

Dynamic temporal filtering in video models

F Long, Z Qiu, Y Pan, T Yao, CW Ngo, T Mei - European Conference on …, 2022 - Springer

Video temporal dynamics is conventionally modeled with 3D spatial-temporal kernel or its
factorized version comprised of 2D spatial kernel and 1D temporal kernel. The modeling …

Cited by 28 Related articles All 7 versions

[PDF] arxiv.org

The dawn of quantum natural language processing

R Di Sipio, JH Huang, SYC Chen… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org

In this paper, we discuss the initial attempts at boosting understanding human language
based on deep-learning models with quantum computing. We successfully train a quantum …

Cited by 103 Related articles All 6 versions
