An overview of violence detection techniques: current challenges and future directions

N Mumtaz, N Ejaz, S Habib, SM Mohsin… - Artificial intelligence …, 2023 - Springer
The Big Video Data generated in today's smart cities has raised concerns from its
purposeful usage perspective, where surveillance cameras, among many others, are the …

UniFormer: Unifying convolution and self-attention for visual recognition

K Li, Y Wang, J Zhang, P Gao, G Song… - … on Pattern Analysis …, 2023 - ieeexplore.ieee.org
It is a challenging task to learn discriminative representation from images and videos, due to
large local redundancy and complex global dependency in these visual data. Convolution …

ActionCLIP: A new paradigm for video action recognition

M Wang, J Xing, Y Liu - arXiv preprint arXiv:2109.08472, 2021 - arxiv.org
The canonical approach to video action recognition dictates a neural model to do a classic
and standard 1-of-N majority vote task. They are trained to predict a fixed set of predefined …

TDN: Temporal difference networks for efficient action recognition

L Wang, Z Tong, B Ji, G Wu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Temporal modeling still remains challenging for action recognition in videos. To mitigate this
issue, this paper presents a new video architecture, termed as Temporal Difference Network …

MoViNets: Mobile video networks for efficient video recognition

D Kondratyuk, L Yuan, Y Li, L Zhang… - Proceedings of the …, 2021 - openaccess.thecvf.com
We present Mobile Video Networks (MoViNets), a family of computation and
memory efficient video networks that can operate on streaming video for online inference …

Long movie clip classification with state-space video models

MM Islam, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern video recognition models are designed to operate on short video clips (e.g., 5–
10 s in length). Thus, it is challenging to apply such models to long movie understanding …

Stand-alone inter-frame attention in video models

F Long, Z Qiu, Y Pan, T Yao, J Luo… - Proceedings of the …, 2022 - openaccess.thecvf.com
Motion, as the uniqueness of a video, has been critical to the development of video
understanding models. Modern deep learning models leverage motion by either executing …

Motion-driven visual tempo learning for video-based action recognition

Y Liu, J Yuan, Z Tu - IEEE Transactions on Image Processing, 2022 - ieeexplore.ieee.org
Action visual tempo characterizes the dynamics and the temporal scale of an action, which is
helpful to distinguish human actions that share high similarities in visual dynamics and …

Dynamic temporal filtering in video models

F Long, Z Qiu, Y Pan, T Yao, CW Ngo, T Mei - European Conference on …, 2022 - Springer
Video temporal dynamics is conventionally modeled with 3D spatial-temporal kernel or its
factorized version comprised of 2D spatial kernel and 1D temporal kernel. The modeling …

The dawn of quantum natural language processing

R Di Sipio, JH Huang, SYC Chen… - ICASSP 2022-2022 …, 2022 - ieeexplore.ieee.org
In this paper, we discuss the initial attempts at boosting understanding human language
based on deep-learning models with quantum computing. We successfully train a quantum …