Video transformer network

D Neimark, O Bar, M Zohar… - Proceedings of the …, 2021 - openaccess.thecvf.com
This paper presents VTN, a transformer-based framework for video recognition. Inspired by
recent developments in vision transformers, we ditch the standard approach in video action …

Temporal segment networks for action recognition in videos

L Wang, Y Xiong, Z Wang, Y Qiao, D Lin… - IEEE transactions on …, 2018 - ieeexplore.ieee.org
We present a general and flexible video-level framework for learning action models in
videos. This method, called temporal segment network (TSN), aims to model long-range …

TDN: Temporal difference networks for efficient action recognition

L Wang, Z Tong, B Ji, G Wu - Proceedings of the IEEE/CVF …, 2021 - openaccess.thecvf.com
Temporal modeling still remains challenging for action recognition in videos. To mitigate this
issue, this paper presents a new video architecture, termed as Temporal Difference Network …

Going deeper with two-stream ConvNets for action recognition in video surveillance

Y Han, P Zhang, T Zhuo, W Huang, Y Zhang - Pattern Recognition Letters, 2018 - Elsevier
Learning by deep convolutional networks have shown an outstanding effectiveness in a
variety of vision based classification tasks, and for which, large datasets are the …

MM-ViT: Multi-modal video transformer for compressed video action recognition

J Chen, CM Ho - Proceedings of the IEEE/CVF winter …, 2022 - openaccess.thecvf.com
This paper presents a pure transformer-based approach, dubbed the Multi-Modal Video
Transformer (MM-ViT), for video action recognition. Different from other schemes which …

TEN: temporal excitation network for video action recognition

D Sun, Z He, B Luo, Z Ding - International Conference on …, 2023 - spiedigitallibrary.org
Temporal modeling has attracted the attention of a large number of researchers in the past
few years. In this work, we propose a new video architecture, termed as Temporal Excitation …

Diverse features fusion network for video-based action recognition

H Deng, J Kong, M Jiang, T Liu - Journal of Visual Communication and …, 2021 - Elsevier
The two-stream convolutional network has been proved to be one milestone in the study of
video-based action recognition. Lots of recent works modify internal structure of two-stream …

MV2Flow: Learning motion representation for fast compressed video action recognition

H Hu, W Zhou, X Li, N Yan, H Li - ACM Transactions on Multimedia …, 2020 - dl.acm.org
In video action recognition, motion is a very crucial clue, which is usually represented by
optical flow. However, optical flow is computationally expensive to obtain, which becomes …

Optimal Topology of Vision Transformer for Real-Time Video Action Recognition in an End-To-End Cloud Solution

S Sarraf, M Kabia - Machine Learning and Knowledge Extraction, 2023 - mdpi.com
This study introduces an optimal topology of vision transformers for real-time video action
recognition in a cloud-based solution. Although model performance is a key criterion for real …

Towards practical compressed video action recognition: A temporal enhanced multi-stream network

B Li, L Kong, D Zhang, X Bao… - … conference on pattern …, 2021 - ieeexplore.ieee.org
Current compressed video action recognition methods are mainly based on complete data.
However, in a real transmission scenario, the compressed video packets are usually …