Dynamic normalization and relay for video action recognition

D Cai, A Yao, Y Chen - Advances in neural information …, 2021 - proceedings.neurips.cc
Convolutional Neural Networks (CNNs) have been the dominant model for video
action recognition. Due to the huge memory and compute demand, popular action …

More is less: Learning efficient video representations by big-little network and depthwise temporal aggregation

Q Fan, CFR Chen, H Kuehne… - Advances in Neural …, 2019 - proceedings.neurips.cc
Current state-of-the-art models for video action recognition are mostly based on expensive
3D ConvNets. This results in a need for large GPU clusters to train and evaluate such …

Deep analysis of CNN-based spatio-temporal representations for action recognition

CFR Chen, R Panda… - Proceedings of the …, 2021 - openaccess.thecvf.com
In recent years, a number of approaches based on 2D or 3D convolutional neural networks
(CNN) have emerged for video action recognition, achieving state-of-the-art results on …

A large-scale robustness analysis of video action recognition models

MC Schiappa, N Biyani, P Kamtam… - Proceedings of the …, 2023 - openaccess.thecvf.com
We have seen great progress in video action recognition in recent years. There are several
models based on convolutional neural network (CNN) and some recent transformer based …

Mitigating representation bias in action recognition: Algorithms and benchmarks

H Duan, Y Zhao, K Chen, Y Xiong, D Lin - European Conference on …, 2022 - Springer
Deep learning models have achieved excellent recognition results on large-scale video
benchmarks. However, they perform poorly when applied to videos with rare scenes or …

MFI: Multi-range feature interchange for video action recognition

S Bai, Q Wang, X Li - 2020 25th International Conference on …, 2021 - ieeexplore.ieee.org
Short-range motion features and long-range dependencies are two complementary and vital
cues for action recognition in videos, but it remains unclear how to efficiently and effectively …

Gate-shift networks for video action recognition

S Sudhakaran, S Escalera… - Proceedings of the IEEE …, 2020 - openaccess.thecvf.com
Deep 3D CNNs for video action recognition are designed to learn powerful representations
in the joint spatio-temporal feature space. In practice however, because of the large number …

Video jigsaw: Unsupervised learning of spatiotemporal context for video action recognition

U Ahsan, R Madhok, I Essa - 2019 IEEE Winter Conference on …, 2019 - ieeexplore.ieee.org
We propose a self-supervised learning method to jointly reason about spatial and temporal
context for video recognition. Recent self-supervised approaches have used spatial context …

Multi-task learning of generalizable representations for video action recognition

Z Yao, Y Wang, M Long, J Wang… - … on Multimedia and …, 2020 - ieeexplore.ieee.org
In classic video action recognition, labels may not contain enough information about the
diverse video appearance and dynamics, thus, existing models that are trained under the …

STM: Spatiotemporal and motion encoding for action recognition

B Jiang, MM Wang, W Gan, W Wu… - Proceedings of the …, 2019 - openaccess.thecvf.com
Spatiotemporal and motion features are two complementary and crucial information for
video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn …