Class feature pyramids for video explanation

A Stergiou, G Kapidis, G Kalliatakis… - 2019 IEEE/CVF …, 2019 - ieeexplore.ieee.org
Deep convolutional networks are widely used in video action recognition. 3D convolutions
are one prominent approach to deal with the additional time dimension. While 3D …

Saliency tubes: Visual explanations for spatio-temporal convolutions

A Stergiou, G Kapidis, G Kalliatakis… - … conference on image …, 2019 - ieeexplore.ieee.org
Deep learning approaches have been established as the main methodology for video
classification and recognition. Recently, 3-dimensional convolutions have been used to …

Hierarchical explanations for video action recognition

S Gulshad, T Long… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
To interpret deep neural networks, one main approach is to dissect the visual input and find
the prototypical parts responsible for the classification. However, existing methods often …

Exploring Explainability in Video Action Recognition

A Saha, S Gupta, SK Ankireddy, K Chahine… - arXiv preprint arXiv …, 2024 - arxiv.org
Image Classification and Video Action Recognition are perhaps the two most foundational
tasks in computer vision. Consequently, explaining the inner workings of trained deep …

Explaining motion relevance for activity recognition in video deep learning models

L Hiley, A Preece, Y Hicks, S Chakraborty… - arXiv preprint arXiv …, 2020 - arxiv.org
A small subset of explainability techniques developed initially for image recognition models
has recently been applied for interpretability of 3D Convolutional Neural Network models in …

Deep analysis of CNN-based spatio-temporal representations for action recognition

CFR Chen, R Panda… - Proceedings of the …, 2021 - openaccess.thecvf.com
In recent years, a number of approaches based on 2D or 3D convolutional neural networks
(CNN) have emerged for video action recognition, achieving state-of-the-art results on …

GTA: Global temporal attention for video action understanding

B He, X Yang, Z Wu, H Chen, SN Lim… - arXiv preprint arXiv …, 2020 - arxiv.org
Self-attention learns pairwise interactions to model long-range dependencies, yielding great
improvements for video action recognition. In this paper, we seek a deeper understanding of …

Spatial-temporal concept based explanation of 3D ConvNets

Y Ji, Y Wang, J Kato - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com
Convolutional neural networks (CNNs) have shown remarkable performance on various
tasks. Despite its widespread adoption, the decision procedure of the network still lacks …

MixTConv: Mixed temporal convolutional kernels for efficient action recognition

K Shan, Y Wang, Z Tang, Y Chen… - 2020 25th International …, 2021 - ieeexplore.ieee.org
To efficiently extract spatiotemporal features of video for action recognition, most state-of-the-art
methods integrate 1D temporal convolutional filters into 2D CNN backbones. However …

Weakly-supervised action localization, and action recognition using global–local attention of 3D CNN

N Yudistira, MS Kavitha, T Kurita - International Journal of Computer Vision, 2022 - Springer
3D convolutional neural network (3D CNN) captures spatial and temporal
information on 3D data such as video sequences. However, due to the convolution and …