related:mzN4NdOLe6AJ:scholar.google.com/

Class feature pyramids for video explanation

A Stergiou, G Kapidis, G Kalliatakis… - 2019 IEEE/CVF …, 2019 - ieeexplore.ieee.org

Deep convolutional networks are widely used in video action recognition. 3D convolutions
are one prominent approach to deal with the additional time dimension. While 3D …

Cited by 16 · Related articles · All 11 versions

[PDF] arxiv.org

Saliency tubes: Visual explanations for spatio-temporal convolutions

A Stergiou, G Kapidis, G Kalliatakis… - … conference on image …, 2019 - ieeexplore.ieee.org

Deep learning approaches have been established as the main methodology for video
classification and recognition. Recently, 3-dimensional convolutions have been used to …

Cited by 50 · Related articles · All 13 versions

[PDF] thecvf.com

Hierarchical explanations for video action recognition

S Gulshad, T Long… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

To interpret deep neural networks, one main approach is to dissect the visual input and find
the prototypical parts responsible for the classification. However, existing methods often …

Cited by 3 · Related articles · All 5 versions

[PDF] arxiv.org

Exploring Explainability in Video Action Recognition

A Saha, S Gupta, SK Ankireddy, K Chahine… - arXiv preprint arXiv …, 2024 - arxiv.org

Image Classification and Video Action Recognition are perhaps the two most foundational
tasks in computer vision. Consequently, explaining the inner workings of trained deep …

Explaining motion relevance for activity recognition in video deep learning models

L Hiley, A Preece, Y Hicks, S Chakraborty… - arXiv preprint arXiv …, 2020 - arxiv.org

A small subset of explainability techniques developed initially for image recognition models
has recently been applied for interpretability of 3D Convolutional Neural Network models in …

Cited by 13 · Related articles · All 2 versions

[PDF] thecvf.com

Deep analysis of CNN-based spatio-temporal representations for action recognition

CFR Chen, R Panda… - Proceedings of the …, 2021 - openaccess.thecvf.com

In recent years, a number of approaches based on 2D or 3D convolutional neural networks
(CNN) have emerged for video action recognition, achieving state-of-the-art results on …

Cited by 109 · Related articles · All 8 versions

[PDF] arxiv.org

GTA: Global temporal attention for video action understanding

B He, X Yang, Z Wu, H Chen, SN Lim… - arXiv preprint arXiv …, 2020 - arxiv.org

Self-attention learns pairwise interactions to model long-range dependencies, yielding great
improvements for video action recognition. In this paper, we seek a deeper understanding of …

Cited by 28 · Related articles · All 3 versions

[PDF] thecvf.com

Spatial-temporal concept based explanation of 3D ConvNets

Y Ji, Y Wang, J Kato - … of the IEEE/CVF Conference on …, 2023 - openaccess.thecvf.com

Convolutional neural networks (CNNs) have shown remarkable performance on various
tasks. Despite its widespread adoption, the decision procedure of the network still lacks …

Cited by 4 · Related articles · All 6 versions

[PDF] arxiv.org

MixTConv: Mixed temporal convolutional kernels for efficient action recognition

K Shan, Y Wang, Z Tang, Y Chen… - 2020 25th International …, 2021 - ieeexplore.ieee.org

To efficiently extract spatiotemporal features of video for action recognition, most state-of-the-
art methods integrate 1D temporal convolutional filters into 2D CNN backbones. However …

Cited by 7 · Related articles · All 7 versions

[PDF] arxiv.org

Weakly-supervised action localization, and action recognition using global–local attention of 3D CNN

N Yudistira, MS Kavitha, T Kurita - International Journal of Computer Vision, 2022 - Springer

3D convolutional neural network (3D CNN) captures spatial and temporal
information on 3D data such as video sequences. However, due to the convolution and …

Cited by 9 · Related articles · All 10 versions
