Related articles - Academic resource search

Scenes-objects-actions: A multi-task, multi-label video dataset

J Ray, H Wang, D Tran, Y Wang… - Proceedings of the …, 2018 - openaccess.thecvf.com

This paper introduces a large-scale, multi-label and multitask video dataset named Scenes-
Objects-Actions (SOA). Most prior video datasets are based on a predefined taxonomy, which …

Cited by 36 · Related articles · All 5 versions

[PDF] arxiv.org

Mimetics: Towards understanding human actions out of context

P Weinzaepfel, G Rogez - International Journal of Computer Vision, 2021 - Springer

Recent methods for video action recognition have reached outstanding performances on
existing benchmarks. However, they tend to leverage context such as scenes or objects …

Cited by 69 · Related articles · All 10 versions

[PDF] thecvf.com

Contextual action recognition with R*CNN

G Gkioxari, R Girshick, J Malik - Proceedings of the IEEE …, 2015 - openaccess.thecvf.com

There are multiple cues in an image which reveal what action a person is performing. For
example, a jogger has a pose that is characteristic for jogging, but the scene (e.g. road, trail) …

Cited by 518 · Related articles · All 13 versions

[PDF] arxiv.org

Knowing what, where and when to look: Efficient video action modeling with attention

JM Perez-Rua, B Martinez, X Zhu, A Toisoul… - arXiv preprint arXiv …, 2020 - arxiv.org

Attentive video modeling is essential for action recognition in unconstrained videos due to
their rich yet redundant information over space and time. However, introducing attention in a …

Cited by 23 · Related articles · All 3 versions

[PDF] thecvf.com

Modality-Collaborative Test-Time Adaptation for Action Recognition

B Xiong, X Yang, Y Song, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Video-based Unsupervised Domain Adaptation (VUDA) method improves the
generalization of the video model enabling it to be applied to action recognition tasks in …

[PDF] thecvf.com

Deep analysis of CNN-based spatio-temporal representations for action recognition

CFR Chen, R Panda… - Proceedings of the …, 2021 - openaccess.thecvf.com

In recent years, a number of approaches based on 2D or 3D convolutional neural networks
(CNN) have emerged for video action recognition, achieving state-of-the-art results on …

Cited by 109 · Related articles · All 8 versions

[PDF] thecvf.com

STM: Spatiotemporal and motion encoding for action recognition

B Jiang, MM Wang, W Gan, W Wu… - Proceedings of the …, 2019 - openaccess.thecvf.com

Spatiotemporal and motion features are two complementary and crucial information for
video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn …

Cited by 482 · Related articles · All 6 versions

Spatial–temporal pooling for action recognition in videos

J Wang, Z Shao, X Huang, T Lu, R Zhang, X Lv - Neurocomputing, 2021 - Elsevier

Recently, deep convolutional neural networks have demonstrated great effectiveness in
action recognition with both RGB and optical flow in the past decade. However, existing …

Cited by 30 · Related articles

[PDF] thecvf.com

Hallucinating IDT descriptors and I3D optical flow features for action recognition with CNNs

L Wang, P Koniusz, DQ Huynh - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

In this paper, we revive the use of old-fashioned handcrafted video representations for
action recognition and put new life into these techniques via a CNN-based hallucination …

Cited by 105 · Related articles · All 9 versions

[PDF] oulu.fi

Vision-based multi-modal framework for action recognition

BD Romaissa, O Mourad… - 2020 25th International …, 2021 - ieeexplore.ieee.org

Human activity recognition plays a central role in the development of intelligent systems for
video surveillance, public security, health care and home monitoring, where detection and …

Cited by 12 · Related articles · All 7 versions
