Scenes-objects-actions: A multi-task, multi-label video dataset

J Ray, H Wang, D Tran, Y Wang… - Proceedings of the …, 2018 - openaccess.thecvf.com
This paper introduces a large-scale, multi-label and multitask video dataset named
Scenes-Objects-Actions (SOA). Most prior video datasets are based on a predefined taxonomy, which …

Mimetics: Towards understanding human actions out of context

P Weinzaepfel, G Rogez - International Journal of Computer Vision, 2021 - Springer
Recent methods for video action recognition have reached outstanding performances on
existing benchmarks. However, they tend to leverage context such as scenes or objects …

Contextual action recognition with R*CNN

G Gkioxari, R Girshick, J Malik - Proceedings of the IEEE …, 2015 - openaccess.thecvf.com
There are multiple cues in an image which reveal what action a person is performing. For
example, a jogger has a pose that is characteristic for jogging, but the scene (e.g. road, trail) …

Knowing what, where and when to look: Efficient video action modeling with attention

JM Perez-Rua, B Martinez, X Zhu, A Toisoul… - arXiv preprint arXiv …, 2020 - arxiv.org
Attentive video modeling is essential for action recognition in unconstrained videos due to
their rich yet redundant information over space and time. However, introducing attention in a …

Modality-Collaborative Test-Time Adaptation for Action Recognition

B Xiong, X Yang, Y Song, Y Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Video-based Unsupervised Domain Adaptation (VUDA) method improves the
generalization of the video model enabling it to be applied to action recognition tasks in …

Deep analysis of CNN-based spatio-temporal representations for action recognition

CFR Chen, R Panda… - Proceedings of the …, 2021 - openaccess.thecvf.com
In recent years, a number of approaches based on 2D or 3D convolutional neural networks
(CNN) have emerged for video action recognition, achieving state-of-the-art results on …

STM: Spatiotemporal and motion encoding for action recognition

B Jiang, MM Wang, W Gan, W Wu… - Proceedings of the …, 2019 - openaccess.thecvf.com
Spatiotemporal and motion features are two complementary and crucial information for
video action recognition. Recent state-of-the-art methods adopt a 3D CNN stream to learn …

Spatial–temporal pooling for action recognition in videos

J Wang, Z Shao, X Huang, T Lu, R Zhang, X Lv - Neurocomputing, 2021 - Elsevier
Recently, deep convolutional neural networks have demonstrated great effectiveness in
action recognition with both RGB and optical flow in the past decade. However, existing …

Hallucinating IDT descriptors and I3D optical flow features for action recognition with CNNs

L Wang, P Koniusz, DQ Huynh - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
In this paper, we revive the use of old-fashioned handcrafted video representations for
action recognition and put new life into these techniques via a CNN-based hallucination …

Vision-based multi-modal framework for action recognition

BD Romaissa, O Mourad… - 2020 25th International …, 2021 - ieeexplore.ieee.org
Human activity recognition plays a central role in the development of intelligent systems for
video surveillance, public security, health care and home monitoring, where detection and …