Home action genome: Cooperative compositional action understanding

N Rai, H Chen, J Ji, R Desai… - Proceedings of the …, 2021 - openaccess.thecvf.com
Existing research on action recognition treats activities as monolithic events occurring in
videos. Recently, the benefits of formulating actions as a combination of atomic-actions have …

Action2Vec: A crossmodal embedding approach to action learning

M Hahn, A Silva, JM Rehg - arXiv preprint arXiv:1901.00484, 2019 - arxiv.org
We describe a novel cross-modal embedding space for actions, named Action2Vec, which
combines linguistic cues from class labels with spatio-temporal features derived from video …

StNet: Local and global spatial-temporal modeling for action recognition

D He, Z Zhou, C Gan, F Li, X Liu, Y Li, L Wang… - Proceedings of the …, 2019 - ojs.aaai.org
Despite the success of deep learning for static image understanding, it remains unclear what the most effective network architectures are for spatial-temporal modeling in videos. In this …

Recur, attend or convolve? On whether temporal modeling matters for cross-domain robustness in action recognition

S Broomé, E Pokropek, B Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Most action recognition models today are highly parameterized and evaluated on datasets with appearance-wise distinct classes. It has also been shown that 2D Convolutional Neural …

With a little help from my temporal context: Multimodal egocentric action recognition

E Kazakos, J Huh, A Nagrani, A Zisserman… - arXiv preprint arXiv …, 2021 - arxiv.org
In egocentric videos, actions occur in quick succession. We capitalise on the action's
temporal context and propose a method that learns to attend to surrounding actions in order …

Harnessing lab knowledge for real-world action recognition

Z Ma, Y Yang, F Nie, N Sebe, S Yan… - International Journal of …, 2014 - Springer
Much research on human action recognition has been oriented toward the performance gain
on lab-collected datasets. Yet real-world videos are more diverse, with more complicated …

Video jigsaw: Unsupervised learning of spatiotemporal context for video action recognition

U Ahsan, R Madhok, I Essa - 2019 IEEE Winter Conference on …, 2019 - ieeexplore.ieee.org
We propose a self-supervised learning method to jointly reason about spatial and temporal
context for video recognition. Recent self-supervised approaches have used spatial context …

Sympathy for the details: Dense trajectories and hybrid classification architectures for action recognition

CR De Souza, A Gaidon, E Vig, AM López - Computer Vision–ECCV 2016 …, 2016 - Springer
Action recognition in videos is a challenging task due to the complexity of the spatio-temporal patterns to model and the difficulty of acquiring and learning from large quantities of video …

EAN: Event adaptive network for enhanced action recognition

Y Tian, Y Yan, G Zhai, G Guo, Z Gao - International Journal of Computer …, 2022 - Springer
Efficiently modeling spatial–temporal information in videos is crucial for action recognition.
To achieve this goal, state-of-the-art methods typically employ the convolution operator and …

SOAR: Scene-debiasing open-set action recognition

Y Zhai, Z Liu, Z Wu, Y Wu, C Zhou… - Proceedings of the …, 2023 - openaccess.thecvf.com
Deep models risk utilizing spurious clues to make predictions, e.g., recognizing actions by classifying the background scene. This problem severely degrades the open-set …