Deep learning-based action detection in untrimmed videos: A survey

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org
Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer
Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc
We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Focal self-attention for local-global interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao, L Yuan… - arXiv preprint arXiv …, 2021 - arxiv.org
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability of capturing short- and long-range visual dependencies …

Focal attention for long-range interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao… - Advances in Neural …, 2021 - proceedings.neurips.cc
Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability to capture local and global visual dependencies through …

KVT: k-NN Attention for Boosting Vision Transformers

P Wang, X Wang, F Wang, M Lin, S Chang, H Li… - European conference on …, 2022 - Springer
Convolutional Neural Networks (CNNs) have dominated computer vision for years,
due to their ability to capture locality and translation invariance. Recently, many vision …

An efficient spatio-temporal pyramid transformer for action detection

Y Weng, Z Pan, M Han, X Chang, B Zhuang - European Conference on …, 2022 - Springer
The task of action detection aims at deducing both the action category and localization of the
start and end moment for each action instance in a long, untrimmed video. While vision …

Stargazer: A transformer-based driver action detection system for intelligent transportation

J Liang, H Zhu, E Zhang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com
Distracted driver actions can be dangerous and cause severe accidents. Thus, it is important
to detect and eliminate distracted driving behaviors on the road to save lives. To this end, we …

TemporalMaxer: Maximize temporal context with only max pooling for temporal action localization

TN Tang, K Kim, K Sohn - arXiv preprint arXiv:2303.09055, 2023 - arxiv.org
Temporal Action Localization (TAL) is a challenging task in video understanding that aims to
identify and localize actions within a video sequence. Recent studies have emphasized the …

PAT: Position-aware transformer for dense multi-label action detection

F Sardari, A Mustafa, PJB Jackson… - Proceedings of the …, 2023 - openaccess.thecvf.com
We present PAT, a transformer-based network that learns complex temporal co-occurrence
action dependencies in a video by exploiting multi-scale temporal features. In existing …