Augmented transformer with adaptive graph for temporal action proposal generation

E Vahdani, Y Tian - IEEE Transactions on Pattern Analysis and …, 2022 - ieeexplore.ieee.org

Understanding human behavior and activity facilitates advancement of numerous real-world
applications, and is critical for video analysis. Despite the progress of action recognition …

Cited by 73 · Related articles · All 8 versions

[PDF] arxiv.org

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer

Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

Cited by 417 · Related articles · All 7 versions

[PDF] neurips.cc

Focal modulation networks

J Yang, C Li, X Dai, J Gao - Advances in Neural Information …, 2022 - proceedings.neurips.cc

We propose focal modulation networks (FocalNets in short), where self-attention (SA) is
completely replaced by a focal modulation module for modeling token interactions in vision …

Cited by 270 · Related articles · All 6 versions

[PDF] arxiv.org

Focal self-attention for local-global interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao, L Yuan… - arXiv preprint arXiv …, 2021 - arxiv.org

Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability of capturing short-and long-range visual dependencies …

Cited by 501 · Related articles · All 2 versions

[PDF] neurips.cc

Focal attention for long-range interactions in vision transformers

J Yang, C Li, P Zhang, X Dai, B Xiao… - Advances in Neural …, 2021 - proceedings.neurips.cc

Recently, Vision Transformer and its variants have shown great promise on various
computer vision tasks. The ability to capture local and global visual dependencies through …

Cited by 159 · Related articles · All 7 versions

[PDF] arxiv.org

KVT: k-NN Attention for Boosting Vision Transformers

P Wang, X Wang, F Wang, M Lin, S Chang, H Li… - European conference on …, 2022 - Springer

Convolutional Neural Networks (CNNs) have dominated computer vision for years,
due to its ability in capturing locality and translation invariance. Recently, many vision …

Cited by 120 · Related articles · All 6 versions

[PDF] arxiv.org

An efficient spatio-temporal pyramid transformer for action detection

Y Weng, Z Pan, M Han, X Chang, B Zhuang - European Conference on …, 2022 - Springer

The task of action detection aims at deducing both the action category and localization of the
start and end moment for each action instance in a long, untrimmed video. While vision …

Cited by 37 · Related articles · All 8 versions

[PDF] thecvf.com

Stargazer: A transformer-based driver action detection system for intelligent transportation

J Liang, H Zhu, E Zhang… - Proceedings of the IEEE …, 2022 - openaccess.thecvf.com

Distracted driver actions can be dangerous and cause severe accidents. Thus, it is important
to detect and eliminate distracted driving behaviors on the road to save lives. To this end, we …

Cited by 30 · Related articles · All 4 versions

[PDF] arxiv.org

TemporalMaxer: Maximize temporal context with only max pooling for temporal action localization

TN Tang, K Kim, K Sohn - arXiv preprint arXiv:2303.09055, 2023 - arxiv.org

Temporal Action Localization (TAL) is a challenging task in video understanding that aims to
identify and localize actions within a video sequence. Recent studies have emphasized the …

Cited by 38 · Related articles · All 2 versions

[PDF] thecvf.com

PAT: Position-aware transformer for dense multi-label action detection

F Sardari, A Mustafa, PJB Jackson… - Proceedings of the …, 2023 - openaccess.thecvf.com

We present PAT, a transformer-based network that learns complex temporal co-occurrence
action dependencies in a video by exploiting multi-scale temporal features. In existing …

Cited by 8 · Related articles · All 7 versions
