Fast temporal activity proposals for efficient detection of human actions in untrimmed videos

A Haque, A Milstein, L Fei-Fei - Nature, 2020 - nature.com

Advances in machine learning and contactless sensors have given rise to ambient
intelligence—physical spaces that are sensitive and responsive to the presence of humans …

Cited by 217 · Related articles · All 9 versions

[PDF] thecvf.com

Vid2Seq: Large-scale pretraining of a visual language model for dense video captioning

A Yang, A Nagrani, PH Seo, A Miech… - Proceedings of the …, 2023 - openaccess.thecvf.com

In this work, we introduce Vid2Seq, a multi-modal single-stage dense event captioning
model pretrained on narrated videos which are readily-available at scale. The Vid2Seq …

Cited by 141 · Related articles · All 26 versions

[PDF] arxiv.org

ActionFormer: Localizing moments of actions with transformers

CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer

Self-attention based Transformer models have demonstrated impressive results for image
classification and object detection, and more recently for video understanding. Inspired by …

Cited by 294 · Related articles · All 7 versions

TN-ZSTAD: Transferable network for zero-shot temporal activity detection

L Zhang, X Chang, J Liu, M Luo, Z Li… - … on Pattern Analysis …, 2022 - ieeexplore.ieee.org

An integral part of video analysis and surveillance is temporal activity detection, which
means to simultaneously recognize and localize activities in long untrimmed videos …

Cited by 105 · Related articles · All 6 versions

[PDF] thecvf.com

End-to-end dense video captioning with parallel decoding

T Wang, R Zhang, Z Lu, F Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com

Dense video captioning aims to generate multiple associated captions with their temporal
locations from the video. Previous methods follow a sophisticated "localize-then-describe" …

Cited by 165 · Related articles · All 6 versions

[PDF] arxiv.org

End-to-end temporal action detection with transformer

X Liu, Q Wang, Y Hu, X Tang, S Zhang… - IEEE Transactions on …, 2022 - ieeexplore.ieee.org

Temporal action detection (TAD) aims to determine the semantic label and the temporal
interval of every action instance in an untrimmed video. It is a fundamental and challenging …

Cited by 191 · Related articles · All 5 versions

[PDF] thecvf.com

BMN: Boundary-matching network for temporal action proposal generation

T Lin, X Liu, X Li, E Ding, S Wen - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com

Temporal action proposal generation is a challenging and promising task which aims to
locate temporal regions in real-world videos where action or event may occur. Current …

Cited by 690 · Related articles · All 5 versions

[PDF] thecvf.com

G-TAD: Sub-graph localization for temporal action detection

M Xu, C Zhao, DS Rojas, A Thabet… - Proceedings of the …, 2020 - openaccess.thecvf.com

Temporal action detection is a fundamental yet challenging task in video understanding.
Video context is a critical cue to effectively detect actions, but current works mainly focus on …

Cited by 510 · Related articles · All 11 versions

[PDF] thecvf.com

Graph convolutional networks for temporal action localization

R Zeng, W Huang, M Tan, Y Rong… - Proceedings of the …, 2019 - openaccess.thecvf.com

Most state-of-the-art action localization systems process each action proposal individually,
without explicitly exploiting their relations during learning. However, the relations between …

Cited by 563 · Related articles · All 8 versions

[PDF] thecvf.com

Relaxed transformer decoders for direct action proposal generation

J Tan, J Tang, L Wang, G Wu - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com

Temporal action proposal generation is an important and challenging task in video
understanding, which aims at detecting all temporal segments containing action instances of …

Cited by 189 · Related articles · All 6 versions
