CL Zhang, J Wu, Y Li - European Conference on Computer Vision, 2022 - Springer
Self-attention based Transformer models have demonstrated impressive results for image classification and object detection, and more recently for video understanding. Inspired by …
Image-based visual-language (I-VL) pre-training has shown great success for learning joint visual-textual representations from large-scale web data, revealing remarkable ability for …
D Shi, Y Zhong, Q Cao, L Ma, J Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
In this paper, we present a one-stage framework TriDet for temporal action detection. Existing methods often suffer from imprecise boundary predictions due to the ambiguous …
Temporal action localization is an important yet challenging task in video understanding. Typically, such a task aims at inferring both the action category and localization of the start …
Temporal action detection (TAD) aims to determine the semantic label and the temporal interval of every action instance in an untrimmed video. It is a fundamental and challenging …
C Zhang, M Cao, D Yang, J Chen… - Proceedings of the …, 2021 - openaccess.thecvf.com
Weakly-supervised temporal action localization (WS-TAL) aims to localize actions in untrimmed videos with only video-level labels. Most existing models follow the "localization …
M Chen, J Gao, S Yang, C Xu - European Conference on Computer Vision, 2022 - Springer
Weakly-supervised temporal action localization (WS-TAL) aims to localize the action instances and recognize their categories with only video-level labels. Despite great …
F Cheng, G Bertasius - European Conference on Computer Vision, 2022 - Springer
Most modern approaches in temporal action localization divide this problem into two parts: (i) short-term feature extraction and (ii) long-range temporal boundary localization. Due to the …
Z Zhu, W Tang, L Wang, N Zheng… - Proceedings of the …, 2021 - openaccess.thecvf.com
Effectively tackling the problem of temporal action localization (TAL) necessitates a visual representation that jointly pursues two confounding goals, i.e., fine-grained discrimination for …