Tracking Objects and Activities with Attention for Temporal Sentence Grounding

X Fang, Z Xiong, W Fang, X Qu, C Chen, J Dong… - … on Computer Vision, 2025 - Springer

This paper addresses the challenging task of weakly-supervised video temporal grounding.
Existing approaches are generally based on the moment proposal selection framework that …

Cited by 10 · Related articles · All 4 versions

Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention

Z Xiong, D Liu, X Fang, X Qu, J Dong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

Video sentence grounding (VSG) is the task of identifying the segment of an untrimmed
video that semantically corresponds to a given natural language query. While many existing …

Cited by 3 · Related articles · All 2 versions

Filling the Information Gap between Video and Query for Language-Driven Moment Retrieval

D Liu, X Qu, J Dong, G Nan, P Zhou, Z Xu… - Proceedings of the 31st …, 2023 - dl.acm.org

This paper addresses the challenging task of language-driven moment retrieval. Previous
methods are typically trained to localize the target moment corresponding to a single …

Cited by 8 · Related articles · All 2 versions

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

D Liu, X Qu, J Dong, P Zhou, Z Xu, H Wang… - ACM Transactions on …, 2024 - dl.acm.org

This paper addresses temporal sentence grounding (TSG). Although existing methods
have made decent achievements in this task, they not only severely rely on abundant video …

Cited by 9 · Related articles · All 3 versions

Probability distribution based frame-supervised language-driven action localization

S Yang, Z Shang, X Wu - Proceedings of the 31st ACM International …, 2023 - dl.acm.org

Frame-supervised language-driven action localization aims to localize action boundaries in
untrimmed videos corresponding to the input natural language query, with only a single …

Cited by 4 · Related articles · All 3 versions

Efficient Language-Driven Action Localization by Feature Aggregation and Prediction Adjustment

Z Shang, S Yang, X Wu - Chinese Conference on Pattern Recognition and …, 2024 - Springer

Language-driven action localization is a challenging task that aims to identify action
boundaries, namely the start and end timestamps, within untrimmed videos using natural …
