Gaussian kernel-based cross modal network for spatio-temporal video grounding

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Cited by 53 Related articles All 8 versions

[PDF] thecvf.com

Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com

Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Cited by 34 Related articles All 3 versions

[PDF] pkwyx.com

Rethinking weakly-supervised video temporal grounding from a game perspective

X Fang, Z Xiong, W Fang, X Qu, C Chen, J Dong… - … on Computer Vision, 2025 - Springer

This paper addresses the challenging task of weakly-supervised video temporal grounding.
Existing approaches are generally based on the moment proposal selection framework that …

Cited by 10 Related articles All 4 versions

Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention

Z Xiong, D Liu, X Fang, X Qu, J Dong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org

Video sentence grounding (VSG) is the task of identifying the segment of an untrimmed
video that semantically corresponds to a given natural language query. While many existing …

Cited by 3 Related articles All 2 versions

[PDF] thecvf.com

Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos

C Liu, PP Li, Q Yu, H Sheng… - Proceedings of the …, 2024 - openaccess.thecvf.com

Existing audio-visual segmentation datasets typically focus on short-trimmed videos with
only one pixel-map annotation for a per-second video clip. In contrast for untrimmed videos …

Cited by 1 Related articles

[PDF] arxiv.org

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

D Liu, X Qu, J Dong, P Zhou, Z Xu, H Wang… - ACM Transactions on …, 2024 - dl.acm.org

This paper addresses the temporal sentence grounding (TSG). Although existing methods
have made decent achievements in this task, they not only severely rely on abundant video …

Cited by 9 Related articles All 3 versions

[PDF] acm.org

Dynamic Contrastive Learning with Pseudo-samples Intervention for Weakly Supervised Joint Video MR and HD

S Kong, L Li, B Zhang, W Wang, B Jiang… - Proceedings of the 31st …, 2023 - dl.acm.org

Joint video moment retrieval (MR) and highlight detection (HD) aims to find relevant video
moments according to the query text. Existing methods are fully supervised based on …

Cited by 5 Related articles

[PDF] aclanthology.org

Learning to focus on the foreground for temporal sentence grounding

D Liu, W Hu - Proceedings of the 29th International Conference on …, 2022 - aclanthology.org

Temporal sentence grounding (TSG) is crucial and fundamental for video understanding.
Previous works typically model the target activity referred to the sentence query in a video by …

Cited by 9 Related articles

[PDF] arxiv.org

Tracking Objects and Activities with Attention for Temporal Sentence Grounding

Z Xiong, D Liu, P Zhou, J Zhu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Temporal sentence grounding (TSG) aims to localize the temporal segment which is
semantically aligned with a natural language query in an untrimmed video. Most existing …

Cited by 6 Related articles All 3 versions
