Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), a.k.a. natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

Winner: Weakly-supervised hierarchical decomposition and alignment for spatio-temporal video grounding

M Li, H Wang, W Zhang, J Miao… - Proceedings of the …, 2023 - openaccess.thecvf.com
Spatio-temporal video grounding aims to localize the aligned visual tube corresponding to a
language query. Existing techniques achieve such alignment by exploiting dense boundary …

Rethinking weakly-supervised video temporal grounding from a game perspective

X Fang, Z Xiong, W Fang, X Qu, C Chen, J Dong… - … on Computer Vision, 2025 - Springer
This paper addresses the challenging task of weakly-supervised video temporal grounding.
Existing approaches are generally based on the moment proposal selection framework that …

Rethinking Video Sentence Grounding From a Tracking Perspective With Memory Network and Masked Attention

Z Xiong, D Liu, X Fang, X Qu, J Dong… - IEEE Transactions …, 2024 - ieeexplore.ieee.org
Video sentence grounding (VSG) is the task of identifying the segment of an untrimmed
video that semantically corresponds to a given natural language query. While many existing …

Benchmarking Audio Visual Segmentation for Long-Untrimmed Videos

C Liu, PP Li, Q Yu, H Sheng… - Proceedings of the …, 2024 - openaccess.thecvf.com
Existing audio-visual segmentation datasets typically focus on short-trimmed videos with
only one pixel-map annotation for a per-second video clip. In contrast, for untrimmed videos …

Transform-Equivariant Consistency Learning for Temporal Sentence Grounding

D Liu, X Qu, J Dong, P Zhou, Z Xu, H Wang… - ACM Transactions on …, 2024 - dl.acm.org
This paper addresses temporal sentence grounding (TSG). Although existing methods
have made decent achievements in this task, they not only severely rely on abundant video …

Dynamic Contrastive Learning with Pseudo-samples Intervention for Weakly Supervised Joint Video MR and HD

S Kong, L Li, B Zhang, W Wang, B Jiang… - Proceedings of the 31st …, 2023 - dl.acm.org
Joint video moment retrieval (MR) and highlight detection (HD) aims to find relevant video
moments according to the query text. Existing methods are fully supervised based on …

Learning to focus on the foreground for temporal sentence grounding

D Liu, W Hu - Proceedings of the 29th International Conference on …, 2022 - aclanthology.org
Temporal sentence grounding (TSG) is crucial and fundamental for video understanding.
Previous works typically model the target activity referred to by the sentence query in a video by …

Tracking Objects and Activities with Attention for Temporal Sentence Grounding

Z Xiong, D Liu, P Zhou, J Zhu - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding (TSG) aims to localize the temporal segment which is
semantically aligned with a natural language query in an untrimmed video. Most existing …