Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), also known as natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …
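
Note: the task defined above (retrieving a temporal moment for a natural-language query) is commonly evaluated with temporal Intersection over Union (IoU) between the predicted moment and the ground-truth span, reported as recall at IoU thresholds (e.g., "R@n, IoU=m"). The short Python sketch below is purely illustrative; the function name and the example timestamps are assumptions and are not taken from any paper listed here.

    def temporal_iou(pred, gt):
        """Temporal IoU of two (start, end) spans given in seconds.

        Illustrative sketch only; not code from any paper above.
        """
        # Overlap between the two spans, clamped at zero when they are disjoint.
        inter = max(0.0, min(pred[1], gt[1]) - max(pred[0], gt[0]))
        # Union = sum of lengths minus the overlap.
        union = (pred[1] - pred[0]) + (gt[1] - gt[0]) - inter
        return inter / union if union > 0 else 0.0

    # Example: predicted moment (12.0 s, 25.0 s) vs. ground truth (10.0 s, 22.0 s)
    print(temporal_iou((12.0, 25.0), (10.0, 22.0)))  # -> 0.666..., i.e., 10 s overlap / 15 s union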

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …

Memory-guided semantic learning network for temporal sentence grounding

D Liu, X Qu, X Di, Y Cheng, Z Xu, P Zhou - Proceedings of the AAAI …, 2022 - ojs.aaai.org
Temporal sentence grounding (TSG) is crucial and fundamental for video understanding.
Although existing methods train well-designed deep networks with large amounts of data, we …

Reducing the vision and language bias for temporal sentence grounding

D Liu, X Qu, W Hu - Proceedings of the 30th ACM International …, 2022 - dl.acm.org
Temporal sentence grounding (TSG) is an important yet challenging task in multimedia
information retrieval. Although previous TSG methods have achieved decent performance …

Skimming, locating, then perusing: A human-like framework for natural language video localization

D Liu, W Hu - Proceedings of the 30th ACM International Conference …, 2022 - dl.acm.org
This paper addresses the problem of natural language video localization (NLVL). Almost all
existing works follow the" only look once" framework that exploits a single model to directly …

Multi-modal cross-domain alignment network for video moment retrieval

X Fang, D Liu, P Zhou, Y Hu - IEEE Transactions on Multimedia, 2022 - ieeexplore.ieee.org
As an increasingly popular task in multimedia information retrieval, video moment retrieval
(VMR) aims to localize the target moment from an untrimmed video according to a given …

Zero-shot video grounding with pseudo query lookup and verification

Y Lu, R Quan, L Zhu, Y Yang - IEEE Transactions on Image …, 2024 - ieeexplore.ieee.org
Video grounding, the process of identifying a specific moment in an untrimmed video based
on a natural language query, has become a popular topic in video understanding. However …

Hierarchical contrast for unsupervised skeleton-based action representation learning

J Dong, S Sun, Z Liu, S Chen, B Liu… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
This paper targets unsupervised skeleton-based action representation learning and
proposes a new Hierarchical Contrast (HiCo) framework. Different from the existing …

Exploring motion and appearance information for temporal sentence grounding

D Liu, X Qu, P Zhou, Y Liu - Proceedings of the AAAI Conference on …, 2022 - ojs.aaai.org
This paper addresses temporal sentence grounding. Previous works typically solve this task
by learning frame-level video features and aligning them with the textual information. A major …

Hierarchical local-global transformer for temporal sentence grounding

X Fang, D Liu, P Zhou, Z Xu, R Li - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
This article studies the multimedia problem of temporal sentence grounding (TSG), which
aims to accurately determine the specific video segment in an untrimmed video according to …