J Dong, X Peng, D Liu, X Qu, X Yang, C Bao… - The Thirty-eighth Annual … - openreview.net
As a widely explored multi-modal task, Temporal Sentence Grounding in videos (TSG)
endeavors to retrieve a specific video segment matched with a given query text from a video …