Temporal sentence grounding in videos: A survey and future directions

H Zhang, A Sun, W Jing, JT Zhou - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
Temporal sentence grounding in videos (TSGV), aka, natural language video localization
(NLVL) or video moment retrieval (VMR), aims to retrieve a temporal moment that …

You can ground earlier than see: An effective and efficient pipeline for temporal sentence grounding in compressed videos

X Fang, D Liu, P Zhou, G Nan - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Given an untrimmed video, temporal sentence grounding (TSG) aims to locate a target
moment semantically according to a sentence query. Although previous respectable works …

Hypotheses tree building for one-shot temporal sentence localization

D Liu, X Fang, P Zhou, X Di, W Lu… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
Given an untrimmed video, temporal sentence localization (TSL) aims to localize a specific
segment according to a given sentence query. Though respectable works have made …

Faster video moment retrieval with point-level supervision

X Jiang, Z Zhou, X Xu, Y Yang, G Wang… - Proceedings of the 31st …, 2023 - dl.acm.org
Video Moment Retrieval (VMR) aims at retrieving the most relevant events from an
untrimmed video with natural language queries. Existing VMR methods suffer from two …

Siamese Learning with Joint Alignment and Regression for Weakly-Supervised Video Paragraph Grounding

C Tan, J Lai, WS Zheng, JF Hu - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Video Paragraph Grounding (VPG) is an emerging task in video-language
understanding which aims at localizing multiple sentences with semantic relations and …

Learning temporal sentence grounding from narrated EgoVideos

K Flanagan, D Damen, M Wray - arXiv preprint arXiv:2310.17395, 2023 - arxiv.org
The onset of long-form egocentric datasets such as Ego4D and EPIC-Kitchens presents a
new challenge for the task of Temporal Sentence Grounding (TSG). Compared to traditional …

Learning point-language hierarchical alignment for 3D visual grounding

J Chen, W Luo, R Song, X Wei, L Ma… - arXiv preprint arXiv …, 2022 - arxiv.org
This paper presents a novel hierarchical alignment model (HAM) that learns multi-
granularity visual and linguistic representations in an end-to-end manner. We extract key …

Gaussian Mixture Proposals with Pull-Push Learning Scheme to Capture Diverse Events for Weakly Supervised Temporal Video Grounding

S Kim, J Cho, J Yu, YJ Yoo, JY Choi - Proceedings of the AAAI …, 2024 - ojs.aaai.org
In the weakly supervised temporal video grounding study, previous methods use
predetermined single Gaussian proposals which lack the ability to express diverse events …

EtC: Temporal Boundary Expand then Clarify for Weakly Supervised Video Grounding with Multimodal Large Language Model

G Li, X Ding, D Cheng, J Li, N Wang, X Gao - arXiv preprint arXiv …, 2023 - arxiv.org
Early weakly supervised video grounding (WSVG) methods often struggle with incomplete
boundary detection due to the absence of temporal boundary annotations. To bridge the gap …

Zero-Shot Video Moment Retrieval Using BLIP-Based Models

JI Wattasseril, S Shekhar, J Döllner, M Trapp - International Symposium on …, 2023 - Springer
Video Moment Retrieval (VMR) is a challenging task at the intersection of vision
and language, with the goal to retrieve relevant moments from videos corresponding to …