LLaViLo: Boosting Video Moment Retrieval via Adapter-Based Multimodal Modeling

P Zhou, L Wang, Z Liu, Y Hao, P Hui, S Tarkoma… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper offers an insightful examination of how currently top-trending AI technologies, i.e.,
generative artificial intelligence (Generative AI) and large language models (LLMs), are …

Cited by 31 · Related articles · All 8 versions

Correlation-guided query-dependency calibration in video representation learning for temporal grounding

WJ Moon, S Hyun, SB Lee, JP Heo - CoRR, 2023 - openreview.net

Temporal Grounding is to identify specific moments or highlights from a video corresponding
to textual descriptions. Typical approaches in temporal grounding treat all video clips …

Cited by 33 · Related articles · All 2 versions

Spikemba: Multi-modal spiking saliency mamba for temporal video grounding

W Li, X Hong, R Xiong, X Fan - arXiv preprint arXiv:2404.01174, 2024 - arxiv.org

Temporal video grounding (TVG) is a critical task in video content understanding, requiring
precise alignment between video content and natural language instructions. Despite …

Cited by 19 · Related articles · All 2 versions

Prior knowledge integration via llm encoding and pseudo event regulation for video moment retrieval

Y Jiang, W Zhang, X Zhang, XY Wei… - Proceedings of the 32nd …, 2024 - dl.acm.org

In this paper, we explore the use of large language models (LLMs) to enhance video
moment retrieval (VMR) by integrating general knowledge and pseudo-events as priors. We …

Cited by 4 · Related articles · All 4 versions

Beyond uncertainty: Evidential deep learning for robust video temporal grounding

K Ma, H Huang, J Chen, H Chen, P Ji, X Zang… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing Video Temporal Grounding (VTG) models excel in accuracy but often overlook open-
world challenges posed by open-vocabulary queries and untrimmed videos. This leads to …

Cited by 3 · Related articles · All 4 versions

ChatVTG: Video Temporal Grounding via Chat with Video Dialogue Large Language Models

M Qu, X Chen, W Liu, A Li… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Video Temporal Grounding (VTG) aims to ground specific segments within an
untrimmed video corresponding to the given natural language query. Existing VTG methods …

Cited by 7 · Related articles

Unsupervised Video Moment Retrieval with Knowledge-Based Pseudo-Supervision Construction

G Wang, X Wu, X Tu, Z Liu, J Yan - ACM Transactions on Information …, 2024 - dl.acm.org

Video moment retrieval locates a specified moment by a sentence query. Recent
approaches have made remarkable advancements with large-scale video-sentence …

Cited by 1 · Related articles

Sparse graph matching network for temporal language localization in videos

G Wu, T Xu, J Zhang - Computer Vision and Image Understanding, 2024 - Elsevier

Temporal language localization in videos aims to retrieve the moment that best matches the
text description in the untrimmed video using the query text. Existing methods using graph …

Cited by 2 · Related articles

Zero-shot Video Moment Retrieval via Off-the-shelf Multimodal Large Language Models

Y Xu, Y Sun, B Zhai, M Li, W Liang, Y Li… - arXiv preprint arXiv …, 2025 - arxiv.org

The target of video moment retrieval (VMR) is predicting temporal spans within a video that
semantically match a given linguistic query. Existing VMR methods based on multimodal …

VTG-GPT: Tuning-Free Zero-Shot Video Temporal Grounding with GPT

Y Xu, Y Sun, Z Xie, B Zhai, S Du - Applied Sciences, 2024 - mdpi.com

Video temporal grounding (VTG) aims to locate specific temporal segments from an
untrimmed video based on a linguistic query. Most existing VTG models are trained on …

Cited by 5 · Related articles · All 4 versions
