$R^2$-Tuning: Efficient Image-to-Video Transfer Learning for Video Temporal Grounding

Y Liu, Z Ma, Z Qi, Y Wu, Y Shan, CW Chen - arXiv preprint arXiv …, 2024 - arxiv.org

Recent advances in Video Large Language Models (Video-LLMs) have demonstrated their
great potential in general-purpose video understanding. To verify the significance of these …

Cited by 5 | Related articles | All 4 versions

[PDF] arxiv.org

Saliency-Guided DETR for Moment Retrieval and Highlight Detection

A Gordeev, V Dokholyan, I Tolstykh… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing approaches for video moment retrieval and highlight detection are not able to align
text and video features efficiently, resulting in unsatisfying performance and limited …

Cited by 1 | Related articles | All 2 versions

[PDF] arxiv.org

FlashVTG: Feature Layering and Adaptive Score Handling Network for Video Temporal Grounding

Z Cao, B Zhang, H Du, X Yu, X Li, S Wang - arXiv preprint arXiv …, 2024 - arxiv.org

Text-guided Video Temporal Grounding (VTG) aims to localize relevant segments in
untrimmed videos based on textual descriptions, encompassing two subtasks: Moment …

Related articles | All 2 versions

[PDF] arxiv.org

VideoLights: Feature Refinement and Cross-Task Alignment Transformer for Joint Video Highlight Detection and Moment Retrieval

D Paul, MR Parvez, N Mohammed… - arXiv preprint arXiv …, 2024 - arxiv.org

Video Highlight Detection and Moment Retrieval (HD/MR) are essential in video analysis.
Recent joint prediction transformer models often overlook their cross-task dynamics and …

Related articles | All 2 versions

[PDF] arxiv.org

LD-DETR: Loop Decoder DEtection TRansformer for Video Moment Retrieval and Highlight Detection

P Zhao, Z He, F Zhang, S Lin, F Zhou - arXiv preprint arXiv:2501.10787, 2025 - arxiv.org

Video Moment Retrieval and Highlight Detection aim to find corresponding content in the
video based on a text query. Existing models usually first use contrastive learning methods …

[PDF] arxiv.org

Length-Aware DETR for Robust Moment Retrieval

S Park, J Choi, K Baek, H Shim - arXiv preprint arXiv:2412.20816, 2024 - arxiv.org

Video Moment Retrieval (MR) aims to localize moments within a video based on a given
natural language query. Given the prevalent use of platforms like YouTube for information …

Related articles | All 2 versions

[PDF] arxiv.org

D&M: Enriching E-commerce Videos with Sound Effects by Key Moment Detection and SFX Matching

J Liu, M Wang, Y Ma, B Wang, A Chen, Q Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Videos showcasing specific products are increasingly important for E-commerce. Key
moments naturally exist as the first appearance of a specific product, presentation of its …

Related articles | All 2 versions

[PDF] openreview.net

: Exploring Embodied Emotion Through A Large-Scale Egocentric Video Dataset

W Lin, Y Feng, WK Han, T Jin, Z Zhao, F Wu… - The Thirty-eight … - openreview.net

Understanding human emotions is fundamental to enhancing human-computer interaction,
especially for embodied agents that mimic human behavior. Traditional emotion analysis …
