Recent advances in Video Large Language Models (Video-LLMs) have demonstrated their great potential in general-purpose video understanding. To verify the significance of these …
Narrative visualization effectively transforms data into engaging stories, making complex information accessible to a broad audience. Large models, essential for narrative …
Abstract Generating Audio Description (AD) for movies is a challenging task that requires fine-grained visual understanding and an awareness of the characters and their names …
Learning to localize temporal boundaries of procedure steps in instructional videos is challenging due to the limited availability of annotated large-scale training videos. Recent …
In this paper we propose VidLA an approach for video-language alignment at scale. There are two major limitations of previous video-language alignment approaches. First they do …
In this paper, we consider the problem of temporally aligning the video and texts from instructional videos, specifically, given a long-term video, and associated text sentences, our …
Video description generates natural language sentences that describe the subject, verb, and objects of the targeted Video. The video description has been used to help visually impaired …
CW Jo, M Wesołowska, M Wojcieszak - arXiv preprint arXiv:2411.05854, 2024 - arxiv.org
Short video platforms, such as YouTube, Instagram, or TikTok, are used by billions of users globally. These platforms expose users to harmful content, ranging from clickbait or physical …
Q Li, Y Zhou, C Ji, F Lu, J Gong, S Wang… - Proceedings of the 32nd …, 2024 - dl.acm.org
Text-video retrieval (TVR) identifies relevant videos based on textual queries. Existing methods are limited by their ability to understand and connect different modalities, resulting …