MICap: A Unified Model for Identity-aware Movie Descriptions

Citing articles: 3 results

[PDF] thecvf.com

AutoAD-Zero: A Training-free Framework for Zero-shot Audio Description

J Xie, T Han, M Bain, A Nagrani… - Proceedings of the …, 2024 - openaccess.thecvf.com

Our objective is to generate Audio Descriptions (ADs) for both movies and TV series in a
training-free manner. We use the power of off-the-shelf Video Language Models (VLMs) and …

Cited by: 5 (7 versions)

[PDF] arxiv.org

StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification

Y He, Y Lin, J Wu, H Zhang, Y Zhang, R Le - arXiv preprint arXiv …, 2024 - arxiv.org

Existing large vision-language models (LVLMs) are largely limited to processing short,
seconds-long videos and struggle with generating coherent descriptions for extended video …

Cited by: 1 (2 versions)

[PDF] arxiv.org

DistinctAD: Distinctive Audio Description Generation in Contexts

B Fang, W Wu, Q Wu, Y Song, AB Chan - arXiv preprint arXiv:2411.18180, 2024 - arxiv.org

Audio Descriptions (ADs) aim to provide a narration of a movie in text form, describing
non-dialogue-related narratives, such as characters, actions, or scene establishment. Automatic …
