AutoAD-Zero: A Training-Free Framework for Zero-Shot Audio Description

J Xie, T Han, M Bain, A Nagrani… - Proceedings of the …, 2024 - openaccess.thecvf.com
Our objective is to generate Audio Descriptions (ADs) for both movies and TV series in a
training-free manner. We use the power of off-the-shelf Video Language Models (VLMs) and …

StoryTeller: Improving Long Video Description through Global Audio-Visual Character Identification

Y He, Y Lin, J Wu, H Zhang, Y Zhang, R Le - arXiv preprint arXiv …, 2024 - arxiv.org
Existing large vision-language models (LVLMs) are largely limited to processing short,
seconds-long videos and struggle with generating coherent descriptions for extended video …

DistinctAD: Distinctive Audio Description Generation in Contexts

B Fang, W Wu, Q Wu, Y Song, AB Chan - arXiv preprint arXiv:2411.18180, 2024 - arxiv.org
Audio Descriptions (ADs) aim to provide a narration of a movie in text form, describing
non-dialogue-related narratives, such as characters, actions, or scene establishment. Automatic …