Visually Grounded Story Generation Challenge

X Hong, K Mehra, A Sayeed… - Proceedings of the 16th …, 2023 - aclanthology.org
Recent large pre-trained models have achieved strong performance in multimodal language
generation, which requires a joint effort of vision and language modeling. However, most …

Summary of the Visually Grounded Story Generation Challenge

X Hong, A Sayeed, V Demberg - Proceedings of the 17th …, 2024 - aclanthology.org
Recent advancements in vision-and-language models have opened new possibilities for
natural language generation, particularly in generating creative stories from visual input. We …

Improving Visual Storytelling with Multimodal Large Language Models

X Lin, X Chen - arXiv preprint arXiv:2407.02586, 2024 - arxiv.org
Visual storytelling is an emerging field that combines images and narratives to create
engaging and contextually rich stories. Despite its potential, generating coherent and …

Enhancing Visual Story Generation with Large Language and Vision-Language Models through Multimodal Learning

X Lin, X Chen - researchgate.net
Visual storytelling, which involves generating coherent and engaging narratives from a
sequence of images, is a challenging task that has garnered significant interest due to its …