LLMs Meet Multimodal Generation and Editing: A Survey

Citing articles: 2 results

VidMuse: A Simple Video-to-Music Generation Framework with Long-Short-Term Modeling

Z Tian, Z Liu, R Yuan, J Pan, X Huang, Q Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

In this work, we systematically study music generation conditioned solely on the video. First,
we present a large-scale dataset comprising 190K video-music pairs, including various …

From Efficient Multimodal Models to World Models: A Survey

X Mai, Z Tao, J Lin, H Wang, Y Chang, Y Kang… - arXiv preprint arXiv …, 2024 - arxiv.org

Multimodal Large Models (MLMs) are becoming a significant research focus, combining
powerful large language models with multimodal learning to perform complex tasks across …