CoDi-2: In-Context Interleaved and Interactive Any-to-Any Generation

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

Cited by 40 | Related articles | All 2 versions

Exploring the frontier of vision-language models: A survey of current methodologies and future directions

A Ghosh, A Acharya, S Saha, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org

The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of
the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily …

Cited by 1 | Related articles | All 5 versions

LLMs Meet Multimodal Generation and Editing: A Survey

Y He, Z Liu, J Chen, Z Tian, H Liu, X Chi, R Liu… - arXiv preprint arXiv …, 2024 - arxiv.org

With the recent advancement in large language models (LLMs), there is a growing interest in
combining LLMs with multimodal learning. Previous surveys of multimodal large language …

Cited by 1 | Related articles | All 2 versions

Earthgpt: A universal multi-modal large language model for multi-sensor image comprehension in remote sensing domain

W Zhang, M Cai, T Zhang, Y Zhuang, X Mao - arXiv preprint arXiv …, 2024 - arxiv.org

Multi-modal large language models (MLLMs) have demonstrated remarkable success in
vision and visual-language tasks within the natural image domain. Owing to the significant …

Cited by 8 | Related articles | All 2 versions

Generative Visual Instruction Tuning

J Hernandez, R Villegas, V Ordonez - arXiv preprint arXiv:2406.11262, 2024 - arxiv.org

We propose to use machine-generated instruction-following data to improve the zero-shot
capabilities of a large multimodal model with additional support for generative and image …

Audio Flamingo: A Novel Audio Language Model with Few-Shot Learning and Dialogue Abilities

Z Kong, A Goel, R Badlani, W Ping, R Valle… - arXiv preprint arXiv …, 2024 - arxiv.org

Augmenting large language models (LLMs) to understand audio--including non-speech
sounds and non-verbal speech--is critically important for diverse real-world applications of …

Cited by 7 | Related articles | All 2 versions

From Large Language Models to Large Multimodal Models: A Literature Review

D Huang, C Yan, Q Li, X Peng - Applied Sciences, 2024 - mdpi.com

With the deepening of research on Large Language Models (LLMs), significant progress has
been made in recent years on the development of Large Multimodal Models (LMMs), which …
