MoVA: Adapting Mixture of Vision Experts to Multimodal Context

Z Zong, B Ma, D Shen, G Song, H Shao, D Jiang… - arXiv preprint arXiv …, 2024 - arxiv.org
As the key component in multimodal large language models (MLLMs), the capability of the
visual encoder greatly affects an MLLM's understanding of diverse image content. Although …

F-LMM: Grounding Frozen Large Multimodal Models

S Wu, S Jin, W Zhang, L Xu, W Liu, W Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Endowing Large Multimodal Models (LMMs) with visual grounding capability can
significantly enhance AI's understanding of the visual world and its interaction with …

Image-of-Thought Prompting for Visual Reasoning Refinement in Multimodal Large Language Models

Q Zhou, R Zhou, Z Hu, P Lu, S Gao, Y Zhang - arXiv preprint arXiv …, 2024 - arxiv.org
Recent advancements in Chain-of-Thought (CoT) prompting and related rationale-based work have
significantly improved the performance of Large Language Models (LLMs) on complex …