All versions - Academic resource search

Articles

2 results (0.01 seconds)

Kosmos-2: Grounding multimodal large language models to the world

Z Peng, W Wang, L Dong, Y Hao, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new
capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the …

Cited by 314

Kosmos-2: Grounding Multimodal Large Language Models to the World

Z Peng, W Wang, L Dong, Y Hao, S Huang… - arXiv e …, 2023 - ui.adsabs.harvard.edu

Abstract: We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling
new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text …
