Kosmos-2: Grounding multimodal large language models to the world

Z Peng, W Wang, L Dong, Y Hao, S Huang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling new
capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text to the …

Kosmos-2: Grounding Multimodal Large Language Models to the World

Z Peng, W Wang, L Dong, Y Hao, S Huang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Abstract: We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling
new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text …