Z Peng, W Wang, L Dong, Y Hao, S Huang… - arXiv e …, 2023 - ui.adsabs.harvard.edu
Abstract: We introduce Kosmos-2, a Multimodal Large Language Model (MLLM), enabling
new capabilities of perceiving object descriptions (e.g., bounding boxes) and grounding text …