Ferret: Refer and Ground Anything Anywhere at Any Granularity

H You, H Zhang, Z Gan, X Du, B Zhang, Z Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce Ferret, a new Multimodal Large Language Model (MLLM) capable of
understanding spatial referring of any shape or granularity within an image and accurately …

Ferret: Refer and Ground Anything Anywhere at Any Granularity

H You, H Zhang, Z Gan, X Du, B Zhang, Z Wang… - The Twelfth International … - openreview.net

Ferret: Refer and Ground Anything Anywhere at Any Granularity

H You, H Zhang, Z Gan, X Du, B Zhang… - arXiv e …, 2023 - ui.adsabs.harvard.edu