BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

W Hu, Y Xu, Y Li, W Li, Z Chen, Z Tu - Proceedings of the AAAI …, 2024 - ojs.aaai.org
Vision Language Models (VLMs), which extend Large Language Models (LLM) by
incorporating visual understanding capability, have demonstrated significant advancements …

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

W Hu, Y Xu, Y Li, W Li, Z Chen, Z Tu - arXiv e-prints, 2023 - ui.adsabs.harvard.edu
Vision Language Models (VLMs), which extend Large Language Models (LLM) by
incorporating visual understanding capability, have demonstrated significant advancements …

BLIVA: A Simple Multimodal LLM for Better Handling of Text-Rich Visual Questions

W Hu, Y Xu, Y Li, W Li, Z Chen, Z Tu - arXiv preprint arXiv:2308.09936, 2023 - arxiv.org
Vision Language Models (VLMs), which extend Large Language Models (LLM) by
incorporating visual understanding capability, have demonstrated significant advancements …