Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2024 - proceedings.neurips.cc
… multimodal language-image instruction-following data. By instruction tuning on such generated
… for general-purpose visual and language understanding. To facilitate future research on …

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
… rethinking the conventional approaches and exploring the open problems in visual instruction
tuning, we pave the way for more robust and capable systems for LMMs. We hope these …

Comparison Visual Instruction Tuning

W Lin, MJ Mirza, S Doveh, R Feris, R Giryes… - arXiv preprint arXiv …, 2024 - arxiv.org
… the best available mimic of human visual intelligence to date. While multiple methods … tuning
using Visual Instructions (VI) [7, 13]. These methods align image tokens produced by visual

SVIT: Scaling up visual instruction tuning

B Zhao, B Wu, M He, T Huang - arXiv preprint arXiv:2307.04087, 2023 - arxiv.org
… In this paper, we scale up visual instruction tuning by presenting a large-scale dataset – SVIT
that contains in total 4.2 million instruction tuning data. We also propose new data recipe of …

LLaVAR: Enhanced visual instruction tuning for text-rich image understanding

Y Zhang, R Zhang, J Gu, Y Zhou, N Lipka… - arXiv preprint arXiv …, 2023 - arxiv.org
… fully leveraging the encoding capability of visual encoders. To this end, we propose to enhance
the visual instruction-tuned model end-to-end by collecting instruction-following data that …

Generative Visual Instruction Tuning

J Hernandez, R Villegas, V Ordonez - arXiv preprint arXiv:2406.11262, 2024 - arxiv.org
… This contrasts with the original visual instruction tuning in which the models retained their …
for visual understanding. In this paper, we present the generative visual instruction tuning, in …

VIGC: Visual instruction generation and correction

B Wang, F Wu, X Han, J Peng, H Zhong… - Proceedings of the …, 2024 - ojs.aaai.org
… We trained the VIGC network using two types of visual-language instruction fine-tuning data.
The first type, represented by the LLaVA dataset (Liu et al. 2023b), is manually curated and …

MAVIS: Mathematical Visual Instruction Tuning

R Zhang, X Wei, D Jiang, Y Zhang, Z Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
… Therefore, there is a pressing need for the development of more robust encoders for
mathematical images and the tuning of MLLMs with mathematical visual instructions, for which we …

Osprey: Pixel understanding with visual instruction tuning

Y Yuan, W Li, J Liu, D Tang, X Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com
… instructions, which further enhances the robustness and flexibility of Osprey’s response. By
taking advantage of visual instruction tuning … the pixel-level instruction tuning capability for fine …

Visual instruction tuning with polite flamingo

D Chen, J Liu, W Dai, B Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org
… visual instruction tuning approach that encompasses three stages: Stage 1 focuses on
improving the instruction-following ability of the model by tuning instructions, LLaVA instructions, …