Visual instruction tuning - Academic Resource Search

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2024 - proceedings.neurips.cc

… multimodal language-image instruction-following data. By instruction tuning on such generated
… for general-purpose visual and language understanding. To facilitate future research on …

Cited by 2735 Related articles All 15 versions

[PDF] thecvf.com

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

… rethinking the conventional approaches and exploring the open problems in visual instruction
tuning, we pave the way for more robust and capable systems for LMMs. We hope these …

Cited by 843 Related articles All 5 versions

[PDF] arxiv.org

Comparison Visual Instruction Tuning

W Lin, MJ Mirza, S Doveh, R Feris, R Giryes… - arXiv preprint arXiv …, 2024 - arxiv.org

… the best available mimic of human visual intelligence to date. While multiple methods … tuning
using Visual Instructions (VI) [7, 13]. These methods align image tokens produced by visual …

Svit: Scaling up visual instruction tuning

B Zhao, B Wu, M He, T Huang - arXiv preprint arXiv:2307.04087, 2023 - arxiv.org

… In this paper, we scale up visual instruction tuning by presenting a large-scale dataset – SVIT
that contains in total 4.2 million instruction tuning data. We also propose new data recipe of …

Cited by 71 Related articles All 2 versions

[PDF] arxiv.org

Llavar: Enhanced visual instruction tuning for text-rich image understanding

Y Zhang, R Zhang, J Gu, Y Zhou, N Lipka… - arXiv preprint arXiv …, 2023 - arxiv.org

… fully leveraging the encoding capability of visual encoders. To this end, we propose to enhance
the visual instruction-tuned model end-to-end by collecting instruction-following data that …

Cited by 118 Related articles All 2 versions

[PDF] arxiv.org

Generative Visual Instruction Tuning

J Hernandez, R Villegas, V Ordonez - arXiv preprint arXiv:2406.11262, 2024 - arxiv.org

… This contrasts with the original visual instruction tuning in which the models retained their …
for visual understanding. In this paper, we present the generative visual instruction tuning, in …

Cited by 1 Related articles

[PDF] aaai.org

Vigc: Visual instruction generation and correction

B Wang, F Wu, X Han, J Peng, H Zhong… - Proceedings of the …, 2024 - ojs.aaai.org

… We trained the VIGC network using two types of visual-language instruction fine-tuning data.
The first type, represented by the LLaVA dataset (Liu et al. 2023b), is manually curated and …

Cited by 37 Related articles All 4 versions

[PDF] arxiv.org

MAVIS: Mathematical Visual Instruction Tuning

R Zhang, X Wei, D Jiang, Y Zhang, Z Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

… Therefore, there is a pressing need for the development of more robust encoders for
mathematical images and the tuning of MLLMs with mathematical visual instructions, for which we …

Cited by 1 Related articles All 2 versions

[PDF] thecvf.com

Osprey: Pixel understanding with visual instruction tuning

Y Yuan, W Li, J Liu, D Tang, X Luo… - Proceedings of the …, 2024 - openaccess.thecvf.com

… instructions, which further enhances the robustness and flexibility of Osprey’s response. By
taking advantage of visual instruction tuning… the pixel-level instruction tuning capability for fine…

Cited by 25 Related articles All 4 versions

[PDF] aaai.org

Visual instruction tuning with polite flamingo

D Chen, J Liu, W Dai, B Wang - … of the AAAI Conference on Artificial …, 2024 - ojs.aaai.org

… visual instruction tuning approach that encompasses three stages: Stage 1 focuses on
improving the instruction-following ability of the model by tuning … instructions, LLaVA instructions, …

Cited by 22 Related articles All 3 versions
