Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2024 - proceedings.neurips.cc
Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

MIMIC-IT: Multi-modal in-context instruction tuning

B Li, Y Zhang, L Chen, J Wang, F Pu, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
High-quality instructions and responses are essential for the zero-shot performance of large
language models on interactive natural language tasks. For interactive vision-language …

InstructBLIP: Towards general-purpose vision-language models with instruction tuning

W Dai, J Li, D Li, AMH Tiong, J Zhao… - Advances in …, 2024 - proceedings.neurips.cc
Large-scale pre-training and instruction tuning have been successful at creating general-
purpose language models with broad competence. However, building general-purpose …

VILA: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual language models (VLMs) have progressed rapidly with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Mixture of cluster-conditional LoRA experts for vision-language instruction tuning

Y Gou, Z Liu, K Chen, L Hong, H Xu, A Li… - arXiv preprint arXiv …, 2023 - arxiv.org
Instruction tuning of large vision-language models (LVLMs) has revolutionized the
development of versatile models with zero-shot generalization across a wide range of …
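
The cluster-conditional LoRA-expert idea can be pictured with a short sketch: a frozen base layer plus several low-rank adapters, with one adapter selected per instruction cluster. This is a minimal illustration of the general technique only; the class names, shapes, and the integer routing signal below are assumptions, not the paper's implementation.

```python
# Minimal sketch: frozen linear layer + per-cluster LoRA experts.
# All names and hyperparameters here are illustrative assumptions.
import torch
import torch.nn as nn


class LoRAExpert(nn.Module):
    """One low-rank adapter: adds (alpha/r) * x A^T B^T to the base output."""

    def __init__(self, d_in: int, d_out: int, rank: int = 8, alpha: float = 16.0):
        super().__init__()
        self.A = nn.Parameter(torch.randn(rank, d_in) * 0.01)
        self.B = nn.Parameter(torch.zeros(d_out, rank))  # zero-init: adapter starts as a no-op
        self.scale = alpha / rank

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return (x @ self.A.T @ self.B.T) * self.scale


class ClusterConditionalLoRALinear(nn.Module):
    """Frozen base linear layer plus one LoRA expert per instruction cluster."""

    def __init__(self, base: nn.Linear, n_experts: int = 4, rank: int = 8):
        super().__init__()
        self.base = base
        for p in self.base.parameters():
            p.requires_grad = False  # only the adapters are trained
        self.experts = nn.ModuleList(
            LoRAExpert(base.in_features, base.out_features, rank)
            for _ in range(n_experts)
        )

    def forward(self, x: torch.Tensor, cluster_id: int) -> torch.Tensor:
        # cluster_id would come from clustering the instruction embedding upstream
        return self.base(x) + self.experts[cluster_id](x)


layer = ClusterConditionalLoRALinear(nn.Linear(512, 512), n_experts=4)
out = layer(torch.randn(2, 512), cluster_id=1)
```

In this reading, the router is just a hard assignment from an upstream instruction-clustering step; a learned soft router over experts would be a natural variant.
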

LLaMA-Adapter V2: Parameter-efficient visual instruction model

P Gao, J Han, R Zhang, Z Lin, S Geng, A Zhou… - arXiv preprint arXiv …, 2023 - arxiv.org
How to efficiently transform large language models (LLMs) into instruction followers has
recently become a popular research direction, while training LLMs for multi-modal reasoning remains …

MultiInstruct: Improving multi-modal zero-shot learning via instruction tuning

Z Xu, Y Shen, L Huang - arXiv preprint arXiv:2212.10773, 2022 - arxiv.org
Instruction tuning, a new learning paradigm that fine-tunes pre-trained language models on
tasks specified through instructions, has shown promising zero-shot performance on various …

VisIT-Bench: A benchmark for vision-language instruction following inspired by real-world use

Y Bitton, H Bansal, J Hessel, R Shao, W Zhu… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce VisIT-Bench (Visual InsTruction Benchmark), a benchmark for evaluation of
instruction-following vision-language models for real-world use. Our starting point is curating …

InstructionGPT-4: A 200-instruction paradigm for fine-tuning MiniGPT-4

L Wei, Z Jiang, W Huang, L Sun - arXiv preprint arXiv:2308.12067, 2023 - arxiv.org
Multimodal large language models acquire their instruction-following capabilities through a
two-stage training process: pre-training on image-text pairs and fine-tuning on supervised …

From images to textual prompts: Zero-shot visual question answering with frozen large language models

J Guo, J Li, D Li, AMH Tiong, B Li… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large language models (LLMs) have demonstrated excellent zero-shot generalization to
new language tasks. However, effective utilization of LLMs for zero-shot visual question …
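
The images-to-textual-prompts idea lends itself to a short sketch: describe the image in text with an off-the-shelf captioner, then ask a frozen LLM the question over that description. This is a hedged illustration of the general pattern, not the paper's exact pipeline; the model choices and the prompt template below are assumptions (gpt2 stands in for a large frozen LLM).

```python
# Minimal sketch of zero-shot VQA via textual prompts for a frozen LLM.
# Model names and the prompt template are illustrative assumptions.
from transformers import pipeline

captioner = pipeline("image-to-text", model="Salesforce/blip-image-captioning-base")
llm = pipeline("text-generation", model="gpt2")  # stand-in for a large frozen LLM


def zero_shot_vqa(image_path: str, question: str) -> str:
    # Turn the image into text, then pose the question purely in language.
    caption = captioner(image_path)[0]["generated_text"]
    prompt = f"Context: {caption}\nQuestion: {question}\nAnswer:"
    out = llm(prompt, max_new_tokens=20, do_sample=False)[0]["generated_text"]
    return out[len(prompt):].strip()


print(zero_shot_vqa("photo.jpg", "What is the person holding?"))
```

The appeal of this pattern is that the LLM stays entirely frozen: all visual grounding is pushed into the text interface, so no multimodal fine-tuning is needed.
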