Instruction tuning for large language models: A survey

S Zhang, L Dong, X Li, S Zhang, X Sun, S Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper surveys research in the rapidly advancing field of instruction tuning (IT), a
crucial technique to enhance the capabilities and controllability of large language models …

A comprehensive overview of large language models

H Naveed, AU Khan, S Qiu, M Saqib, S Anwar… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLMs) have recently demonstrated remarkable capabilities in
natural language processing tasks and beyond. This success of LLMs has led to a large …

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org
Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

MiniGPT-4: Enhancing vision-language understanding with advanced large language models

D Zhu, J Chen, X Shen, X Li, M Elhoseiny - arXiv preprint arXiv …, 2023 - arxiv.org
The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly
generating websites from handwritten text and identifying humorous elements within …

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have recently shown encouraging progress with visual
instruction tuning. In this paper, we present the first systematic study to investigate the design …

Language is not all you need: Aligning perception with language models

S Huang, L Dong, W Wang, Y Hao… - Advances in …, 2024 - proceedings.neurips.cc
A big convergence of language, multimodal perception, action, and world modeling is a key
step toward artificial general intelligence. In this work, we introduce KOSMOS-1, a …

AlpacaFarm: A simulation framework for methods that learn from human feedback

Y Dubois, CX Li, R Taori, T Zhang… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) such as ChatGPT have seen widespread adoption due to
their ability to follow user instructions well. Developing these LLMs involves a complex yet …

VisionLLM: Large language model is also an open-ended decoder for vision-centric tasks

W Wang, Z Chen, X Chen, J Wu… - Advances in …, 2024 - proceedings.neurips.cc
Large language models (LLMs) have notably accelerated progress towards artificial general
intelligence (AGI), with their impressive zero-shot capacity for user-tailored tasks, endowing …

MIMIC-IT: Multi-modal in-context instruction tuning

B Li, Y Zhang, L Chen, J Wang, F Pu, J Yang… - arXiv preprint arXiv …, 2023 - arxiv.org
High-quality instructions and responses are essential for the zero-shot performance of large
language models on interactive natural language tasks. For interactive vision-language …

mPLUG-Owl: Modularization empowers large language models with multimodality

Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated impressive zero-shot abilities on a
variety of open-ended tasks, and recent research has also explored the use of LLMs for …