A comprehensive survey of AI-generated content (AIGC): A history of generative AI from GAN to ChatGPT

Y Cao, S Li, Y Liu, Z Yan, Y Dai, PS Yu… - arXiv preprint arXiv …, 2023 - arxiv.org
Recently, ChatGPT, along with DALL-E-2 and Codex, has been gaining significant attention
from society. As a result, many individuals have become interested in related resources and …

ChatGPT for shaping the future of dentistry: the potential of multi-modal large language model

H Huang, O Zheng, D Wang, J Yin, Z Wang… - International Journal of …, 2023 - nature.com
ChatGPT, a lightweight and conversational variant of Generative Pretrained Transformer 4 (GPT-
4) developed by OpenAI, is one of the milestone Large Language Models (LLMs) with …

Visual instruction tuning

H Liu, C Li, Q Wu, YJ Lee - Advances in neural information …, 2024 - proceedings.neurips.cc
Instruction tuning large language models (LLMs) using machine-generated instruction-
following data has been shown to improve zero-shot capabilities on new tasks, but the idea …

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have recently shown encouraging progress with visual
instruction tuning. In this paper, we present the first systematic study to investigate the design …

InstructBLIP: Towards general-purpose vision-language models with instruction tuning

W Dai, J Li, D Li, AMH Tiong, J Zhao… - Advances in …, 2024 - proceedings.neurips.cc
Large-scale pre-training and instruction tuning have been successful at creating general-
purpose language models with broad competence. However, building general-purpose …

mPLUG-Owl: Modularization empowers large language models with multimodality

Q Ye, H Xu, G Xu, J Ye, M Yan, Y Zhou, J Wang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) have demonstrated impressive zero-shot abilities on a
variety of open-ended tasks, while recent research has also explored the use of LLMs for …

HuggingGPT: Solving AI tasks with ChatGPT and its friends in Hugging Face

Y Shen, K Song, X Tan, D Li, W Lu… - Advances in Neural …, 2024 - proceedings.neurips.cc
Solving complicated AI tasks with different domains and modalities is a key step toward
artificial general intelligence. While there are numerous AI models available for various …

LLaMA-Adapter: Efficient fine-tuning of language models with zero-init attention

R Zhang, J Han, C Liu, P Gao, A Zhou, X Hu… - arXiv preprint arXiv …, 2023 - arxiv.org
We present LLaMA-Adapter, a lightweight adaption method to efficiently fine-tune LLaMA
into an instruction-following model. Using 52K self-instruct demonstrations, LLaMA-Adapter …

MiniGPT-4: Enhancing vision-language understanding with advanced large language models

D Zhu, J Chen, X Shen, X Li, M Elhoseiny - arXiv preprint arXiv …, 2023 - arxiv.org
The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly
generating websites from handwritten text and identifying humorous elements within …

LISA: Reasoning segmentation via large language model

X Lai, Z Tian, Y Chen, Y Li, Y Yuan… - Proceedings of the …, 2024 - openaccess.thecvf.com
Although perception systems have made remarkable advancements in recent years, they still
rely on explicit human instruction or pre-defined categories to identify the target objects …