Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have recently shown encouraging progress with visual
instruction tuning. In this paper, we present the first systematic study to investigate the design …

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com
This paper presents a comprehensive survey of the taxonomy and evolution of multimodal
foundation models that demonstrate vision and vision-language capabilities, focusing on the transition …

CapsFusion: Rethinking image-text data at scale

Q Yu, Q Sun, X Zhang, Y Cui, F Zhang… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large multimodal models demonstrate remarkable generalist ability to perform diverse
multimodal tasks in a zero-shot manner. Large-scale web-based image-text pairs contribute …

RemoteCLIP: A vision language foundation model for remote sensing

F Liu, D Chen, Z Guan, X Zhou, J Zhu… - … on Geoscience and …, 2024 - ieeexplore.ieee.org
General-purpose foundation models have led to recent breakthroughs in artificial
intelligence (AI). In remote sensing, self-supervised learning (SSL) and masked image …

The (R)Evolution of multimodal large language models: A survey

D Caffagni, F Cocchi, L Barsellotti, N Moratelli… - arXiv preprint arXiv …, 2024 - arxiv.org
Connecting text and visual modalities plays an essential role in generative intelligence. For
this reason, inspired by the success of large language models, significant research efforts …

From Instructions to Intrinsic Human Values--A Survey of Alignment Goals for Big Models

J Yao, X Yi, X Wang, J Wang, X Xie - arXiv preprint arXiv:2308.12014, 2023 - arxiv.org
Big models, exemplified by Large Language Models (LLMs), are models typically pre-
trained on massive data and composed of enormous numbers of parameters, which not only obtain …

InstructionGPT-4: A 200-instruction paradigm for fine-tuning MiniGPT-4

L Wei, Z Jiang, W Huang, L Sun - arXiv preprint arXiv:2308.12067, 2023 - arxiv.org
Multimodal large language models acquire their instruction-following capabilities through a
two-stage training process: pre-training on image-text pairs and fine-tuning on supervised …

Vision-language instruction tuning: A review and analysis

C Li, Y Ge, D Li, Y Shan - Transactions on Machine Learning …, 2023 - openreview.net
Instruction tuning is a crucial supervised training phase in Large Language Models (LLMs),
aiming to enhance the LLM's ability to generalize instruction execution and adapt to user …

Emu: Generative pretraining in multimodality

Q Sun, Q Yu, Y Cui, F Zhang, X Zhang… - The Twelfth …, 2023 - openreview.net
We present Emu, a multimodal foundation model that seamlessly generates images and text
in multimodal context. This omnivore model can take in any single-modality or multimodal …

Sparkles: Unlocking chats across multiple images for multimodal instruction-following models

Y Huang, Z Meng, F Liu, Y Su, N Collier… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models exhibit enhanced zero-shot performance on various tasks when fine-
tuned with instruction-following data. Multimodal instruction-following models extend these …