Mmbench: Is your multi-modal model an all-around player?

Y Chang, X Wang, J Wang, Y Wu, L Yang… - ACM Transactions on …, 2024 - dl.acm.org

Large language models (LLMs) are gaining increasing popularity in both academia and
industry, owing to their unprecedented performance in various applications. As LLMs …

被引用次数：928 相关文章所有 4 个版本

[PDF] arxiv.org

Mm-llms: Recent advances in multimodal large language models

D Zhang, Y Yu, C Li, J Dong, D Su, C Chu… - arXiv preprint arXiv …, 2024 - arxiv.org

In the past year, MultiModal Large Language Models (MM-LLMs) have undergone
substantial advancements, augmenting off-the-shelf LLMs to support MM inputs or outputs …

被引用次数：54 相关文章所有 2 个版本

[PDF] arxiv.org

A survey of large language models

WX Zhao, K Zhou, J Li, T Tang, X Wang, Y Hou… - arXiv preprint arXiv …, 2023 - arxiv.org

Language is essentially a complex, intricate system of human expressions governed by
grammatical rules. It poses a significant challenge to develop capable AI algorithms for …

被引用次数：1974 相关文章所有 4 个版本

[PDF] arxiv.org

Minigpt-4: Enhancing vision-language understanding with advanced large language models

D Zhu, J Chen, X Shen, X Li, M Elhoseiny - arXiv preprint arXiv …, 2023 - arxiv.org

The recent GPT-4 has demonstrated extraordinary multi-modal abilities, such as directly
generating websites from handwritten text and identifying humorous elements within …

被引用次数：1396 相关文章所有 7 个版本

[PDF] thecvf.com

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …

被引用次数：707 相关文章所有 5 个版本

[PDF] thecvf.com

Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

被引用次数：164 相关文章所有 3 个版本

[PDF] thecvf.com

mplug-owl2: Revolutionizing multi-modal large language model with modality collaboration

Q Ye, H Xu, J Ye, M Yan, A Hu, H Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

Abstract Multi-modal Large Language Models (MLLMs) have demonstrated impressive
instruction abilities across various open-ended tasks. However previous methods have …

被引用次数：129 相关文章所有 4 个版本

[PDF] arxiv.org

Next-gpt: Any-to-any multimodal llm

S Wu, H Fei, L Qu, W Ji, TS Chua - arXiv preprint arXiv:2309.05519, 2023 - arxiv.org

While recently Multimodal Large Language Models (MM-LLMs) have made exciting strides,
they mostly fall prey to the limitation of only input-side multimodal understanding, without the …

被引用次数：223 相关文章所有 4 个版本

[PDF] arxiv.org

Mm-vet: Evaluating large multimodal models for integrated capabilities

W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We propose MM-Vet, an evaluation benchmark that examines large multimodal models
(LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing …

被引用次数：210 相关文章所有 3 个版本

[PDF] arxiv.org

Cogvlm: Visual expert for pretrained language models

W Wang, Q Lv, W Yu, W Hong, J Qi, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

We introduce CogVLM, a powerful open-source visual language foundation model. Different
from the popular shallow alignment method which maps image features into the input space …

被引用次数：228 相关文章所有 3 个版本

高级搜索

QQ 群