The dawn of LMMs: Preliminary explorations with GPT-4V(ision)

C Chen, K Shu - arXiv preprint arXiv:2311.05656, 2023 - arxiv.org

Misinformation such as fake news and rumors is a serious threat to information ecosystems
and public trust. The emergence of Large Language Models (LLMs) has great potential to …

Cited by 48 · Related articles · All 4 versions

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com

Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper, we present the first systematic study to investigate the design …

Cited by 623 · Related articles · All 5 versions

MMMU: A massive multi-discipline multimodal understanding and reasoning benchmark for expert AGI

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

Cited by 140 · Related articles · All 3 versions

Large language models and games: A survey and roadmap

R Gallotta, G Todd, M Zammit, S Earle, A Liapis… - arXiv preprint arXiv …, 2024 - arxiv.org

Recent years have seen an explosive increase in research on large language models
(LLMs), and accompanying public engagement on the topic. While starting as a niche area …

Cited by 14 · Related articles · All 2 versions

MM-Vet: Evaluating large multimodal models for integrated capabilities

W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

We propose MM-Vet, an evaluation benchmark that examines large multimodal models
(LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing …

Cited by 190 · Related articles · All 3 versions

Eyes wide shut? Exploring the visual shortcomings of multimodal LLMs

S Tong, Z Liu, Y Zhai, Y Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com

Is vision good enough for language? Recent advancements in multimodal models primarily
stem from the powerful reasoning abilities of large language models (LLMs). However, the …

Cited by 54 · Related articles · All 4 versions

VILA: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com

Visual language models (VLMs) rapidly progressed with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Cited by 49 · Related articles · All 4 versions

Generalized out-of-distribution detection: A survey

J Yang, K Zhou, Y Li, Z Liu - International Journal of Computer Vision, 2024 - Springer

Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of
machine learning systems. For instance, in autonomous driving, we would like the driving …

Cited by 701 · Related articles · All 4 versions

InternVL: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com

The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However, the progress in vision and vision …

Cited by 34 · Related articles · All 4 versions

MathVista: Evaluating mathematical reasoning of foundation models in visual contexts

P Lu, H Bansal, T Xia, J Liu, C Li, H Hajishirzi… - arXiv preprint arXiv …, 2023 - arxiv.org

Although Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit
impressive skills in various domains, their ability for mathematical reasoning within visual …

Cited by 111 · Related articles · All 3 versions
