Combating misinformation in the age of llms: Opportunities and challenges

C Chen, K Shu - arXiv preprint arXiv:2311.05656, 2023 - arxiv.org
Misinformation such as fake news and rumors is a serious threat on information ecosystems
and public trust. The emergence of Large Language Models (LLMs) has great potential to …

Improved baselines with visual instruction tuning

H Liu, C Li, Y Li, YJ Lee - … of the IEEE/CVF Conference on …, 2024 - openaccess.thecvf.com
Large multimodal models (LMM) have recently shown encouraging progress with visual
instruction tuning. In this paper we present the first systematic study to investigate the design …

Mmmu: A massive multi-discipline multimodal understanding and reasoning benchmark for expert agi

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com
We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

Large language models and games: A survey and roadmap

R Gallotta, G Todd, M Zammit, S Earle, A Liapis… - arXiv preprint arXiv …, 2024 - arxiv.org
Recent years have seen an explosive increase in research on large language models
(LLMs), and accompanying public engagement on the topic. While starting as a niche area …

Mm-vet: Evaluating large multimodal models for integrated capabilities

W Yu, Z Yang, L Li, J Wang, K Lin, Z Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
We propose MM-Vet, an evaluation benchmark that examines large multimodal models
(LMMs) on complicated multimodal tasks. Recent LMMs have shown various intriguing …

Eyes wide shut? exploring the visual shortcomings of multimodal llms

S Tong, Z Liu, Y Zhai, Y Ma… - Proceedings of the …, 2024 - openaccess.thecvf.com
Is vision good enough for language? Recent advancements in multimodal models primarily
stem from the powerful reasoning abilities of large language models (LLMs). However the …

Vila: On pre-training for visual language models

J Lin, H Yin, W Ping, P Molchanov… - Proceedings of the …, 2024 - openaccess.thecvf.com
Visual language models (VLMs) rapidly progressed with the recent success of large
language models. There have been growing efforts on visual instruction tuning to extend the …

Generalized out-of-distribution detection: A survey

J Yang, K Zhou, Y Li, Z Liu - International Journal of Computer Vision, 2024 - Springer
Abstract Out-of-distribution (OOD) detection is critical to ensuring the reliability and safety of
machine learning systems. For instance, in autonomous driving, we would like the driving …

Internvl: Scaling up vision foundation models and aligning for generic visual-linguistic tasks

Z Chen, J Wu, W Wang, W Su, G Chen… - Proceedings of the …, 2024 - openaccess.thecvf.com
The exponential growth of large language models (LLMs) has opened up numerous
possibilities for multi-modal AGI systems. However the progress in vision and vision …

Mathvista: Evaluating mathematical reasoning of foundation models in visual contexts

P Lu, H Bansal, T Xia, J Liu, C Li, H Hajishirzi… - arXiv preprint arXiv …, 2023 - arxiv.org
Although Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit
impressive skills in various domains, their ability for mathematical reasoning within visual …