Sora: A review on background, technology, limitations, and opportunities of large vision models

Y Liu, K Zhang, Y Li, Z Yan, C Gao, R Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Sora is a text-to-video generative AI model, released by OpenAI in February 2024. The
model is trained to generate videos of realistic or imaginative scenes from text instructions …

Knowledge graphs meet multi-modal learning: A comprehensive survey

Z Chen, Y Zhang, Y Fang, Y Geng, L Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Knowledge Graphs (KGs) play a pivotal role in advancing various AI applications, with the
semantic web community's exploration into multi-modal dimensions unlocking new avenues …

Robustness of Structured Data Extraction from In-Plane Rotated Documents Using Multi-Modal Large Language Models (LLM)

A Biswas, W Talukdar - Journal of Artificial Intelligence Research, 2024 - arxiv.org
Multi-modal large language models (LLMs) have shown remarkable performance in various
natural language processing tasks, including data extraction from documents. However, the …

Hallucination of multimodal large language models: A survey

Z Bai, P Wang, T Xiao, T He, Z Han, Z Zhang… - arXiv preprint arXiv …, 2024 - arxiv.org
This survey presents a comprehensive analysis of the phenomenon of hallucination in
multimodal large language models (MLLMs), also known as Large Vision-Language Models …

RLAIF-V: Aligning MLLMs through open-source AI feedback for super GPT-4V trustworthiness

T Yu, H Zhang, Y Yao, Y Dang, D Chen, X Lu… - arXiv preprint arXiv …, 2024 - arxiv.org
Learning from feedback reduces the hallucination of multimodal large language models
(MLLMs) by aligning them with human preferences. While traditional methods rely on labor …

Visual hallucinations of multi-modal large language models

W Huang, H Liu, M Guo, NZ Gong - arXiv preprint arXiv:2402.14683, 2024 - arxiv.org
Visual hallucination (VH) means that a multi-modal LLM (MLLM) imagines incorrect details
about an image in visual question answering. Existing studies find VH instances only in …

Less is more: Mitigating multimodal hallucination from an EOS decision perspective

Z Yue, L Zhang, Q Jin - arXiv preprint arXiv:2402.14545, 2024 - arxiv.org
Large Multimodal Models (LMMs) often suffer from multimodal hallucinations, wherein they
may create content that is not present in the visual inputs. In this paper, we explore a new …

What if...?: Counterfactual inception to mitigate hallucination effects in large multimodal models

J Kim, YJ Kim, YM Ro - arXiv preprint arXiv:2403.13513, 2024 - arxiv.org
This paper presents a way of enhancing the reliability of Large Multimodal Models (LMMs) in
addressing hallucination effects, where models generate incorrect or unrelated responses …

LightHouse: A Survey of AGI Hallucination

F Wang - arXiv preprint arXiv:2401.06792, 2024 - arxiv.org
With the development of artificial intelligence, large-scale models have become increasingly
intelligent. However, numerous studies indicate that hallucinations within these large …

Seeing Clearly, Answering Incorrectly: A Multimodal Robustness Benchmark for Evaluating MLLMs on Leading Questions

Y Liu, Z Liang, Y Wang, M He, J Li, B Zhao - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have exhibited impressive capabilities in
visual understanding and reasoning, providing seemingly reasonable answers, such as image …