MMC: Advancing multimodal chart understanding with large-scale instruction tuning

F Liu, X Wang, W Yao, J Chen, K Song, S Cho… - arXiv preprint arXiv …, 2023 - arxiv.org
With the rapid development of large language models (LLMs) and their integration into large
multimodal models (LMMs), there has been impressive progress in zero-shot completion of …

A survey on visual anomaly detection: Challenge, approach, and prospect

Y Cao, X Xu, J Zhang, Y Cheng, X Huang… - arXiv preprint arXiv …, 2024 - arxiv.org
Visual Anomaly Detection (VAD) endeavors to pinpoint deviations from the concept of
normality in visual data, widely applied across diverse domains, e.g., industrial defect …

Large Language Models in Biomedical and Health Informatics: A Bibliometric Review

H Yu, L Fan, L Li, J Zhou, Z Ma, L Xian, W Hua… - arXiv preprint arXiv …, 2024 - arxiv.org
Large Language Models (LLMs) have rapidly become important tools in Biomedical and
Health Informatics (BHI), enabling new ways to analyze data, treat patients, and conduct …

GPT4Vis: what can GPT-4 do for zero-shot visual recognition?

W Wu, H Yao, M Zhang, Y Song, W Ouyang… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper does not present a novel method. Instead, it delves into an essential, yet must-
know baseline in light of the latest advancements in Generative Artificial Intelligence …

GPT-4V(ision) as a social media analysis engine

H Lyu, J Huang, D Zhang, Y Yu, X Mou, J Pan… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent research has offered insights into the extraordinary capabilities of Large Multimodal
Models (LMMs) in various general vision and language tasks. There is growing interest in …

Scaffolding coordinates to promote vision-language coordination in large multi-modal models

X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …

How well does GPT-4V(ision) adapt to distribution shifts? A preliminary investigation

Z Han, G Zhou, R He, J Wang, X Xie, T Wu… - arXiv preprint arXiv …, 2023 - arxiv.org
In machine learning, generalization against distribution shifts--where deployment conditions
diverge from the training scenarios--is crucial, particularly in fields like climate modeling …

NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models

L Fan, W Hua, X Li, K Zhu, M Jin, L Li, H Ling… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding the reasoning capabilities of Multimodal Large Language Models (MLLMs) is
an important area of research. In this study, we introduce a dynamic benchmark …

Long-Tailed Anomaly Detection with Learnable Class Names

CH Ho, KC Peng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Anomaly detection (AD) aims to identify defective images and localize their defects (if any).
Ideally AD models should be able to detect defects over many image classes; without relying …

VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

H Wang, J Qin, A Bastola, X Chen, J Suchanek… - arXiv preprint arXiv …, 2024 - arxiv.org
This paper explores the potential of Large Language Models (LLMs) in zero-shot anomaly
detection for safe visual navigation. With the assistance of the state-of-the-art real-time open …