- Academic resource search

MMC: Advancing multimodal chart understanding with large-scale instruction tuning

F Liu, X Wang, W Yao, J Chen, K Song, S Cho… - arXiv preprint arXiv …, 2023 - arxiv.org

With the rapid development of large language models (LLMs) and their integration into large
multimodal models (LMMs), there has been impressive progress in zero-shot completion of …

Cited by 43 - Related articles - All 3 versions

[PDF] arxiv.org

A survey on visual anomaly detection: Challenge, approach, and prospect

Y Cao, X Xu, J Zhang, Y Cheng, X Huang… - arXiv preprint arXiv …, 2024 - arxiv.org

Visual Anomaly Detection (VAD) endeavors to pinpoint deviations from the concept of
normality in visual data, widely applied across diverse domains, e.g., industrial defect …

Cited by 17 - Related articles - All 2 versions

[PDF] arxiv.org

Large Language Models in Biomedical and Health Informatics: A Bibliometric Review

H Yu, L Fan, L Li, J Zhou, Z Ma, L Xian, W Hua… - arXiv preprint arXiv …, 2024 - arxiv.org

Large Language Models (LLMs) have rapidly become important tools in Biomedical and
Health Informatics (BHI), enabling new ways to analyze data, treat patients, and conduct …

Cited by 2 - Related articles - All 2 versions

[PDF] arxiv.org

GPT4Vis: what can GPT-4 do for zero-shot visual recognition?

W Wu, H Yao, M Zhang, Y Song, W Ouyang… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper does not present a novel method. Instead, it delves into an essential, yet must-know
baseline in light of the latest advancements in Generative Artificial Intelligence …

Cited by 15 - Related articles - All 2 versions

[PDF] arxiv.org

GPT-4V(ision) as a social media analysis engine

H Lyu, J Huang, D Zhang, Y Yu, X Mou, J Pan… - arXiv preprint arXiv …, 2023 - arxiv.org

Recent research has offered insights into the extraordinary capabilities of Large Multimodal
Models (LMMs) in various general vision and language tasks. There is growing interest in …

Cited by 18 - Related articles - All 2 versions

[PDF] arxiv.org

Scaffolding coordinates to promote vision-language coordination in large multi-modal models

X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org

State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …

Cited by 6 - Related articles - All 2 versions

[PDF] arxiv.org

How well does GPT-4V(ision) adapt to distribution shifts? A preliminary investigation

Z Han, G Zhou, R He, J Wang, X Xie, T Wu… - arXiv preprint arXiv …, 2023 - arxiv.org

In machine learning, generalization against distribution shifts--where deployment conditions
diverge from the training scenarios--is crucial, particularly in fields like climate modeling …

Cited by 7 - Related articles - All 4 versions

[PDF] arxiv.org

NPHardEval4V: A Dynamic Reasoning Benchmark of Multimodal Large Language Models

L Fan, W Hua, X Li, K Zhu, M Jin, L Li, H Ling… - arXiv preprint arXiv …, 2024 - arxiv.org

Understanding the reasoning capabilities of Multimodal Large Language Models (MLLMs) is
an important area of research. In this study, we introduce a dynamic benchmark …

Cited by 4 - Related articles - All 2 versions

[PDF] thecvf.com

Long-Tailed Anomaly Detection with Learnable Class Names

CH Ho, KC Peng… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Anomaly detection (AD) aims to identify defective images and localize their defects (if any).
Ideally AD models should be able to detect defects over many image classes; without relying …

Cited by 1 - Related articles - All 11 versions

[PDF] arxiv.org

VisionGPT: LLM-Assisted Real-Time Anomaly Detection for Safe Visual Navigation

H Wang, J Qin, A Bastola, X Chen, J Suchanek… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper explores the potential of Large Language Models (LLMs) in zero-shot anomaly
detection for safe visual navigation. With the assistance of the state-of-the-art real-time open …

Cited by 2 - Related articles - All 2 versions
