MLLM-as-a-judge: Assessing multimodal LLM-as-a-judge with vision-language benchmark

D Chen, R Chen, S Zhang, Y Liu, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Multimodal Large Language Models (MLLMs) have gained significant attention recently,
showing remarkable potential in artificial general intelligence. However, assessing the utility …

Holistic analysis of hallucination in GPT-4V(ision): Bias and interference challenges

C Cui, Y Zhou, X Yang, S Wu, L Zhang, J Zou… - arXiv preprint arXiv …, 2023 - arxiv.org
While GPT-4V(ision) impressively models both visual and textual information
simultaneously, its hallucination behavior has not been systematically assessed. To bridge …

An LLM-free multi-dimensional benchmark for MLLMs hallucination evaluation

J Wang, Y Wang, G Xu, J Zhang, Y Gu, H Jia… - arXiv preprint arXiv …, 2023 - arxiv.org
Despite making significant progress in multi-modal tasks, current Multi-modal Large
Language Models (MLLMs) encounter the significant challenge of hallucination, which may …

Progress and opportunities of foundation models in bioinformatics

Q Li, Z Hu, Y Wang, L Li, Y Fan, I King… - Briefings in …, 2024 - academic.oup.com
Bioinformatics has undergone a paradigm shift in artificial intelligence (AI), particularly
through foundation models (FMs), which address longstanding challenges in bioinformatics …

Charting new territories: Exploring the geographic and geospatial capabilities of multimodal llms

J Roberts, T Lüddecke, R Sheikh… - Proceedings of the …, 2024 - openaccess.thecvf.com
Multimodal large language models (MLLMs) have shown remarkable capabilities across a
broad range of tasks but their knowledge and abilities in the geographic and geospatial …

GPT-4V(ision) as a social media analysis engine

H Lyu, J Huang, D Zhang, Y Yu, X Mou, J Pan… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent research has offered insights into the extraordinary capabilities of Large Multimodal
Models (LMMs) in various general vision and language tasks. There is growing interest in …

GlitchBench: Can large multimodal models detect video game glitches?

MR Taesiri, T Feng, CP Bezemer… - Proceedings of the …, 2024 - openaccess.thecvf.com
Large multimodal models (LMMs) have evolved from large language models (LLMs) to
integrate multiple input modalities such as visual inputs. This integration augments the …

NERIF: GPT-4V for automatic scoring of drawn models

GG Lee, X Zhai - arXiv preprint arXiv:2311.12990, 2023 - arxiv.org
Scoring student-drawn models is time-consuming. Recently released GPT-4V provides a
unique opportunity to advance scientific modeling practices by leveraging the powerful …

A systematic evaluation of GPT-4V's multimodal capability for chest X-ray image analysis

Y Liu, Y Li, Z Wang, X Liang, L Liu, L Wang, L Cui, Z Tu… - Meta-Radiology, 2024 - Elsevier
This work evaluates GPT-4V's multimodal capability for medical image analysis, focusing on
three representative tasks: radiology report generation, medical visual question answering …

Scaffolding coordinates to promote vision-language coordination in large multi-modal models

X Lei, Z Yang, X Chen, P Li, Y Liu - arXiv preprint arXiv:2402.12058, 2024 - arxiv.org
State-of-the-art Large Multi-Modal Models (LMMs) have demonstrated exceptional
capabilities in vision-language tasks. Despite their advanced functionalities, the …