Authors
Mukul Singh, José Cambronero, Sumit Gulwani, Vu Le, Gust Verbruggen
Publication date
2023/12/13
Journal
arXiv preprint arXiv:2312.11524
Description
Multi-modality promises to unlock further uses for large language models. Recently, the state-of-the-art language model GPT-4 was enhanced with vision capabilities. We carry out a prompting evaluation of GPT-4V and five other baselines on structured reasoning tasks, such as mathematical reasoning, visual data analysis, and code generation. We show that visual Chain-of-Thought, an extension of Chain-of-Thought to multi-modal LLMs, yields significant improvements over the vanilla model. We also present a categorized analysis of scenarios where these models perform well and where they struggle, highlighting challenges associated with coherent multimodal reasoning.
Total citations
Scholar articles
M Singh, J Cambronero, S Gulwani, V Le… - arXiv preprint arXiv:2312.11524, 2023