How Far Are We from Intelligent Visual Deductive Reasoning?

Citing articles: 4 results

Prism: A Framework for Decoupling and Assessing the Capabilities of VLMs

Y Qiao, H Duan, X Fang, J Yang, L Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Vision Language Models (VLMs) demonstrate remarkable proficiency in addressing a wide
array of visual questions, which requires strong perception and reasoning faculties …

Cited by 1

Evaluating ChatGPT-4 Vision on Brazil's National Undergraduate Computer Science Exam

NC Mendonça - ACM Transactions on Computing Education, 2024 - dl.acm.org

The recent integration of visual capabilities into Large Language Models (LLMs) has the
potential to play a pivotal role in science and technology education, where visual elements …

What is the Visual Cognition Gap between Humans and Multimodal LLMs?

X Cao, B Lai, W Ye, Y Ma, J Heintz, J Chen… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently, Multimodal Large Language Models (MLLMs) have shown great promise in
language-guided perceptual tasks such as recognition, segmentation, and object detection …

Image-to-Text Logic Jailbreak: Your Imagination can Help You Do Anything

X Zou, Y Chen - arXiv preprint arXiv:2407.02534, 2024 - arxiv.org

Large Visual Language Models (VLMs) such as GPT-4 have achieved remarkable success
in generating comprehensive and nuanced responses, surpassing the capabilities of large …
