Related articles - Academic resource search

MathVista: Evaluating mathematical reasoning of foundation models in visual contexts

P Lu, H Bansal, T Xia, J Liu, C Li, H Hajishirzi… - arXiv preprint arXiv …, 2023 - arxiv.org

Although Large Language Models (LLMs) and Large Multimodal Models (LMMs) exhibit
impressive skills in various domains, their ability for mathematical reasoning within visual …

Cited by: 111 | Related articles | All 3 versions

MathVerse: Does your multi-modal LLM truly see the diagrams in visual math problems?

R Zhang, D Jiang, Y Zhang, H Lin, Z Guo, P Qiu… - arXiv preprint arXiv …, 2024 - arxiv.org

The remarkable progress of Multi-modal Large Language Models (MLLMs) has garnered
unparalleled attention, due to their superior performance in visual contexts. However, their …

Cited by: 20 | Related articles | All 2 versions

AGIEval: A human-centric benchmark for evaluating foundation models

W Zhong, R Cui, Y Guo, Y Liang, S Lu, Y Wang… - arXiv preprint arXiv …, 2023 - arxiv.org

Evaluating the general abilities of foundation models to tackle human-level tasks is a vital
aspect of their development and application in the pursuit of Artificial General Intelligence …

Cited by: 196 | Related articles | All 3 versions

Foundation models for decision making: Problems, methods, and opportunities

S Yang, O Nachum, Y Du, J Wei, P Abbeel… - arXiv preprint arXiv …, 2023 - arxiv.org

Foundation models pretrained on diverse data at scale have demonstrated extraordinary
capabilities in a wide range of vision and language tasks. When such models are deployed …

Cited by: 93 | Related articles | All 3 versions

UniMath: A foundational and multimodal mathematical reasoner

Z Liang, T Yang, J Zhang, X Zhang - Proceedings of the 2023 …, 2023 - aclanthology.org

While significant progress has been made in natural language processing (NLP), existing
methods exhibit limitations in effectively interpreting and processing diverse mathematical …

Cited by: 7 | Related articles | All 2 versions

GQA: a new dataset for compositional question answering over real-world images

DA Hudson, CD Manning - arXiv preprint arXiv …, 2019 - thetalkingmachines.com

We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …

Cited by: 127 | Related articles | All 2 versions

GQA: A new dataset for real-world visual reasoning and compositional question answering

DA Hudson, CD Manning - … of the IEEE/CVF conference on …, 2019 - openaccess.thecvf.com

We introduce GQA, a new dataset for real-world visual reasoning and compositional
question answering, seeking to address key shortcomings of previous VQA datasets. We …

Cited by: 1418 | Related articles | All 8 versions

DePlot: One-shot visual language reasoning by plot-to-table translation

F Liu, JM Eisenschlos, F Piccinno, S Krichene… - arXiv preprint arXiv …, 2022 - arxiv.org

Visual language such as charts and plots is ubiquitous in the human world. Comprehending
plots and charts requires strong reasoning skills. Prior state-of-the-art (SOTA) models …

Cited by: 51 | Related articles | All 4 versions

A comprehensive evaluation of GPT-4V on knowledge-intensive visual question answering

Y Li, L Wang, B Hu, X Chen, W Zhong, C Lyu… - arXiv preprint arXiv …, 2023 - arxiv.org

The emergence of multimodal large models (MLMs) has significantly advanced the field of
visual understanding, offering remarkable capabilities in the realm of visual question …

Cited by: 21 | Related articles | All 3 versions

Generating natural language explanations for visual question answering using scene graphs and visual attention

S Ghosh, G Burachas, A Ray, A Ziskind - arXiv preprint arXiv:1902.05715, 2019 - arxiv.org

In this paper, we present a novel approach for the task of eXplainable Question Answering
(XQA), i.e., generating natural language (NL) explanations for the Visual Question Answering …

Cited by: 65 | Related articles | All 7 versions
