AGIBench: A Multi-granularity, Multimodal, Human-Referenced, Auto-Scoring Benchmark for Large...


2 citing articles found (0.02 seconds)



How Many Unicorns Are in This Image? A Safety Evaluation Benchmark for Vision LLMs

H Tu, C Cui, Z Wang, Y Zhou, B Zhao, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org

This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different
from prior studies, we shift our focus from evaluating standard performance to introducing a …

Cited by 29 · Related articles · All 2 versions

Inherent limitations of LLMs regarding spatial information

H Yan, X Hu, X Wan, C Huang, K Zou, S Xu - arXiv preprint arXiv …, 2023 - arxiv.org

Despite the significant advancements in natural language processing capabilities
demonstrated by large language models such as ChatGPT, their proficiency in …
