How many unicorns are in this image? A safety evaluation benchmark for vision LLMs

H Tu, C Cui, Z Wang, Y Zhou, B Zhao, J Han… - arXiv preprint arXiv …, 2023 - arxiv.org
This work focuses on the potential of Vision LLMs (VLLMs) in visual reasoning. Different
from prior studies, we shift our focus from evaluating standard performance to introducing a …

Inherent limitations of LLMs regarding spatial information

H Yan, X Hu, X Wan, C Huang, K Zou, S Xu - arXiv preprint arXiv …, 2023 - arxiv.org
Despite the significant advancements in natural language processing capabilities
demonstrated by large language models such as ChatGPT, their proficiency in …