VLMEvalKit: An open-source toolkit for evaluating large multi-modality models

H Duan, J Yang, Y Qiao, X Fang, L Chen, Y Liu… - Proceedings of the …, 2024 - dl.acm.org
We present VLMEvalKit: an open-source toolkit for evaluating large multi-modality models
based on PyTorch. The toolkit aims to provide a user-friendly and comprehensive framework …

VLSBench: Unveiling visual leakage in multimodal safety

X Hu, D Liu, H Li, X Huang, J Shao - arXiv preprint arXiv:2411.19939, 2024 - arxiv.org
Safety concerns of Multimodal large language models (MLLMs) have gradually become an
important problem in various applications. Surprisingly, previous works indicate a counter …

A survey on multimodal benchmarks: In the era of large AI models

L Li, G Chen, H Shi, J Xiao, L Chen - arXiv preprint arXiv:2409.18142, 2024 - arxiv.org
The rapid evolution of Multimodal Large Language Models (MLLMs) has brought substantial
advancements in artificial intelligence, significantly enhancing the capability to understand …

A Comprehensive Survey of Multimodal Large Language Models: Concept, Application and Safety

S Liu, W Pu, C Xu, Z Huang, Q Li, H Wang, C Lin… - 2024 - researchsquare.com
Recent advancements in MLLM, such as those exemplified by developments like GPT-4o,
have positioned them as a significant focus within the research community. MLLMs leverage …

HumanEval-V: Evaluating Visual Understanding and Reasoning Abilities of Large Multimodal Models Through Coding Tasks

F Zhang, L Wu, H Bai, G Lin, X Li, X Yu, Y Wang… - arXiv preprint arXiv …, 2024 - arxiv.org
Coding tasks have been valuable for evaluating Large Language Models (LLMs), as they
demand the comprehension of high-level instructions, complex reasoning, and the …

Audio Is the Achilles' Heel: Red Teaming Audio Large Multimodal Models

H Yang, L Qu, E Shareghi, G Haffari - arXiv preprint arXiv:2410.23861, 2024 - arxiv.org
Large Multimodal Models (LMMs) have demonstrated the ability to interact with humans
under real-world conditions by combining Large Language Models (LLMs) and modality …

Multi-PA: A Multi-perspective Benchmark on Privacy Assessment for Large Vision-Language Models

J Zhang, X Cao, Z Han, S Shan, X Chen - arXiv preprint arXiv:2412.19496, 2024 - arxiv.org
Large Vision-Language Models (LVLMs) exhibit impressive potential across various tasks
but also face significant privacy risks, limiting their practical applications. Current researches …