Exploring the frontier of vision-language models: A survey of current methodologies and future directions

A Ghosh, A Acharya, S Saha, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of
the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily …

What matters when building vision-language models?

H Laurençon, L Tronchon, M Cord, V Sanh - arXiv preprint arXiv …, 2024 - arxiv.org
The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …

An Empirical Study Into What Matters for Calibrating Vision-Language Models

W Tu, W Deng, D Campbell, S Gould… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Models (VLMs) have emerged as the dominant approach for zero-shot
recognition, adept at handling diverse scenarios and significant distribution changes …

Heron-Bench: A benchmark for evaluating vision language models in Japanese

Y Inoue, K Sasaki, Y Ochi, K Fujii, K Tanahashi… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision Language Models (VLMs) have undergone a rapid evolution, giving rise to significant
advancements in the realm of multimodal understanding tasks. However, the majority of …

Veagle: Advancements in Multimodal Representation Learning

R Chawla, A Datta, T Verma, A Jha, A Gautam… - arXiv preprint arXiv …, 2024 - arxiv.org
Lately, researchers in artificial intelligence have taken a strong interest in how language and
vision come together, giving rise to the development of multimodal models that aim to …

Are We on the Right Way for Evaluating Large Vision-Language Models?

L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …

TouchStone: Evaluating vision-language models by language models

S Bai, S Yang, J Bai, P Wang, X Zhang, J Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
Large vision-language models (LVLMs) have recently witnessed rapid advancements,
exhibiting a remarkable capacity for perceiving, understanding, and processing visual …

Unveiling Encoder-Free Vision-Language Models

H Diao, Y Cui, X Li, Y Wang, H Lu, X Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual
features followed by large language models (LLMs) for visual-language tasks. However, the …

ReForm-Eval: Evaluating large vision language models via unified re-formulation of task-oriented benchmarks

Z Li, Y Wang, M Du, Q Liu, B Wu, J Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress in the development of large vision-
language models (LVLMs). Benefiting from the strong language backbones and efficient …

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Models

J Zhang, S Wang, X Cao, Z Yuan, S Shan… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of Large Vision-Language Models (LVLMs) marks significant strides
towards achieving general artificial intelligence. However, these advancements are …