Exploring the frontier of vision-language models: A survey of current methodologies and future directions

A Ghosh, A Acharya, S Saha, V Jain… - arXiv preprint arXiv …, 2024 - arxiv.org
The advent of Large Language Models (LLMs) has significantly reshaped the trajectory of
the AI revolution. Nevertheless, these LLMs exhibit a notable limitation, as they are primarily …

What matters when building vision-language models?

H Laurençon, L Tronchon, M Cord, V Sanh - arXiv preprint arXiv …, 2024 - arxiv.org
The growing interest in vision-language models (VLMs) has been driven by improvements in
large language models and vision transformers. Despite the abundance of literature on this …

An Empirical Study Into What Matters for Calibrating Vision-Language Models

W Tu, W Deng, D Campbell, S Gould… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision-Language Models (VLMs) have emerged as the dominant approach for zero-shot
recognition, adept at handling diverse scenarios and significant distribution changes …

Heron-Bench: A benchmark for evaluating vision language models in Japanese

Y Inoue, K Sasaki, Y Ochi, K Fujii, K Tanahashi… - arXiv preprint arXiv …, 2024 - arxiv.org
Vision Language Models (VLMs) have undergone a rapid evolution, giving rise to significant
advancements in the realm of multimodal understanding tasks. However, the majority of …

Veagle: Advancements in Multimodal Representation Learning

R Chawla, A Datta, T Verma, A Jha, A Gautam… - arXiv preprint arXiv …, 2024 - arxiv.org
Lately, researchers in artificial intelligence have taken a strong interest in how language and
vision come together, giving rise to the development of multimodal models that aim to …

Are We on the Right Way for Evaluating Large Vision-Language Models?

L Chen, J Li, X Dong, P Zhang, Y Zang, Z Chen… - arXiv preprint arXiv …, 2024 - arxiv.org
Large vision-language models (LVLMs) have recently achieved rapid progress, sparking
numerous studies to evaluate their multi-modal capabilities. However, we dig into current …

TouchStone: Evaluating vision-language models by language models

S Bai, S Yang, J Bai, P Wang, X Zhang, J Lin… - arXiv preprint arXiv …, 2023 - arxiv.org
Large vision-language models (LVLMs) have recently witnessed rapid advancements,
exhibiting a remarkable capacity for perceiving, understanding, and processing visual …

Unveiling Encoder-Free Vision-Language Models

H Diao, Y Cui, X Li, Y Wang, H Lu, X Wang - arXiv preprint arXiv …, 2024 - arxiv.org
Existing vision-language models (VLMs) mostly rely on vision encoders to extract visual
features followed by large language models (LLMs) for visual-language tasks. However, the …

ReForm-Eval: Evaluating large vision language models via unified re-formulation of task-oriented benchmarks

Z Li, Y Wang, M Du, Q Liu, B Wu, J Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Recent years have witnessed remarkable progress in the development of large vision-
language models (LVLMs). Benefiting from the strong language backbones and efficient …

VLBiasBench: A Comprehensive Benchmark for Evaluating Bias in Large Vision-Language Models

J Zhang, S Wang, X Cao, Z Yuan, S Shan… - arXiv preprint arXiv …, 2024 - arxiv.org
The emergence of Large Vision-Language Models (LVLMs) marks significant strides
towards achieving general artificial intelligence. However, these advancements are …