MetaVL: Transferring in-context learning ability from language models to vision-language models

X Yue, Y Ni, K Zhang, T Zheng, R Liu… - Proceedings of the …, 2024 - openaccess.thecvf.com

We introduce MMMU: a new benchmark designed to evaluate multimodal models on
massive multi-discipline tasks demanding college-level subject knowledge and deliberate …

Cited by 163 · Related articles · All 3 versions

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 108 · Related articles · All 6 versions

GPT-4V(ision) is a generalist web agent, if grounded

B Zheng, B Gou, J Kil, H Sun, Y Su - arXiv preprint arXiv:2401.01614, 2024 - arxiv.org

The recent development on large multimodal models (LMMs), especially GPT-4V(ision) and
Gemini, has been quickly expanding the capability boundaries of multimodal models …

Cited by 52 · Related articles · All 4 versions

Large multimodal models: Notes on CVPR 2023 tutorial

C Li - arXiv preprint arXiv:2306.14895, 2023 - arxiv.org

This tutorial note summarizes the presentation on "Large Multimodal Models: Towards
Building and Surpassing Multimodal GPT-4", a part of CVPR 2023 tutorial on "Recent …

Cited by 19 · Related articles · All 2 versions

Unsolvable Problem Detection: Evaluating Trustworthiness of Vision Language Models

A Miyai, J Yang, J Zhang, Y Ming, Q Yu, G Irie… - arXiv preprint arXiv …, 2024 - arxiv.org

This paper introduces a novel and significant challenge for Vision Language Models
(VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the VLM's ability to …

Cited by 2 · Related articles · All 2 versions

Understanding and Improving In-Context Learning on Vision-language Models

S Chen, Z Han, B He, M Buckley, P Torr… - arXiv preprint arXiv …, 2023 - openreview.net

In-context learning (ICL) on large language models (LLMs) has received great attention, and
this technique can also be applied to vision-language models (VLMs) built upon LLMs …

Cited by 4 · Related articles · All 3 versions

On the Potential and Limitations of Few-Shot In-Context Learning to Generate Metamorphic Specifications for Tax Preparation Software

D Srinivas, R Das, S Tizpaz-Niari, A Trivedi… - arXiv preprint arXiv …, 2023 - arxiv.org

Due to the ever-increasing complexity of income tax laws in the United States, the number of
US taxpayers filing their taxes using tax preparation software (henceforth, tax software) …

Cited by 1 · Related articles · All 3 versions

Empowering Vision-Language Models for Reasoning Ability through Large Language Models

Y Yang, X Zhang, J Xu, W Han - ICASSP 2024-2024 IEEE …, 2024 - ieeexplore.ieee.org

Vision-language models (VLM) have shown excellent performance in vision-language tasks.
However, they sometimes lack sufficient reasoning ability. In contrast, large language …

Cited by 1 · Related articles

Large Visual-Language Models Are Also Good Classifiers: A Study of In-Context Multimodal Fake News Detection

Y Jiang, Y Wang - arXiv preprint arXiv:2407.12879, 2024 - arxiv.org

Large visual-language models (LVLMs) exhibit exceptional performance in visual-language
reasoning across diverse cross-modal benchmarks. Despite these advances, recent …

Unsolvable Problem Detection for Vision Language Models

A Miyai, J Yang, J Zhang, Y Ming, Q Yu, G Irie… - ICLR 2024 Workshop on … - openreview.net

This paper introduces a novel and significant challenge for Vision Language Models
(VLMs), termed Unsolvable Problem Detection (UPD). UPD examines the VLM's ability to …
