Enhancing vision-language pre-training with rich supervisions

Y Gao, K Shi, P Zhu, E Belval… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract: We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-
training paradigm for Vision-Language Models using data from large-scale web screenshot …

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

O Abramovich, N Nayman, S Fogel, I Lavi… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, notable advancements have been made in the domain of visual document
understanding, with the prevailing architecture comprising a cascade of vision and language …

Improving Multi-Agent Debate with Sparse Communication Topology

Y Li, Y Du, J Zhang, L Hou, P Grabowski, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-agent debate has proven effective in improving the quality of large language models on
reasoning and factuality tasks. While various role-playing strategies in multi-agent debates …