Enhancing vision-language pre-training with rich supervisions

Y Gao, K Shi, P Zhu, E Belval… - Proceedings of the …, 2024 - openaccess.thecvf.com
Abstract: We propose Strongly Supervised pre-training with ScreenShots (S4), a novel pre-
training paradigm for Vision-Language Models using data from large-scale web screenshot …

VisFocus: Prompt-Guided Vision Encoders for OCR-Free Dense Document Understanding

O Abramovich, N Nayman, S Fogel, I Lavi… - arXiv preprint arXiv …, 2024 - arxiv.org
In recent years, notable advancements have been made in the domain of visual document
understanding, with the prevailing architecture comprising a cascade of vision and language …

Improving Multi-Agent Debate with Sparse Communication Topology

Y Li, Y Du, J Zhang, L Hou, P Grabowski, Y Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Multi-agent debate has proven effective in improving the quality of large language models on
reasoning and factuality tasks. While various role-playing strategies in multi-agent debates …