Multi-modal document retrieval is designed to identify and retrieve various forms of multi-modal content, such as figures, tables, charts, and layout information from extensive …
Document processing and related tasks such as information extraction represent a large portion of business workloads and therefore offer high potential for efficiency improvements …
C Zhang, Y Zhao, C Yuan, Y Tu, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org
Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable success in multiple information extraction tasks on visually-rich documents. However, the …
In this paper, we address the challenge of effectively utilizing Large Language Models (LLMs) for Visually Rich Document Understanding (VRDU), a key part of intelligent …