Visually-Rich Document Understanding: Concepts, Taxonomy and Challenges

Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review

A Rombach, P Fettke - arXiv preprint arXiv:2408.06345, 2024 - arxiv.org

Extracting key information from documents represents a large portion of business workloads
and therefore offers a high potential for efficiency improvements and process automation …

MMDocIR: Benchmarking Multi-Modal Retrieval for Long Documents

K Dong, Y Chang, XD Goh, D Li, R Tang… - arXiv preprint arXiv …, 2025 - arxiv.org

Multi-modal document retrieval is designed to identify and retrieve various forms of multi-modal
content, such as figures, tables, charts, and layout information from extensive …

Utilizing Deep Learning for Field-Level Information Extraction from German Real Estate Tax Notices

AM Rombach, J Lahann, T Niesen… - Journal of Emerging …, 2024 - publications.aaahq.org

Document processing and related tasks such as information extraction represent a large
portion of business workloads and therefore offer high potential for efficiency improvements …

Rethinking the Evaluation of Pre-trained Text-and-Layout Models from an Entity-Centric Perspective

C Zhang, Y Zhao, C Yuan, Y Tu, Y Guo… - arXiv preprint arXiv …, 2024 - arxiv.org

Recently developed pre-trained text-and-layout models (PTLMs) have shown remarkable
success in multiple information extraction tasks on visually-rich documents. However, the …

Embedding Layout in Text for Document Understanding Using Large Language Models

M Minouei, MR Soheili, D Stricker - International Conference on …, 2024 - Springer

In this paper, we address the challenge of effectively utilizing Large Language Models
(LLMs) for Visually Rich Document Understanding (VRDU), a key part of intelligent …
