StrucTexT: Structured text understanding with multi-modal transformers

Y Li, Y Qian, Y Yu, X Qin, C Zhang, Y Liu… - Proceedings of the 29th …, 2021 - dl.acm.org
Structured text understanding on Visually Rich Documents (VRDs) is a crucial part of
Document Intelligence. Due to the complexity of content and layout in VRDs, structured text …

Business insights using RAG–LLMs: a review and case study

M Arslan, S Munawar, C Cruz - Journal of Decision Systems, 2024 - Taylor & Francis
As organizations increasingly rely on diverse data sources like invoices and surveys,
efficient Information Extraction (IE) is crucial. Natural Language Processing (NLP) enhances …

Attention where it matters: Rethinking visual document understanding with selective region concentration

H Cao, C Bao, C Liu, H Chen, K Yin… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose a novel end-to-end document understanding model called SeRum (SElective
Region Understanding Model) for extracting meaningful information from document images …

ViBERTgrid: a jointly trained multi-modal 2D document representation for key information extraction from documents

W Lin, Q Gao, L Sun, Z Zhong, K Hu, Q Ren… - Document Analysis and …, 2021 - Springer
Recent grid-based document representations like BERTgrid allow the simultaneous
encoding of the textual and layout information of a document in a 2D feature map so that …

Donut: Document understanding transformer without OCR

G Kim, T Hong, M Yim, J Park, J Yim… - arXiv preprint arXiv …, 2021 - sangdooyun.github.io
Understanding document images (e.g., invoices) has been an important research topic and
has many applications in document processing automation. Through the latest advances in …

Query-driven generative network for document information extraction in the wild

H Cao, X Li, J Ma, D Jiang, A Guo, Y Hu, H Liu… - Proceedings of the 30th …, 2022 - dl.acm.org
This paper focuses on solving the problem of Document Information Extraction (DIE) in the wild,
which is rarely explored before. In contrast to existing studies mainly tailored for document …

STable: Table generation framework for encoder-decoder models

M Pietruszka, M Turski, Ł Borchmann, T Dwojak… - arXiv preprint arXiv …, 2022 - arxiv.org
The output structure of database-like tables, consisting of values structured in horizontal
rows and vertical columns identifiable by name, can cover a wide range of NLP tasks …

Deep Learning based Key Information Extraction from Business Documents: Systematic Literature Review

A Rombach, P Fettke - arXiv preprint arXiv:2408.06345, 2024 - arxiv.org
Extracting key information from documents represents a large portion of business workloads
and therefore offers a high potential for efficiency improvements and process automation …

Improving information extraction on business documents with specific pre-training tasks

T Douzon, S Duffner, C Garcia, J Espinas - International Workshop on …, 2022 - Springer
Transformer-based Language Models are widely used in Natural Language
Processing related tasks. Thanks to their pre-training, they have been successfully adapted …

Fusion of visual representations for multimodal information extraction from unstructured transactional documents

B Oral, G Eryiğit - International Journal on Document Analysis and …, 2022 - Springer
The importance of automated document understanding in terms of today's businesses'
speed, efficiency, and cost reduction is indisputable. Although structured and semi …