Data-efficient information extraction from documents with pre-trained language models

C Sage, T Douzon, A Aussem, V Eglin… - Document Analysis and …, 2021 - Springer
Like for many text understanding and generation tasks, pre-trained language models have
emerged as a powerful approach for extracting information from business documents …

DocReader: bounding-box free training of a document information extraction model

S Klaiman, M Lehne - Document Analysis and Recognition–ICDAR 2021 …, 2021 - Springer
Information extraction from documents is a ubiquitous first step in many business
applications. During this step, the entries of various fields must first be read from the images …

Improving information extraction on business documents with specific pre-training tasks

T Douzon, S Duffner, C Garcia, J Espinas - International Workshop on …, 2022 - Springer
Transformer-based Language Models are widely used in Natural Language
Processing related tasks. Thanks to their pre-training, they have been successfully adapted …

Sources of success for information extraction methods

D Kauchak, J Smarr, C Elkan - 2002 - escholarship.org
In this paper, we examine an important recent rule-based information extraction (IE)
technique named Boosted Wrapper Induction (BWI), by conducting experiments on a wider …

LMDX: Language model-based document information extraction and localization

V Perot, K Kang, F Luisier, G Su, X Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLM) have revolutionized Natural Language Processing (NLP),
improving state-of-the-art on many existing tasks and exhibiting emergent capabilities …

Information extraction of domain-specific business documents with limited data

MT Nguyen, DT Le, NH Son, BC Minh… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
Information extraction is a key cornerstone in the digitization of office data which requires
the conversion of unstructured to structured data. However, in the actual application to …

A span extraction approach for information extraction on visually-rich documents

TAD Nguyen, HM Vu, NH Son, MT Nguyen - Document Analysis and …, 2021 - Springer
Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA
performance recently thanks to the adaptation of Transformer-based language models …

Data-Efficient Information Extraction from Form-Like Documents

B Gunel, N Potti, S Tata, JB Wendt, M Najork… - arXiv preprint arXiv …, 2022 - arxiv.org
Automating information extraction from form-like documents at scale is a pressing need due
to its potential impact on automating business workflows across many industries like …

PyTorch-IE: Fast and Reproducible Prototyping for Information Extraction

A Binder, L Hennig, C Alt - arXiv preprint arXiv:2406.00007, 2024 - arxiv.org
The objective of Information Extraction (IE) is to derive structured representations from
unstructured or semi-structured documents. However, developing IE models is complex due …

Business document information extraction: Towards practical benchmarks

M Skalický, Š Šimsa, M Uřičář, M Šulc - International Conference of the …, 2022 - Springer
Information extraction from semi-structured documents is crucial for frictionless
business-to-business (B2B) communication. While machine learning problems related to …