DocReader: bounding-box free training of a document information extraction model

S Klaiman, M Lehne - Document Analysis and Recognition–ICDAR 2021 …, 2021 - Springer
Information extraction from documents is a ubiquitous first step in many business
applications. During this step, the entries of various fields must first be read from the images …

ICL-D3IE: In-context learning with diverse demonstrations updating for document information extraction

J He, L Wang, Y Hu, N Liu, H Liu… - Proceedings of the …, 2023 - openaccess.thecvf.com
Large language models (LLMs), such as GPT-3 and ChatGPT, have demonstrated
remarkable results in various natural language processing (NLP) tasks with in-context …

Key information extraction from documents: Evaluation and generator

O Bensch, M Popa, C Spille - arXiv preprint arXiv:2106.14624, 2021 - arxiv.org
Extracting information from documents usually relies on natural language processing
methods working on one-dimensional sequences of text. In some cases, for example, for the …

Attend, copy, parse: End-to-end information extraction from documents

RB Palm, F Laws, O Winther - 2019 International Conference …, 2019 - ieeexplore.ieee.org
Document information extraction tasks performed by humans create data consisting of a
PDF or document image input, and extracted string outputs. This end-to-end data is naturally …

Language models for document understanding

T Douzon - 2023 - theses.hal.science
Every day, an uncountable amount of documents are received and processed by companies
worldwide. In an effort to reduce the cost of processing each document, the largest …

SciREX: A challenge dataset for document-level information extraction

S Jain, M Van Zuylen, H Hajishirzi, I Beltagy - arXiv preprint arXiv …, 2020 - arxiv.org
Extracting information from full documents is an important problem in many domains, but
most previous work focuses on identifying relationships within a sentence or a paragraph. It is …

Query-driven generative network for document information extraction in the wild

H Cao, X Li, J Ma, D Jiang, A Guo, Y Hu, H Liu… - Proceedings of the 30th …, 2022 - dl.acm.org
This paper focuses on solving Document Information Extraction (DIE) in the wild problem,
which is rarely explored before. In contrast to existing studies mainly tailored for document …

EIGEN: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images

A Singh, V Subramanian… - … Learning for Health …, 2023 - proceedings.mlr.press
Information Extraction (IE) from document images is challenging due to the high
variability of layout formats. Deep models such as etc. In this work, we propose a novel …

CUTIE: Learning to understand documents with convolutional universal text information extractor

X Zhao, E Niu, Z Wu, X Wang - arXiv preprint arXiv:1903.12363, 2019 - arxiv.org
Extracting key information from documents, such as receipts or invoices, and preserving the
interested texts to structured data is crucial in the document-intensive streamline processes …

Data-efficient information extraction from documents with pre-trained language models

C Sage, T Douzon, A Aussem, V Eglin… - Document Analysis and …, 2021 - Springer
As for many text understanding and generation tasks, pre-trained language models have
emerged as a powerful approach for extracting information from business documents …