DocILE benchmark for document information localization and extraction

Š Šimsa, M Šulc, M Uřičář, Y Patel, A Hamdi… - … on Document Analysis …, 2023 - Springer
This paper introduces the DocILE benchmark with the largest dataset of business
documents for the tasks of Key Information Localization and Extraction and Line Item …

Beyond Document Page Classification: Design, Datasets, and Challenges

J Van Landeghem, S Biswas… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper highlights the need to bring document classification benchmarking closer to real-
world applications, both in the nature of data tested (X: multi-channel, multi-paged, multi …

Business document information extraction: Towards practical benchmarks

M Skalický, Š Šimsa, M Uřičář, M Šulc - International Conference of the …, 2022 - Springer
Information extraction from semi-structured documents is crucial for frictionless
business-to-business (B2B) communication. While machine learning problems related to …

DocILE 2023 teaser: document information localization and extraction

Š Šimsa, M Šulc, M Skalický, Y Patel… - European Conference on …, 2023 - Springer
The lack of data for information extraction (IE) from semi-structured business documents is a
real problem for the IE community. Publications relying on large-scale datasets use only …

USTC-iFLYTEK at DocILE: A Multi-modal Approach Using Domain-specific GraphDoc.

Y Wang, J Du, J Ma, P Hu, Z Zhang, J Zhang - CLEF (Working Notes), 2023 - ceur-ws.org
With the development of digitalization in business, the automatic extraction of information
from semistructured business documents is becoming increasingly important. This paper …

Object detection in invoices

AŞ Bulzan, C Cernăzanu-Glăvan - 2022 26th International …, 2022 - ieeexplore.ieee.org
Key field information extraction from documents is an increasingly covetable task. Previous
related work has touched upon the subject through the lens of rule-based systems or …

Exploring the Potential of OCR Integration for Object Detection in Invoices

AŞ Bulzan, C Cernăzanu-Glăvan… - 2023 27th International …, 2023 - ieeexplore.ieee.org
This paper investigates the impact of incorporating Optical Character Recognition (OCR)
information into object detection models for extracting key information fields from invoices …

Failure Prediction in 2D Document Information Extraction with Calibrated Confidence Scores

J Kivimäki, A Lebedev… - 2023 IEEE 47th Annual …, 2023 - ieeexplore.ieee.org
Modern machine learning models can achieve impressive results in many tasks, but often
fail to express reliably how confident they are with their predictions. In an industrial setting …

Geographic information extraction from texts

X Hu, Y Hu, B Resch, J Kersten - 2023 - Springer
The 45th European Conference on Information Retrieval (ECIR 2023) was held in Dublin,
Ireland, during April 2–6, 2023, and brought together hundreds of researchers from Europe …

Effectiveness of Implementing the Automatic Exchange of Information Policy in Efforts to Increase Tax Revenue

NF Wahyudi, MRUD Tambunan - Jurnal Public Policy, 2023 - jurnal.utu.ac.id
The Automatic Information Exchange policy implemented in Indonesia since 2018
has found obstacles in the utilization and processing of AEoI data. This research aims to …