Information extraction of domain-specific business documents with limited data

MT Nguyen, DT Le, NH Son, BC Minh… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
Information extraction is a key cornerstone in the digitization of office data, which requires
the conversion of unstructured to structured data. However, in the actual application to …

Gain more with less: Extracting information from business documents with small data

MT Nguyen, NH Son - Expert Systems with Applications, 2023 - Elsevier
Information extraction (IE) is a vital step of digitization that reduces paperwork in
offices. However, the adaptation of common IE systems to actual business cases faces two …

Data-efficient information extraction from documents with pre-trained language models

C Sage, T Douzon, A Aussem, V Eglin… - Document Analysis and …, 2021 - Springer
As with many text understanding and generation tasks, pre-trained language models have
emerged as a powerful approach for extracting information from business documents …

Transformers-based information extraction with limited data for domain-specific business documents

MT Nguyen, DT Le, L Le - Engineering Applications of Artificial Intelligence, 2021 - Elsevier
Information extraction plays an important role in data transformation in business
cases. However, building extraction systems in actual cases faces two challenges: (i) the …

Key information extraction from documents: Evaluation and generator

O Bensch, M Popa, C Spille - arXiv preprint arXiv:2106.14624, 2021 - arxiv.org
Extracting information from documents usually relies on natural language processing
methods working on one-dimensional sequences of text. In some cases, for example, for the …

Improving information extraction on business documents with specific pre-training tasks

T Douzon, S Duffner, C Garcia, J Espinas - International Workshop on …, 2022 - Springer
Transformer-based Language Models are widely used in Natural Language
Processing related tasks. Thanks to their pre-training, they have been successfully adapted …

Aurora: An information extraction system of domain-specific business documents with limited data

MT Nguyen, DT Le, LT Linh, N Hong Son… - Proceedings of the 29th …, 2020 - dl.acm.org
Information extraction is a well-known topic that plays a critical role in many NLP
applications as its outputs can be considered as an entrance step for digital transformation …

DocReader: bounding-box free training of a document information extraction model

S Klaiman, M Lehne - Document Analysis and Recognition–ICDAR 2021 …, 2021 - Springer
Information extraction from documents is a ubiquitous first step in many business
applications. During this step, the entries of various fields must first be read from the images …

A span extraction approach for information extraction on visually-rich documents

TAD Nguyen, HM Vu, NH Son, MT Nguyen - Document Analysis and …, 2021 - Springer
Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA
performance recently thanks to the adaptation of Transformer-based language models …

Jointly learning span extraction and sequence labeling for information extraction from business documents

NH Son, MY Hieu, TAD Nguyen… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
This paper introduces a new information extraction model for business documents. Unlike
prior studies, which rely only on span extraction or sequence labeling, the model takes …