Jointly learning span extraction and sequence labeling for information extraction from business documents

NH Son, MY Hieu, TAD Nguyen… - 2022 International Joint …, 2022 - ieeexplore.ieee.org
This paper introduces a new information extraction model for business documents. Unlike
prior studies that rely only on span extraction or sequence labeling, the model takes …

Gain more with less: Extracting information from business documents with small data

MT Nguyen, NH Son - Expert Systems with Applications, 2023 - Elsevier
Information extraction (IE) is a vital step of digitization that reduces paperwork in
offices. However, the adaptation of common IE systems to actual business cases faces two …

A span extraction approach for information extraction on visually-rich documents

TAD Nguyen, HM Vu, NH Son, MT Nguyen - Document Analysis and …, 2021 - Springer
Information extraction (IE) for visually-rich documents (VRDs) has achieved SOTA
performance recently thanks to the adaptation of Transformer-based language models …

An empirical study on finding spans

W Gu, B Zheng, Y Chen, T Chen… - arXiv preprint arXiv …, 2022 - arxiv.org
We present an empirical study on methods for span finding, the selection of consecutive
tokens in text for some downstream tasks. We focus on approaches that can be employed in …

Information extraction of domain-specific business documents with limited data

MT Nguyen, DT Le, NH Son, BC Minh… - … Joint Conference on …, 2021 - ieeexplore.ieee.org
Information extraction is a key cornerstone in the digitization of office data which requires
the conversion of unstructured to structured data. However, in the actual application to …

Span-Oriented Information Extraction--A Unifying Perspective on Information Extraction

Y Ding, M Yankoski, T Weninger - arXiv preprint arXiv:2403.15453, 2024 - arxiv.org
Information Extraction refers to a collection of tasks within Natural Language Processing
(NLP) that identifies sub-sequences within text and their labels. These tasks have been used …

Enhanced language representation with label knowledge for span extraction

P Yang, X Cong, Z Sun, X Liu - arXiv preprint arXiv:2111.00884, 2021 - arxiv.org
Span extraction, aiming to extract text spans (such as words or phrases) from plain texts, is a
fundamental process in Information Extraction. Recent works introduce the label knowledge …

Transformers-based information extraction with limited data for domain-specific business documents

MT Nguyen, DT Le, L Le - Engineering Applications of Artificial Intelligence, 2021 - Elsevier
Information extraction plays an important role for data transformation in business
cases. However, building extraction systems in actual cases faces two challenges: (i) the …

Aurora: An information extraction system of domain-specific business documents with limited data

MT Nguyen, DT Le, LT Linh, N Hong Son… - Proceedings of the 29th …, 2020 - dl.acm.org
Information extraction is a well-known topic that plays a critical role in many NLP
applications, as its outputs can be considered an entry step for digital transformation …

Data-efficient information extraction from documents with pre-trained language models

C Sage, T Douzon, A Aussem, V Eglin… - Document Analysis and …, 2021 - Springer
As with many text understanding and generation tasks, pre-trained language models have
emerged as a powerful approach for extracting information from business documents …