Eigen: Expert-Informed Joint Learning Aggregation for High-Fidelity Information Extraction from Document Images

A Singh, V Subramanian… - … Learning for Health …, 2023 - proceedings.mlr.press
Abstract Information Extraction (IE) from document images is challenging due to the high
variability of layout formats. Deep models such as etc. In this work, we propose a novel …

NeCo@ALQAC 2023: Legal Domain Knowledge Acquisition for Low-Resource Languages through Data Enrichment

HL Nguyen, DQ Nguyen, HT Nguyen… - … on Knowledge and …, 2023 - ieeexplore.ieee.org
In recent years, natural language processing has gained significant popularity in various
sectors, including the legal domain. This paper presents NeCo Team's solutions to the …

A medical information extraction workbench to process German clinical text

R Roller, L Seiffe, A Ayach, S Möller, O Marten… - arXiv preprint arXiv …, 2022 - arxiv.org
Background: In the information extraction and natural language processing domain,
accessible datasets are crucial to reproduce and compare results. Publicly available …

Kleister: A novel task for information extraction involving long documents with complex layout

F Graliński, T Stanisławek, A Wróblewska… - arXiv preprint arXiv …, 2020 - arxiv.org
State-of-the-art solutions for Natural Language Processing (NLP) are able to capture a
broad range of contexts, like the sentence-level context or document-level context for short …

End-to-end information extraction without token-level supervision

RB Palm, D Hovy, F Laws, O Winther - arXiv preprint arXiv:1707.04913, 2017 - arxiv.org
Most state-of-the-art information extraction approaches rely on token-level labels to find the
areas of interest in text. Unfortunately, these labels are time-consuming and costly to create …

An Augmentation Strategy for Visually Rich Documents

J Xie, JB Wendt, Y Zhou, S Ebner, S Tata - arXiv preprint arXiv:2212.10047, 2022 - arxiv.org
Many business workflows require extracting important fields from form-like documents (e.g.
bank statements, bills of lading, purchase orders, etc.). Recent techniques for automating …

FreeDOM: A transferable neural architecture for structured information extraction on web documents

BY Lin, Y Sheng, N Vo, S Tata - Proceedings of the 26th ACM SIGKDD …, 2020 - dl.acm.org
Extracting structured data from HTML documents is a long-studied problem with a broad
range of applications like augmenting knowledge bases, supporting faceted search, and …

Beyond word for word: Fact guided training for neural data-to-document generation

F Nie, H Chen, J Wang, R Pan, CY Lin - … 9–14, 2019, Proceedings, Part I 8, 2019 - Springer
Recent end-to-end encoder-decoder neural models for data-to-text generation can produce
fluent and seemingly informative texts even though these models disregard the traditional content …

Pretrained domain-specific language model for general information retrieval tasks in the AEC domain

Z Zheng, XZ Lu, KY Chen, YC Zhou, JR Lin - arXiv preprint arXiv …, 2022 - arxiv.org
As an essential task for the architecture, engineering, and construction (AEC) industry,
information retrieval (IR) from unstructured textual data based on natural language …

End-to-end QA on COVID-19: domain adaptation with synthetic training

RG Reddy, B Iyer, MA Sultan, R Zhang, A Sil… - arXiv preprint arXiv …, 2020 - arxiv.org
End-to-end question answering (QA) requires both information retrieval (IR) over a large
document collection and machine reading comprehension (MRC) on the retrieved …