Improving natural language understanding with computation-efficient retrieval representation fusion

S Wu, Y Xiong, Y Cui, X Liu, B Tang, TW Kuo… - arXiv preprint arXiv …, 2024 - arxiv.org
Retrieval-based augmentations that aim to incorporate knowledge from an external
database into language models have achieved great success in various knowledge …

LMDX: Language model-based document information extraction and localization

V Perot, K Kang, F Luisier, G Su, X Sun… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Language Models (LLM) have revolutionized Natural Language Processing (NLP),
improving state-of-the-art on many existing tasks and exhibiting emergent capabilities …

Doc2Dict: Information extraction as text generation

B Townsend, E Ito-Fisher, L Zhang, M May - arXiv preprint arXiv …, 2021 - arxiv.org
Typically, information extraction (IE) requires a pipeline approach: first, a sequence labeling
model is trained on manually annotated documents to extract relevant spans; then, when a …

InstructUIE: Multi-task instruction tuning for unified information extraction

X Wang, W Zhou, C Zu, H Xia, T Chen, Y Zhang… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models have unlocked strong multi-task capabilities from reading instructive
prompts. However, recent studies have shown that existing large models still have difficulty …

Consideration of the word's neighborhood in GATs for information extraction in semi-structured documents

D Belhadj, Y Belaïd, A Belaïd - … , September 5–10, 2021, Proceedings, Part …, 2021 - Springer
Most administrative documents take a semi-structured form (invoices, payslips, etc.).
Extracting information from this type of document is still challenging because of the …

Attention-based graph neural network with global context awareness for document understanding

Y Hua, Z Huang, J Guo, W Qiu - … , CCL 2020, Hainan, China, October 30 …, 2020 - Springer
Information extraction from documents such as receipts or invoices is a fundamental
and crucial step for office automation. Many approaches focus on extracting entities and …

Legal document retrieval using document vector embeddings and deep learning

K Sugathadasa, B Ayesha, N de Silva… - … : Proceedings of the …, 2019 - Springer
Domain specific information retrieval process has been a prominent and ongoing
research in the field of natural language processing. Many researchers have incorporated …

Leveraging knowledge bases in LSTMs for improving machine reading

B Yang, T Mitchell - arXiv preprint arXiv:1902.09091, 2019 - arxiv.org
This paper focuses on how to take advantage of external knowledge bases (KBs) to improve
recurrent neural networks for machine reading. Traditional methods that exploit knowledge …

FieldSwap: Data Augmentation for Effective Form-Like Document Extraction

J Xie, JB Wendt, Y Zhou, S Ebner… - 2024 IEEE 40th …, 2024 - ieeexplore.ieee.org
Extracting structured data from visually rich documents like invoices, receipts, financial
statements, and tax forms is key to automating many business workflows. However, building …

A small samples training framework for deep learning-based automatic information extraction: Case study of construction accident news reports analysis

D Feng, H Chen - Advanced Engineering Informatics, 2021 - Elsevier
Knowledge management is crucial for construction safety management. Widely
collected and well-organized safety-related documents are recognized to be significant in …