An Efficient Method Based on Region-adjacent Embedding for Text Classification of Chinese Electronic Medical Records

F Guo, T Wu, X Jin - 2020 5th International Conference on …, 2020 - ieeexplore.ieee.org
In the field of natural language processing (NLP), word-embedding-based models have
been widely applied in many tasks with great success, which are believed to make …

From dataset recycling to multi-property extraction and beyond

T Dwojak, M Pietruszka, Ł Borchmann… - arXiv preprint arXiv …, 2020 - arxiv.org
This paper investigates various Transformer architectures on the WikiReading Information
Extraction and Machine Reading Comprehension dataset. The proposed dual-source model …

A discriminative convolutional neural network with context-aware attention

Y Zhou, L Liao, Y Gao, H Huang, X Wei - ACM Transactions on …, 2020 - dl.acm.org
Feature representation and feature extraction are two crucial procedures in text mining.
Convolutional Neural Networks (CNN) have shown overwhelming success for text-mining …

End-to-end QA on Covid-19: domain adaptation with synthetic training

R Gangi Reddy, B Iyer, M Arafat Sultan… - arXiv e …, 2020 - ui.adsabs.harvard.edu
End-to-end question answering (QA) requires both information retrieval (IR) over a large
document collection and machine reading comprehension (MRC) on the retrieved …

Going full-tilt boogie on document understanding with text-image-layout transformer

R Powalski, Ł Borchmann, D Jurkiewicz… - Document Analysis and …, 2021 - Springer
We address the challenging problem of Natural Language Comprehension beyond plain-text
documents by introducing the TILT neural network architecture which simultaneously …

Scalable document image information extraction with application to domain-specific analysis

Y Zheng, S Kong, W Zhu, H Ye - 2019 IEEE International …, 2019 - ieeexplore.ieee.org
Document images are ubiquitous, but existing methods mainly focus on the text reading but
not information understanding. In this paper, we propose a novel document image …

Regulatory compliance through Doc2Doc information retrieval: A case study in EU/UK legislation where text similarity has limitations

I Chalkidis, M Fergadiotis, N Manginas… - arXiv preprint arXiv …, 2021 - arxiv.org
Major scandals in corporate history have urged the need for regulatory compliance, where
organizations need to ensure that their controls (processes) comply with relevant laws …

The resume corpus: a large dataset for research in information extraction systems

Y Su, J Zhang, J Lu - 2019 15th International Conference on …, 2019 - ieeexplore.ieee.org
We publish a Chinese Resume Corpus for researches of information extraction. The corpus
contains 178 thousand resume documents and over 33 million words. The resume …

A Data-centric Framework for Improving Domain-specific Machine Reading Comprehension Datasets

I Bojic, J Halim, V Suharman, S Tar, QC Ong… - arXiv preprint arXiv …, 2023 - arxiv.org
Low-quality data can cause downstream problems in high-stakes applications. Data-centric
approach emphasizes on improving dataset quality to enhance model performance. High …

A multi-resolution word embedding for document retrieval from large unstructured knowledge bases

T Cakaloglu, X Xu - arXiv preprint arXiv:1902.00663, 2019 - arxiv.org
Deep language models learning a hierarchical representation proved to be a powerful tool
for natural language processing, text mining and information retrieval. However …