Abstract We propose Universal Document Processing (UDOP), a foundation Document AI model which unifies text, image, and layout modalities together with varied task formats …
The ever-growing abundance of data found in heterogeneous sources, such as scientific publications, has forced the development of automated techniques for data extraction. While …
This work presents a sparse-attention Transformer architecture for modeling documents that contain large tables. Tables are ubiquitous on the web, and are rich in information. However …
Tables are often created with hierarchies, but existing works on table reasoning mainly focus on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods …
D Wang, N Raman, M Sibue, Z Ma, P Babkin… - arXiv preprint arXiv …, 2023 - arxiv.org
Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar records, often carry rich semantics at the intersection of textual and spatial modalities. The …
H Dong, Z Wang - Proceedings of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Tables contain a significant portion of the world's structured information. The ability to efficiently and accurately understand, process, reason about, analyze, and generate tabular …
Understanding documents with rich layouts and multi-modal components is a long-standing and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable …
Peer review is a key component of the publishing process in most fields of science. Increasing submission rates put a strain on reviewing quality and efficiency, motivating the …
A Byerly, T Kalganova, R Ott - Science and information conference, 2022 - Springer
We present a review of the methods behind the top 40 highest accuracies achieved on the ILSVRC 2012 Imagenet validation set as ranked on Papers with Code. A significant …