Table pre-training: A survey on model architectures, pre-training objectives, and downstream tasks

H Dong, Z Cheng, X He, M Zhou, A Zhou… - arXiv preprint arXiv …, 2022 - arxiv.org
Since a vast number of tables can be easily collected from web pages, spreadsheets, PDFs,
and various other document types, a flurry of table pre-training frameworks have been …

Unifying vision, text, and layout for universal document processing

Z Tang, Z Yang, G Wang, Y Fang… - Proceedings of the …, 2023 - openaccess.thecvf.com
We propose Universal Document Processing (UDOP), a foundation Document AI
model which unifies text, image, and layout modalities together with varied task formats …

ChemDataExtractor 2.0: Autopopulated ontologies for materials science

J Mavracic, CJ Court, T Isazawa… - Journal of Chemical …, 2021 - ACS Publications
The ever-growing abundance of data found in heterogeneous sources, such as scientific
publications, has forced the development of automated techniques for data extraction. While …

MATE: multi-view attention for table transformer efficiency

JM Eisenschlos, M Gor, T Müller, WW Cohen - arXiv preprint arXiv …, 2021 - arxiv.org
This work presents a sparse-attention Transformer architecture for modeling documents that
contain large tables. Tables are ubiquitous on the web, and are rich in information. However …

HiTab: A hierarchical table dataset for question answering and natural language generation

Z Cheng, H Dong, Z Wang, R Jia, J Guo, Y Gao… - arXiv preprint arXiv …, 2021 - arxiv.org
Tables are often created with hierarchies, but existing works on table reasoning mainly focus
on flat tables and neglect hierarchical tables. Hierarchical tables challenge existing methods …

DocLLM: A layout-aware generative language model for multimodal document understanding

D Wang, N Raman, M Sibue, Z Ma, P Babkin… - arXiv preprint arXiv …, 2023 - arxiv.org
Enterprise documents such as forms, invoices, receipts, reports, contracts, and other similar
records, often carry rich semantics at the intersection of textual and spatial modalities. The …

Large language models for tabular data: Progresses and future directions

H Dong, Z Wang - Proceedings of the 47th International ACM SIGIR …, 2024 - dl.acm.org
Tables contain a significant portion of the world's structured information. The ability to
efficiently and accurately understand, process, reason about, analyze, and generate tabular …

MMLongBench-Doc: Benchmarking long-context document understanding with visualizations

Y Ma, Y Zang, L Chen, M Chen, Y Jiao, X Li… - arXiv preprint arXiv …, 2024 - arxiv.org
Understanding documents with rich layouts and multi-modal components is a long-standing
and practical task. Recent Large Vision-Language Models (LVLMs) have made remarkable …

Revise and resubmit: An intertextual model of text-based collaboration in peer review

I Kuznetsov, J Buchmann, M Eichler… - Computational …, 2022 - direct.mit.edu
Peer review is a key component of the publishing process in most fields of science.
Increasing submission rates put a strain on reviewing quality and efficiency, motivating the …

The current state of the art in deep learning for image classification: a review

A Byerly, T Kalganova, R Ott - Science and information conference, 2022 - Springer
We present a review of the methods behind the top 40 highest accuracies achieved on the
ILSVRC 2012 ImageNet validation set as ranked on Papers with Code. A significant …