Document Understanding Dataset and Evaluation (DUDE)

J Van Landeghem, R Tito… - Proceedings of the …, 2023 - openaccess.thecvf.com
We call on the Document AI (DocAI) community to re-evaluate current methodologies and
embrace the challenge of creating more practically-oriented benchmarks. Document …

DocPedia: Unleashing the power of large multimodal model in the frequency domain for versatile document understanding

H Feng, Q Liu, H Liu, W Zhou, H Li, C Huang - arXiv preprint arXiv …, 2023 - arxiv.org
This work presents DocPedia, a novel large multimodal model (LMM) for versatile OCR-free
document understanding, capable of parsing images up to 2,560$\times$2,560 resolution …

ScreenAI: A vision-language model for UI and infographics understanding

G Baechler, S Sunkara, M Wang, F Zubach… - arXiv preprint arXiv …, 2024 - arxiv.org
Screen user interfaces (UIs) and infographics, sharing similar visual language and design
principles, play important roles in human communication and human-machine interaction …

Prompting large language model with context and pre-answer for knowledge-based VQA

Z Hu, P Yang, Y Jiang, Z Bai - Pattern Recognition, 2024 - Elsevier
Existing studies apply Large Language Model (LLM) to knowledge-based Visual
Question Answering (VQA) with encouraging results. Due to the insufficient input …

Privacy-aware document visual question answering

R Tito, K Nguyen, M Tobaben, R Kerkouche… - arXiv preprint arXiv …, 2023 - arxiv.org
Document Visual Question Answering (DocVQA) is a fast growing branch of document
understanding. Despite the fact that documents contain sensitive or copyrighted information …

Layout and task aware instruction prompt for zero-shot document image question answering

W Wang, Y Li, Y Ou, Y Zhang - arXiv preprint arXiv:2306.00526, 2023 - arxiv.org
Layout-aware pre-trained models have achieved significant progress on document image
question answering. They introduce extra learnable modules into existing language models …

Visually-Rich Document Understanding: Concepts, Taxonomy and Challenges

A Sassioui, R Benouini, Y El Ouargui… - … Networks and Mobile …, 2023 - ieeexplore.ieee.org
The increasing prevalence of Visually-rich Documents (VRDs) in diverse domains has led to
a growing interest in Visually-rich Document Understanding (VrDU). Researchers have …

SelfDocSeg: A self-supervised vision-based approach towards document segmentation

S Maity, S Biswas, S Manna, A Banerjee… - … on Document Analysis …, 2023 - Springer
Document layout analysis is a known problem to the documents research community and
has been vastly explored yielding a multitude of solutions ranging from text mining, and …

Beyond Document Page Classification: Design, Datasets, and Challenges

J Van Landeghem, S Biswas… - Proceedings of the …, 2024 - openaccess.thecvf.com
This paper highlights the need to bring document classification benchmarking closer to real-
world applications, both in the nature of data tested (X: multi-channel, multi-paged, multi …

CMA: A Chromaticity Map Adapter for Robust Detection of Screen-Recapture Document Images

C Chen, L Lin, Y Chen, B Li, J Zeng… - Proceedings of the …, 2024 - openaccess.thecvf.com
The rebroadcasting of screen-recaptured document images introduces a significant risk to
the confidential documents processed in government departments and commercial …