Vision grid transformer for document layout analysis

C Da, C Luo, Q Zheng, C Yao - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Document pre-trained models and grid-based models have proven to be very effective on
various tasks in Document AI. However, for the document layout analysis (DLA) task …

LORE++: Logical location regression network for table structure recognition with pre-training

R Long, H Xing, Z Yang, Q Zheng, Z Yu, F Huang… - Pattern Recognition, 2025 - Elsevier
Table structure recognition (TSR) aims at extracting tables in images into machine-
understandable formats. Current approaches address this issue by either predicting the …

Universal Fine-grained Visual Categorization by Concept Guided Learning

Q Bi, B Zhou, W Ji, GS Xia - IEEE Transactions on Image …, 2025 - ieeexplore.ieee.org
Existing fine-grained visual categorization (FGVC) methods assume that the fine-grained
semantics rest in the informative parts of an image. This assumption works well on favorable …

LDP: Generalizing to Multilingual Visual Information Extraction by Language Decoupled Pretraining

H Shen, G Li, J Zhong, Y Zhou - arXiv preprint arXiv:2412.14596, 2024 - arxiv.org
Visual Information Extraction (VIE) plays a crucial role in the comprehension of semi-
structured documents, and several pre-trained models have been developed to enhance …

HIP: Hierarchical Point Modeling and Pre-training for Visual Information Extraction

R Long, P Wang, Z Yang, C Yao - arXiv preprint arXiv:2411.01139, 2024 - arxiv.org
End-to-end visual information extraction (VIE) aims at integrating the hierarchical subtasks of
VIE, including text spotting, word grouping, and entity labeling, into a unified framework …

ConBGAT: a novel model combining convolutional neural networks, transformer and graph attention network for information extraction from scanned image

DHV Hoang, HV Quoc, BT Hung - PeerJ Computer Science, 2024 - peerj.com
Extracting information from scanned images is a critical task with far-reaching practical
implications. Traditional methods often fall short by inadequately leveraging both image and …

UniVIE: A Unified Label Space Approach to Visual Information Extraction from Form-Like Documents

K Hu, J Wang, W Lin, Z Zhong, L Sun, Q Huo - International Conference on …, 2024 - Springer
Existing methods for Visual Information Extraction (VIE) from form-like documents
typically fragment the process into separate subtasks, such as key information extraction …

KVP10k: A Comprehensive Dataset for Key-Value Pair Extraction in Business Documents

O Naparstek, O Azulai, I Shapira, E Amrani… - … on Document Analysis …, 2024 - Springer
In recent years, the challenge of extracting information from business documents has
emerged as a critical task, finding applications across numerous domains. This effort has …

ROISER: Towards Real World Semantic Entity Recognition from Visually-Rich Documents

Z Lin, J Wang, W Liao, W Dai, L Xiong, L Jin - International Conference on …, 2025 - Springer
Visual semantic entity recognition (visual SER) aims to extract contents that fall in key fields
from the given visually-rich document image, and it has been widely applied across diverse …

Polar-Doc: One-Stage Document Dewarping with Multi-Scope Constraints under Polar Representation

W Zhang, Q Wang, K Huang - arXiv preprint arXiv:2312.07925, 2023 - arxiv.org
Document dewarping, aiming to eliminate geometric deformation in photographed
documents to benefit text recognition, has made great progress in recent years but is still far …