Visual information extraction in the wild: practical dataset and end-to-end solution

J Kuang, W Hua, D Liang, M Yang, D Jiang… - … on Document Analysis …, 2023 - Springer
Visual information extraction (VIE), which aims to simultaneously perform OCR and
information extraction in a unified framework, has drawn increasing attention due to its …

Towards robust real-time scene text detection: From semantic to instance representation learning

X Qin, P Lyu, C Zhang, Y Zhou, K Yao… - Proceedings of the 31st …, 2023 - dl.acm.org
Due to the flexible representation of arbitrary-shaped scene text and simple pipeline, bottom-up
segmentation-based methods begin to be mainstream in real-time scene text detection …

Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables

H Shen, X Gao, J Wei, L Qiao, Y Zhou, Q Li, Z Cheng - IJCAI, 2023 - ijcai.org
Recent advanced Table Structure Recognition (TSR) models adopt image-to-text
solutions to parse table structure. These methods can be formulated as image caption …

Filling in the blank: Rationale-augmented prompt tuning for TextVQA

G Zeng, Y Zhang, Y Zhou, B Fang, G Zhao… - Proceedings of the 31st …, 2023 - dl.acm.org
Recently, generative Text-based visual question answering (TextVQA) methods, which are
often based on language models, have exhibited impressive results and drawn increasing …

Prompting large language model with context and pre-answer for knowledge-based VQA

Z Hu, P Yang, Y Jiang, Z Bai - Pattern Recognition, 2024 - Elsevier
Existing studies apply Large Language Model (LLM) to knowledge-based Visual
Question Answering (VQA) with encouraging results. Due to the insufficient input …

So many heads, so many Wits: Multimodal graph reasoning for text-based visual question answering

W Zheng, L Yan, FY Wang - IEEE Transactions on Systems …, 2023 - ieeexplore.ieee.org
While texts related to images convey fundamental messages for scene understanding and
reasoning, text-based visual question answering tasks concentrate on visual questions that …

Reading order detection in visually-rich documents with multi-modal layout-aware relation prediction

L Qiao, C Li, Z Cheng, Y Xu, Y Niu, X Li - Pattern Recognition, 2024 - Elsevier
Reading order detection aims to arrange the text logically, which is essential in
understanding visual documents. Current methods mostly model the problem as a sequence …

Relation-Aware Heterogeneous Graph Network for Learning Intermodal Semantics in Textbook Question Answering

S Zhang, Y Wu, X Zhang, Z Feng… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org
Textbook question answering (TQA) task aims to infer answers for given questions from a
multimodal context, including text and diagrams. The existing studies have aggregated …

Mask-Guided Stamp Erasure for Real Document Image

X Yang, D Yang, Y Zhou, Y Guo… - 2023 IEEE International …, 2023 - ieeexplore.ieee.org
The application of text recognition in the automatic analysis of invoices, contracts and other
documents has significantly raised office efficiency, but the stamps overlapping with the texts …

EI2SR: Learning an Enhanced Intra-Instance Semantic Relationship for Arbitrary-Shaped Scene Text Detection

Y Shu, S Liu, Y Zhou, H Xu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org
Text detection in natural scenarios has made significant progress with the deep learning
architecture. Towards arbitrary-shaped text detection, fracture detection is the major concern …