Beyond OCR + VQA: Towards end-to-end reading and reasoning for robust and accurate TextVQA

J Kuang, W Hua, D Liang, M Yang, D Jiang… - … on Document Analysis …, 2023 - Springer

Visual information extraction (VIE), which aims to simultaneously perform OCR and
information extraction in a unified framework, has drawn increasing attention due to its …

Cited by 15  Related articles  All 4 versions

Towards robust real-time scene text detection: From semantic to instance representation learning

X Qin, P Lyu, C Zhang, Y Zhou, K Yao… - Proceedings of the 31st …, 2023 - dl.acm.org

Due to the flexible representation of arbitrary-shaped scene text and simple pipeline,
bottom-up segmentation-based methods begin to be mainstream in real-time scene text detection …

Cited by 6  Related articles  All 4 versions

Divide Rows and Conquer Cells: Towards Structure Recognition for Large Tables.

H Shen, X Gao, J Wei, L Qiao, Y Zhou, Q Li, Z Cheng - IJCAI, 2023 - ijcai.org

Recent advanced Table Structure Recognition (TSR) models adopt image-to-text
solutions to parse table structure. These methods can be formulated as image caption …

Cited by 7  Related articles  All 2 versions

Filling in the blank: Rationale-augmented prompt tuning for TextVQA

G Zeng, Y Zhang, Y Zhou, B Fang, G Zhao… - Proceedings of the 31st …, 2023 - dl.acm.org

Recently, generative Text-based visual question answering (TextVQA) methods, which are
often based on language models, have exhibited impressive results and drawn increasing …

Cited by 4  Related articles

Prompting large language model with context and pre-answer for knowledge-based VQA

Z Hu, P Yang, Y Jiang, Z Bai - Pattern Recognition, 2024 - Elsevier

Existing studies apply Large Language Model (LLM) to knowledge-based Visual
Question Answering (VQA) with encouraging results. Due to the insufficient input …

Cited by 2  Related articles  All 2 versions

So many heads, so many Wits: Multimodal graph reasoning for text-based visual question answering

W Zheng, L Yan, FY Wang - IEEE Transactions on Systems …, 2023 - ieeexplore.ieee.org

While texts related to images convey fundamental messages for scene understanding and
reasoning, text-based visual question answering tasks concentrate on visual questions that …

Cited by 1  Related articles

Cited by 1  Related articles  All 3 versions

EI²SR: Learning an Enhanced Intra-Instance Semantic Relationship for Arbitrary-Shaped Scene Text Detection

Y Shu, S Liu, Y Zhou, H Xu… - ICASSP 2023-2023 IEEE …, 2023 - ieeexplore.ieee.org

Text detection in natural scenarios, has made significant progress with the deep learning
architecture. Towards arbitrary-shaped text detection, fracture detection is the major concern …

Cited by 2  Related articles
