GeoLayoutLM: Geometric pre-training for visual information extraction

C Luo, C Cheng, Q Zheng… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Visual information extraction (VIE) plays an important role in Document Intelligence.
Generally, it is divided into two tasks: semantic entity recognition (SER) and relation …

Vision grid transformer for document layout analysis

C Da, C Luo, Q Zheng, C Yao - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Document pre-trained models and grid-based models have proven to be very effective on
various tasks in Document AI. However, for the document layout analysis (DLA) task …

Conditional text image generation with diffusion models

Y Zhu, Z Li, T Wang, M He… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
Current text recognition systems, including those for handwritten scripts and scene text, have
relied heavily on image synthesis and augmentation, since it is difficult to realize real-world …

CDistNet: Perceiving multi-domain character distance for robust text recognition

T Zheng, Z Chen, S Fang, H Xie, YG Jiang - International Journal of …, 2024 - Springer
The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …

LISTER: Neighbor decoding for length-insensitive scene text recognition

C Cheng, P Wang, C Da, Q Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
The diversity in length constitutes a significant characteristic of text. Due to the long-tail
distribution of text lengths, most existing methods for scene text recognition (STR) only work …

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

B Zhang, H Xie, Z Gao, Y Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Scene text images contain not only style information (font, background) but also content
information (character, texture). Different scene text tasks need different information but …

Symmetrical linguistic feature distillation with CLIP for scene text recognition

Z Wang, H Xie, Y Wang, J Xu, B Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org
In this paper, we explore the potential of the Contrastive Language-Image Pretraining (CLIP)
model in scene text recognition (STR), and establish a novel Symmetrical Linguistic Feature …

A survey of text detection and recognition algorithms based on deep learning technology

XF Wang, ZH He, K Wang, YF Wang, L Zou, ZZ Wu - Neurocomputing, 2023 - Elsevier
Optical Character Recognition (OCR) poses a crucial challenge within the realm of
computer vision research, as it plays a pivotal role in converting vast amounts of …

Linguistic more: Taking a further step toward efficient and accurate scene text recognition

B Zhang, H Xie, Y Wang, J Xu, Y Zhang - arXiv preprint arXiv:2305.05140, 2023 - arxiv.org
Vision models have gained increasing attention due to their simplicity and efficiency in the Scene
Text Recognition (STR) task. However, due to lacking the perception of linguistic knowledge …

Class-Aware Mask-guided feature refinement for scene text recognition

M Yang, B Yang, M Liao, Y Zhu, X Bai - Pattern Recognition, 2024 - Elsevier
Scene text recognition is a rapidly developing field that faces numerous challenges due to
the complexity and diversity of scene text, including complex backgrounds, diverse fonts …