TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Text detection, recognition, and script identification in natural scene images: A review

V Naosekpam, N Sahu - International Journal of Multimedia Information …, 2022 - Springer
Text in natural scene images plays a vital role in scene understanding. It contains a rich and
abundant amount of valuable semantic information useful in many applications such as …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

SVTR: Scene text recognition with a single visual model

Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org
Dominant scene text recognition models commonly contain two building blocks, a visual
model for feature extraction and a sequence model for text transcription. This hybrid …

Multi-granularity prediction for scene text recognition

P Wang, C Da, C Yao - European Conference on Computer Vision, 2022 - Springer
Scene text recognition (STR) has been an active research topic in computer vision for years.
To tackle this challenging problem, numerous innovative methods have been successively …

DAN: a segmentation-free document attention network for handwritten document recognition

D Coquenet, C Chatelain… - IEEE transactions on …, 2023 - ieeexplore.ieee.org
Unconstrained handwritten text recognition is a challenging computer vision task. It is
traditionally handled by a two-step approach, combining line segmentation followed by text …

Multi-modal text recognition networks: Interactive enhancements between visual and semantic features

B Na, Y Kim, S Park - European Conference on Computer Vision, 2022 - Springer
Linguistic knowledge has brought great benefits to scene text recognition by providing
semantics to refine character sequences. However, since linguistic knowledge has been …

Levenshtein OCR

C Da, P Wang, C Yao - European Conference on Computer Vision, 2022 - Springer
A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented.
Inspired by Levenshtein Transformer in the area of NLP, the proposed method (named …

LISTER: Neighbor decoding for length-insensitive scene text recognition

C Cheng, P Wang, C Da, Q Zheng… - Proceedings of the …, 2023 - openaccess.thecvf.com
The diversity in length constitutes a significant characteristic of text. Due to the long-tail
distribution of text lengths, most existing methods for scene text recognition (STR) only work …

DTrOCR: Decoder-only transformer for optical character recognition

M Fujitake - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder
extracts features from an image, and the decoder produces recognized text from these …