PMMN: pre-trained multi-modal network for scene text recognition

Y Zhang, Z Fu, F Huang, Y Liu - Pattern Recognition Letters, 2021 - Elsevier
… model and language model respectively to learn modality-specific knowledge for … scene
text recognition. In detail, we first pre-train the proposed off-the-shelf vision model and language

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - … and pattern recognition, 2021 - openaccess.thecvf.com
… ; and 3) language model with noise … scene text recognition. Firstly, the autonomous suggests
to block gradient flow between vision and language models to enforce explicitly language

SVTR: Scene text recognition with a single visual model

Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org
… model for feature extraction and a sequence model for text … a Single Visual model for Scene
Text recognition within the … : A new scene text recognizer with visual language modeling …

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer
… For example, instances of scene text can be in different languages, colors, fonts, sizes, … that
scene text detection can be taxonomically subsumed under general object detection, which is …

Behind the scene: Revealing the secrets of pre-trained vision-and-language models

J Cao, Z Gan, Y Cheng, L Yu, YC Chen… - … Vision–ECCV 2020: 16th …, 2020 - Springer
… -trained models have revolutionized vision-and-language (V+L) … behind the scene, we present
VALUE (Vision-And-Language … , Visual Coreference Resolution, Visual Relation Detection) …

Visual-semantic transformer for scene text recognition

X Tang, Y Lai, Y Liu, Y Fu, R Fang - arXiv preprint arXiv:2112.00948, 2021 - arxiv.org
… We design weight-sharing visual-semantic alignment modules to explicitly enforce the …
without external language models. • We introduce an interaction module that allows visual and …

Dictionary-guided scene text recognition

N Nguyen, T Nguyen, V Tran, MT Tran… - … Recognition, 2021 - openaccess.thecvf.com
language prior is a potential approach to advance scene text … Moreover, many languages
have special symbols that have … of the current scene text recognition pipeline by introducing a …

CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - arXiv preprint arXiv:2305.14014, 2023 - arxiv.org
… Abstract—Pre-trained vision-language models (VLMs) are the de-facto … However, scene text
recognition methods still prefer backbones pretrained on a single modality, namely, the visual

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
… iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7098–7107, June 2021 …

Top-down and bottom-up cues for scene text recognition

A Mishra, K Alahari, CV Jawahar - … and pattern recognition, 2012 - ieeexplore.ieee.org
… impressive scene text recognition results using similarity constraints and language statistics,
… In contrast, we show results on a more challenging street view dataset [29], where the words …