Language matters: A weakly supervised vision-language pre-training approach for scene text detection and spotting

C Xue, W Zhang, Y Hao, S Lu, PHS Torr… - … on Computer Vision, 2022 - Springer
… We present oCLIP that learns better scene text visual representations by feature alignment
with textual information. As shown in Fig. 2, the proposed network first extracts image …

Clipter: Looking at the bigger picture in scene text recognition

A Aberdam, D Bensaïd, A Golts… - … Computer Vision, 2023 - openaccess.thecvf.com
… In particular, we explore a range of vision and vision-language image encoders, pooling
operators, light-to-heavy fusion schemes, and different integration points between word-level …

From two to one: A new scene text recognizer with visual language modeling network

Y Wang, H Xie, S Fang, J Wang… - … on Computer Vision, 2021 - openaccess.thecvf.com
vision model with language capability. Specially, we introduce the text recognition of character
Such operation guides the vision model to use not only the visual texture of characters, but …

Vision-language pre-training for boosting scene text detectors

S Song, J Wan, Z Yang, J Tang… - … Recognition, 2022 - openaccess.thecvf.com
… Recently, vision-language joint representation learning has … adapt vision-language joint
learning for scene text detection, a task … ities: vision and language, since text is the written form of …

Scene text recognition using higher order language priors

A Mishra, K Alahari, CV Jawahar - BMVC-British machine vision …, 2012 - inria.hal.science
… like character detection and recognition we provide annotated character bounding boxes. …
We address a more general problem of scene text recognition, i.e., recognizing a word without …

Vision transformer for fast and efficient scene text recognition

R Atienza - … conference on document analysis and recognition, 2021 - Springer
Scene text recognition (STR) enables computers to read text in natural scenes such as object
labels, road signs and instructions. STR helps machines perform informed decisions such …

Visual attention models for scene text recognition

SK Ghosh, E Valveny… - … analysis and recognition …, 2017 - ieeexplore.ieee.org
language modeling outperforms the state-of-the-art in unconstrained scene text recognition
… In this paper we proposed an LSTM-based visual attention model for scene text recognition. …

Attention and language ensemble for scene text recognition with convolutional sequence modeling

S Fang, H Xie, ZJ Zha, N Sun, J Tan… - Proceedings of the 26th …, 2018 - dl.acm.org
… loss from language aspect, multiple losses from attention and language are accumulated
for … on standard datasets for scene text recognition, including Street View Text, IIIT5K and …

Multi-granularity prediction for scene text recognition

P Wang, C Da, C Yao - European Conference on Computer Vision, 2022 - Springer
language information of text. In order to effectively resort to linguistic information for scene text
recognition… in NLP [7] into text recognition method. Subword tokenization algorithms aim to …

PMMN: pre-trained multi-modal network for scene text recognition

Y Zhang, Z Fu, F Huang, Y Liu - Pattern Recognition Letters, 2021 - Elsevier
… model and language model respectively to learn modality-specific knowledge for … scene
text recognition. In detail, we first pre-train the proposed off-the-shelf vision model and language