V Naosekpam, N Sahu - International Journal of Multimedia Information …, 2022 - Springer
Text in natural scene images plays a vital role in scene understanding. It contains a rich and abundant amount of valuable semantic information useful in many applications such as …
D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models (LM). Inherent limitations of AR models motivated two-stage methods which employ an …
Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org
Dominant scene text recognition models commonly contain two building blocks, a visual model for feature extraction and a sequence model for text transcription. This hybrid …
P Wang, C Da, C Yao - European Conference on Computer Vision, 2022 - Springer
Scene text recognition (STR) has been an active research topic in computer vision for years. To tackle this challenging problem, numerous innovative methods have been successively …
Unconstrained handwritten text recognition is a challenging computer vision task. It is traditionally handled by a two-step approach, combining line segmentation followed by text …
B Na, Y Kim, S Park - European Conference on Computer Vision, 2022 - Springer
Linguistic knowledge has brought great benefits to scene text recognition by providing semantics to refine character sequences. However, since linguistic knowledge has been …
C Da, P Wang, C Yao - European Conference on Computer Vision, 2022 - Springer
A novel scene text recognizer based on Vision-Language Transformer (VLT) is presented. Inspired by Levenshtein Transformer in the area of NLP, the proposed method (named …
The diversity in length constitutes a significant characteristic of text. Due to the long-tail distribution of text lengths, most existing methods for scene text recognition (STR) only work …
M Fujitake - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Typical text recognition methods rely on an encoder-decoder structure, in which the encoder extracts features from an image, and the decoder produces recognized text from these …