Lvlm-ehub: A comprehensive evaluation benchmark for large vision-language models

P Xu, W Shao, K Zhang, P Gao, S Liu, M Lei… - arXiv preprint arXiv …, 2023 - arxiv.org
Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …

Trocr: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Svtr: Scene text recognition with a single visual model

Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org
Dominant scene text recognition models commonly contain two building blocks, a visual
model for feature extraction and a sequence model for text transcription. This hybrid …

Swintextspotter: Scene text spotting via better synergy between text detection and text recognition

M Huang, Y Liu, Z Peng, C Liu, D Lin… - proceedings of the …, 2022 - openaccess.thecvf.com
End-to-end scene text spotting has attracted great attention in recent years due to the
success of excavating the intrinsic synergy of the scene text detection and recognition …

On the hidden mystery of ocr in large multimodal models

Y Liu, Z Li, B Yang, C Li, X Yin, C Liu, L Jin… - arXiv preprint arXiv …, 2023 - arxiv.org
Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …

Estextspotter: Towards better scene text spotting with explicit synergy in transformer

M Huang, J Zhang, D Peng, H Lu… - Proceedings of the …, 2023 - openaccess.thecvf.com
In recent years, end-to-end scene text spotting approaches are evolving to the Transformer-
based framework. While previous studies have shown the crucial importance of the intrinsic …

Multi-granularity prediction for scene text recognition

P Wang, C Da, C Yao - European Conference on Computer Vision, 2022 - Springer
Scene text recognition (STR) has been an active research topic in computer vision for years.
To tackle this challenging problem, numerous innovative methods have been successively …

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …

Abinet++: Autonomous, bidirectional and iterative language modeling for scene text spotting

S Fang, Z Mao, H Xie, Y Wang, C Yan… - IEEE transactions on …, 2022 - ieeexplore.ieee.org
Scene text spotting is of great importance to the computer vision community due to its wide
variety of applications. Recent methods attempt to introduce linguistic knowledge for …