Revisiting scene text recognition: A data perspective

Y Shi, D Peng, W Liao, Z Lin, X Chen, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents a comprehensive evaluation of the Optical Character Recognition
(OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model …

Cited by 34 · Related articles · All 3 versions

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

B Zhang, H Xie, Z Gao, Y Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Scene text images contain not only style information (font, background) but also content
information (character, texture). Different scene text tasks need different information but …

Cited by 5 · Related articles · All 3 versions

Unified hallucination detection for multimodal large language models

X Chen, C Wang, Y Xue, N Zhang, X Yang, Q Li… - arXiv preprint arXiv …, 2024 - arxiv.org

Despite significant strides in multimodal tasks, Multimodal Large Language Models (MLLMs)
are plagued by the critical issue of hallucination. The reliable detection of such …

Cited by 14 · Related articles · All 3 versions

CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - arXiv preprint arXiv:2305.14014, 2023 - arxiv.org

Pre-trained vision-language models (VLMs) are the de-facto foundation models for various
downstream tasks. However, scene text recognition methods still prefer backbones pre …

Cited by 14 · Related articles · All 2 versions

Bridging the Gap Between End-to-End and Two-Step Text Spotting

M Huang, H Li, Y Liu, X Bai… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Modularity plays a crucial role in the development and maintenance of complex systems.
While end-to-end text spotting efficiently mitigates the issues of error accumulation and sub …

OTE: Exploring Accurate Scene Text Recognition Using One Token

J Xu, Y Wang, H Xie, Y Zhang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

In this paper, we propose a novel framework to fully exploit the potential of a single vector for
scene text recognition (STR). Different from previous sequence-to-sequence methods that …

Cited by 1 · Related articles

Multi-modal In-Context Learning Makes an Ego-evolving Scene Text Recognizer

Z Zhao, J Tang, C Lin, B Wu, C Huang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Scene text recognition (STR) in the wild frequently encounters challenges when coping with
domain variations, font diversity, shape deformations, etc. A straightforward solution is …

Cited by 1 · Related articles · All 3 versions

Self-Supervised Pre-training with Symmetric Superimposition Modeling for Scene Text Recognition

Z Gao, Y Wang, Y Qu, B Zhang, Z Wang, J Xu… - arXiv preprint arXiv …, 2024 - arxiv.org

In text recognition, self-supervised pre-training emerges as a good solution to reduce
dependence on expansive annotated real data. Previous studies primarily focus on local …

Cited by 1 · Related articles · All 2 versions

Masked and permuted implicit context learning for scene text recognition

X Yang, Z Qiao, J Wei, D Yang… - IEEE Signal Processing …, 2024 - ieeexplore.ieee.org

Scene Text Recognition (STR) is challenging because of various text styles, shapes, and
backgrounds. Although the integration of linguistic information enhances models' …

Cited by 3 · Related articles · All 3 versions

TextBlockV2: Towards Precise-Detection-Free Scene Text Spotting with Pre-trained Language Model

J Lyu, J Wei, G Zeng, Z Li, E Xie, W Wang… - arXiv preprint arXiv …, 2024 - arxiv.org

Existing scene text spotters are designed to locate and transcribe texts from images.
However, it is challenging for a spotter to achieve precise detection and recognition of scene …
