scene text recognition vision language - Academic Resource Search

PMMN: pre-trained multi-modal network for scene text recognition

Y Zhang, Z Fu, F Huang, Y Liu - Pattern Recognition Letters, 2021 - Elsevier

… model and language model respectively to learn modality-specific knowledge for … scene
text recognition. In detail, we first pre-train the proposed off-the-shelf vision model and language …

Cited by: 11 (3 versions)

[PDF] thecvf.com

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - … and pattern recognition, 2021 - openaccess.thecvf.com

… ; and 3) language model with noise … scene text recognition. Firstly, the autonomous suggests
to block gradient flow between vision and language models to enforce explicitly language …

Cited by: 332 (6 versions)

[PDF] arxiv.org

SVTR: Scene text recognition with a single visual model

Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org

… model for feature extraction and a sequence model for text … a Single Visual model for Scene
Text recognition within the … : A new scene text recognizer with visual language modeling …

Cited by: 152 (5 versions)

[PDF] researchgate.net

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer

… For example, instances of scene text can be in different languages, colors, fonts, sizes, … that
scene text detection can be taxonomically subsumed under general object detection, which is …

Cited by: 487 (8 versions)

[PDF] arxiv.org

Behind the scene: Revealing the secrets of pre-trained vision-and-language models

J Cao, Z Gan, Y Cheng, L Yu, YC Chen… - … Vision–ECCV 2020: 16th …, 2020 - Springer

… -trained models have revolutionized vision-and-language (V+L) … behind the scene, we present
VALUE (Vision-And-Language … , Visual Coreference Resolution, Visual Relation Detection) …

Cited by: 147 (5 versions)

[PDF] arxiv.org

Visual-semantic transformer for scene text recognition

X Tang, Y Lai, Y Liu, Y Fu, R Fang - arXiv preprint arXiv:2112.00948, 2021 - arxiv.org

… We design weight-sharing visual-semantic alignment modules to explicitly enforce the …
without external language models. • We introduce an interaction module that allows visual and …

Cited by: 16 (2 versions)

[PDF] thecvf.com

Dictionary-guided scene text recognition

N Nguyen, T Nguyen, V Tran, MT Tran… - … Recognition, 2021 - openaccess.thecvf.com

… language prior is a potential approach to advance scene text … Moreover, many languages
have special symbols that have … of the current scene text recognition pipeline by introducing a …

Cited by: 65 (4 versions)

[PDF] arxiv.org

CLIP4STR: A simple baseline for scene text recognition with pre-trained vision-language model

S Zhao, R Quan, L Zhu, Y Yang - arXiv preprint arXiv:2305.14014, 2023 - arxiv.org

… Abstract—Pre-trained vision-language models (VLMs) are the de-facto … ever, scene text
recognition methods still prefer backbones pretrained on a single modality, namely, the visual …

Cited by: 14 (2 versions)

[PDF] arxiv.org

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer

… iterative language modeling for scene text recognition. In: Proceedings of the IEEE/CVF
Conference on Computer Vision and Pattern Recognition (CVPR), pp. 7098–7107, June 2021 …

Cited by: 139 (6 versions)

[PDF] hal.science

Top-down and bottom-up cues for scene text recognition

A Mishra, K Alahari, CV Jawahar - … and pattern recognition, 2012 - ieeexplore.ieee.org

… impressive scene text recognition results using similarity constraints and language statistics,
… In contrast, we show results on a more challenging street view dataset [29], where the words …

Cited by: 451 (23 versions)
