Scene text recognition using higher order language priors

J Kaur, W Singh - Multimedia Tools and Applications, 2022 - Springer

Object detection is one of the most fundamental and challenging tasks to locate objects in
images and videos. Over the past, it has gained much attention to do more research on …

被引用次数：95 相关文章所有 8 个版本

[PDF] researchgate.net

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer

With the rise and development of deep learning, computer vision has been tremendously
transformed and reshaped. As an important research area in computer vision, scene text …

被引用次数：479 相关文章所有 8 个版本

[PDF] neurips.cc

Obelics: An open web-scale filtered dataset of interleaved image-text documents

H Laurençon, L Saulnier, L Tronchon… - Advances in …, 2024 - proceedings.neurips.cc

Large multimodal models trained on natural documents, which interleave images and text,
outperform models trained on image-text pairs on various multimodal benchmarks …

被引用次数：123 相关文章所有 5 个版本

[PDF] arxiv.org

Git: A generative image-to-text transformer for vision and language

J Wang, Z Yang, X Hu, L Li, K Lin, Z Gan, Z Liu… - arXiv preprint arXiv …, 2022 - arxiv.org

In this paper, we design and train a Generative Image-to-text Transformer, GIT, to unify
vision-language tasks such as image/video captioning and question answering. While …

被引用次数：420 相关文章所有 4 个版本

[PDF] arxiv.org

Vision-language models for vision tasks: A survey

J Zhang, J Huang, S Jin, S Lu - IEEE Transactions on Pattern …, 2024 - ieeexplore.ieee.org

Most visual recognition studies rely heavily on crowd-labelled data in deep neural networks
(DNNs) training, and they usually train a DNN for each single visual recognition task …

被引用次数：155 相关文章所有 9 个版本

[PDF] arxiv.org

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer

Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

被引用次数：131 相关文章所有 6 个版本

[PDF] thecvf.com

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com

Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

被引用次数：314 相关文章所有 6 个版本

[PDF] thecvf.com

SEED-Bench: Benchmarking Multimodal Large Language Models

B Li, Y Ge, Y Ge, G Wang, R Wang… - Proceedings of the …, 2024 - openaccess.thecvf.com

Multimodal large language models (MLLMs) building upon the foundation of powerful large
language models (LLMs) have recently demonstrated exceptional capabilities in generating …

被引用次数：29 相关文章所有 3 个版本

[PDF] arxiv.org

Svtr: Scene text recognition with a single visual model

Y Du, Z Chen, C Jia, X Yin, T Zheng, C Li, Y Du… - arXiv preprint arXiv …, 2022 - arxiv.org

Dominant scene text recognition models commonly contain two building blocks, a visual
model for feature extraction and a sequence model for text transcription. This hybrid …

被引用次数：140 相关文章所有 5 个版本

[PDF] thecvf.com

From two to one: A new scene text recognizer with visual language modeling network

Y Wang, H Xie, S Fang, J Wang… - Proceedings of the …, 2021 - openaccess.thecvf.com

In this paper, we abandon the dominant complex language model and rethink the linguistic
learning process in the scene text recognition. Different from previous methods considering …

被引用次数：146 相关文章所有 6 个版本

高级搜索

QQ 群