Toward understanding wordart: Corner-guided transformer for scene text recognition

P Xu, W Shao, K Zhang, P Gao, S Liu… - … on Pattern Analysis …, 2024 - ieeexplore.ieee.org

Large Vision-Language Models (LVLMs) have recently played a dominant role in
multimodal vision-language learning. Despite the great success, it lacks a holistic evaluation …

Cited by 177 Related articles All 3 versions

[PDF] nowpublishers.com

Multimodal foundation models: From specialists to general-purpose assistants

C Li, Z Gan, Z Yang, J Yang, L Li… - … and Trends® in …, 2024 - nowpublishers.com

Neural compression is the application of neural networks and other machine learning
methods to data compression. Recent advances in statistical machine learning have opened …

Cited by 201 Related articles All 6 versions

[PDF] arxiv.org

On the hidden mystery of ocr in large multimodal models

Y Liu, Z Li, B Yang, C Li, X Yin, C Liu, L Jin… - arXiv preprint arXiv …, 2023 - arxiv.org

Large models have recently played a dominant role in natural language processing and
multimodal vision-language learning. However, their effectiveness in text-related visual …

Cited by 159 Related articles All 2 versions

[PDF] thecvf.com

Revisiting scene text recognition: A data perspective

Q Jiang, J Wang, D Peng, C Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com

This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective.
We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …

Cited by 41 Related articles All 5 versions

[PDF] arxiv.org

Nvlm: Open frontier-class multimodal llms

W Dai, N Lee, B Wang, Z Yang, Z Liu, J Barker… - arXiv preprint arXiv …, 2024 - arxiv.org

We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs)
that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary …

Cited by 17 Related articles All 4 versions

[PDF] arxiv.org

Exploring ocr capabilities of gpt-4v(ision): A quantitative and in-depth evaluation

Y Shi, D Peng, W Liao, Z Lin, X Chen, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org

This paper presents a comprehensive evaluation of the Optical Character Recognition
(OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model …

Cited by 51 Related articles All 3 versions

[PDF] arxiv.org

One-dm: One-shot diffusion mimicker for handwritten text generation

G Dai, Y Zhang, Q Ke, Q Guo, S Huang - European Conference on …, 2025 - Springer

Existing handwritten text generation methods often require more than ten handwriting
samples as style references. However, in practical applications, users tend to prefer a …

Cited by 9 Related articles All 9 versions

[PDF] arxiv.org

Cdistnet: Perceiving multi-domain character distance for robust text recognition

T Zheng, Z Chen, S Fang, H Xie, YG Jiang - International Journal of …, 2024 - Springer

The transformer-based encoder-decoder framework is becoming popular in scene text
recognition, largely because it naturally integrates recognition clues from both visual and …

Cited by 65 Related articles All 4 versions

[PDF] thecvf.com

Choose What You Need: Disentangled Representation Learning for Scene Text Recognition, Removal and Editing

B Zhang, H Xie, Z Gao, Y Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

Scene text images contain not only style information (font, background) but also content
information (character, texture). Different scene text tasks need different information but …

Cited by 9 Related articles All 3 versions

[PDF] thecvf.com

Foreground and text-lines aware document image rectification

H Li, X Wu, Q Chen, Q Xiang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com

This paper aims at the distorted document image rectification problem, the objective to
eliminate the geometric distortion in the document images and realize document …

Cited by 7 Related articles All 3 versions
