Neural compression is the application of neural networks and other machine learning methods to data compression. Recent advances in statistical machine learning have opened …
Large models have recently played a dominant role in natural language processing and multimodal vision-language learning. However, their effectiveness in text-related visual …
Q Jiang, J Wang, D Peng, C Liu… - Proceedings of the IEEE …, 2023 - openaccess.thecvf.com
This paper aims to re-assess scene text recognition (STR) from a data-oriented perspective. We begin by revisiting the six commonly used benchmarks in STR and observe a trend of …
We introduce NVLM 1.0, a family of frontier-class multimodal large language models (LLMs) that achieve state-of-the-art results on vision-language tasks, rivaling the leading proprietary …
Y Shi, D Peng, W Liao, Z Lin, X Chen, C Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper presents a comprehensive evaluation of the Optical Character Recognition (OCR) capabilities of the recently released GPT-4V(ision), a Large Multimodal Model …
Existing handwritten text generation methods often require more than ten handwriting samples as style references. However, in practical applications, users tend to prefer a …
The transformer-based encoder-decoder framework is becoming popular in scene text recognition, largely because it naturally integrates recognition clues from both visual and …
B Zhang, H Xie, Z Gao, Y Wang - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com
Scene text images contain not only style information (font, background) but also content information (character, texture). Different scene text tasks need different information but …
H Li, X Wu, Q Chen, Q Xiang - Proceedings of the IEEE/CVF …, 2023 - openaccess.thecvf.com
This paper addresses the distorted document image rectification problem, whose objective is to eliminate the geometric distortion in document images and realize document …