… modUle (EMU) for scene text recognition in the scenario of multi-languages or languages with large character set. Specifically, EMU … , object segmentation and vision-language model. …
C Xue, J Huang, W Zhang, S Lu, C Wang… - arXiv preprint arXiv …, 2021 - arxiv.org
… of natural language processing, most recent scene text recognizers adopt an … of visual features at noisy decoding time steps. This paper presents I2C2W, a novel scene text recognition …
… and iterative language modeling for scene text recognition. In Proceedings of the IEEE/CVF Conference on Computer Vision and Pattern Recognition, pages 7098–7107, 2021. 1, 6 …
M Rang, Z Bi, C Liu, Y Wang… - … and Pattern Recognition, 2024 - openaccess.thecvf.com
… narrows its focus to the text recognition phase, specifically to Scene Text Recognition (STR). STR … Scene text recognition using higher order language priors. BMVC-British machine …
M Bušta, Y Patel, J Matas - … : 14th Asian Conference on Computer Vision …, 2019 - Springer
… Scene text recognition finds its use as a component in larger … driving, indoor navigations and visual search engines. … training multi-language scene text detection, recognition and script …
X Cheng, W Zhou, X Li, X Chen, J Yang, T Li… - arXiv preprint arXiv …, 2024 - arxiv.org
… that the single-vision model based on the self-attention mechanism can still achieve comparable accuracy to the high-level vision-language model in the scene text recognition task. At …
H Xie, S Fang, ZJ Zha, Y Yang, Y Li… - ACM Transactions on …, 2019 - dl.acm.org
… on standard datasets for scene text recognition, including Street View Text, IIIT5K, and ICDAR … this article, we show that convolutional-based language modeling for text recognition not …
… [13] propose to ensemble attention and language models in an attention-based architecture. … Images are generated from side-view angle snapshots in Google Street View. Therefore, …
Y Zhu, Z Liu, Y Liang, X Li, H Liu, C Bao… - Proceedings of the AAAI …, 2023 - ojs.aaai.org
… (b) Scene text recognition mistakes in the STVQA task. … out the correct scene text words, we design a language refinement network based on a pre-trained language model to distinguish …