This paper presents final results of the Out-Of-Vocabulary 2022 (OOV) challenge. The OOV contest introduces an important aspect that is not commonly studied by Optical Character …
A Aberdam, D Bensaïd, A Golts… - Proceedings of the …, 2023 - openaccess.thecvf.com
Reading text in real-world scenarios often requires understanding the context surrounding it, especially when dealing with poor-quality text. However, current scene text recognizers are …
The advent of vision-language pre-training techniques has driven substantial progress in the development of models for image captioning. However, these models frequently produce …
R Ganz, M Elad - Proceedings of the IEEE/CVF Winter …, 2024 - openaccess.thecvf.com
Perceptually Aligned Gradients (PAG) refer to an intriguing property observed in robust image classification models, wherein their input gradients align with human …
Visual Question Answering (VQA) and Image Captioning (CAP), which are among the most popular vision-language tasks, have analogous scene-text versions that require …
Vision-Language (VL) models have attracted significant research attention, enabling remarkable advances in multimodal reasoning. These architectures typically comprise a vision encoder …
X Li, X Chen, Z Huang, L Xie, J Chen… - Proceedings of the 31st …, 2023 - dl.acm.org
Pseudo-labeling-based semi-supervised learning has shown promising advantages in Scene Text Recognition (STR). Most such methods use a pre-trained model to generate …
The 39-volume set, comprising LNCS volumes 13661 through 13699, constitutes the refereed proceedings of the 17th European Conference on Computer Vision, ECCV 2022, held in Tel …
The increasing use of transformer-based large language models brings forward the challenge of processing long sequences. In document visual question answering (DocVQA) …