Turning a CLIP model into a scene text detector

W Yu, Y Liu, W Hua, D Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com
The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown
great potential in various downstream tasks via leveraging the pretrained vision and …

Towards end-to-end unified scene text detection and layout analysis

S Long, S Qin, D Panteleev… - Proceedings of the …, 2022 - openaccess.thecvf.com
Scene text detection and document layout analysis have long been treated as two separate
tasks in different image domains. In this paper, we bring them together and introduce the …

ESTextSpotter: Towards better scene text spotting with explicit synergy in transformer

M Huang, J Zhang, D Peng, H Lu… - Proceedings of the …, 2023 - openaccess.thecvf.com
In recent years, end-to-end scene text spotting approaches are evolving to the Transformer-
based framework. While previous studies have shown the crucial importance of the intrinsic …

Few could be better than all: Feature sampling and grouping for scene text detection

J Tang, W Zhang, H Liu, MK Yang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, transformer-based methods have achieved promising progresses in object
detection, as they can eliminate the post-processes like NMS and enrich the deep …

Arbitrary shape text detection via boundary transformer

SX Zhang, C Yang, X Zhu… - IEEE Transactions on …, 2023 - ieeexplore.ieee.org
In arbitrary shape text detection, locating accurate text boundaries is challenging and non-
trivial. Existing methods often suffer from indirect text boundary modeling or complex post …

Language matters: A weakly supervised vision-language pre-training approach for scene text detection and spotting

C Xue, W Zhang, Y Hao, S Lu, PHS Torr… - European Conference on …, 2022 - Springer
Recently, Vision-Language Pre-training (VLP) techniques have greatly benefited
various vision-language tasks by jointly learning visual and textual representations, which …

TextDiffuser-2: Unleashing the power of language models for text rendering

J Chen, Y Huang, T Lv, L Cui, Q Chen, F Wei - arXiv preprint arXiv …, 2023 - arxiv.org
The diffusion model has been proven a powerful generative model in recent years, yet
remains a challenge in generating visual text. Several methods alleviated this issue by …

Towards robust real-time scene text detection: From semantic to instance representation learning

X Qin, P Lyu, C Zhang, Y Zhou, K Yao… - Proceedings of the 31st …, 2023 - dl.acm.org
Due to the flexible representation of arbitrary-shaped scene text and simple pipeline, bottom-
up segmentation-based methods begin to be mainstream in real-time scene text detection …

Vision-language pre-training for boosting scene text detectors

S Song, J Wan, Z Yang, J Tang… - Proceedings of the …, 2022 - openaccess.thecvf.com
Recently, vision-language joint representation learning has proven to be highly effective in
various scenarios. In this paper, we specifically adapt vision-language joint learning for …

A survey of text detection and recognition algorithms based on deep learning technology

XF Wang, ZH He, K Wang, YF Wang, L Zou, ZZ Wu - Neurocomputing, 2023 - Elsevier
Optical Character Recognition (OCR) poses a crucial challenge within the realm of
computer vision research, as it plays a pivotal role in converting vast amounts of …