Vision-language pre-training for boosting scene text detectors

W Yu, Y Liu, W Hua, D Jiang… - Proceedings of the …, 2023 - openaccess.thecvf.com

The recent large-scale Contrastive Language-Image Pretraining (CLIP) model has shown
great potential in various downstream tasks via leveraging the pretrained vision and …

Cited by 49 · Related articles · All 7 versions

[PDF] acm.org

Towards robust real-time scene text detection: From semantic to instance representation learning

X Qin, P Lyu, C Zhang, Y Zhou, K Yao… - Proceedings of the 31st …, 2023 - dl.acm.org

Due to the flexible representation of arbitrary-shaped scene text and simple pipeline, bottom-
up segmentation-based methods begin to be mainstream in real-time scene text detection …

Cited by 7 · Related articles · All 4 versions

[PDF] thecvf.com

OmniParser: A Unified Framework for Text Spotting, Key Information Extraction and Table Recognition

J Wan, S Song, W Yu, Y Liu, W Cheng… - Proceedings of the …, 2024 - openaccess.thecvf.com

Recently visually-situated text parsing (VsTP) has experienced notable advancements
driven by the increasing demand for automated document understanding and the …

Cited by 3 · Related articles · All 3 versions

[PDF] arxiv.org

Turning a CLIP model into a scene text spotter

W Yu, Y Liu, X Zhu, H Cao, X Sun… - IEEE Transactions on …, 2024 - ieeexplore.ieee.org

We exploit the potential of the large-scale Contrastive Language-Image Pretraining (CLIP)
model to enhance scene text detection and spotting tasks, transforming it into a robust …

Cited by 4 · Related articles · All 6 versions

[PDF] thecvf.com

Modeling entities as semantic points for visual information extraction in the wild

Z Yang, R Long, P Wang, S Song… - Proceedings of the …, 2023 - openaccess.thecvf.com

Recently, Visual Information Extraction (VIE) has been becoming increasingly
important in both academia and industry, due to the wide range of real-world applications …

Cited by 6 · Related articles · All 5 versions

[PDF] acm.org

Perceiving ambiguity and semantics without recognition: an efficient and effective ambiguous scene text detector

Y Shu, W Wang, Y Zhou, S Liu, A Zhang… - Proceedings of the 31st …, 2023 - dl.acm.org

Ambiguous scene text detection is an extremely challenging task. Existing text detectors that
rely solely on visual cues often suffer from confusion due to being evenly distributed in …

Cited by 5 · Related articles

[PDF] arxiv.org

Less is more: Removing text-regions improves CLIP training efficiency and robustness

L Cao, B Zhang, C Chen, Y Yang, X Du… - arXiv preprint arXiv …, 2023 - arxiv.org

The CLIP (Contrastive Language-Image Pre-training) model and its variants are becoming
the de facto backbone in many applications. However, training a CLIP model from hundreds …

Cited by 13 · Related articles · All 2 versions

[PDF] archive.org

Masked Text Modeling: A Self-Supervised Pre-training Method for Scene Text Detection

K Wang, H Xie, Y Wang, D Zhang, Y Qu, Z Gao… - Proceedings of the 31st …, 2023 - dl.acm.org

Scene text detection has made great progress recently with the wide use of pre-training.
Nonetheless, existing scene text detection methods still suffer from two problems: 1) Limited …

Cited by 4 · Related articles · All 2 versions

[PDF] thecvf.com

ODM: A Text-Image Further Alignment Pre-training Approach for Scene Text Detection and Spotting

C Duan, P Fu, S Guo, Q Jiang… - Proceedings of the IEEE …, 2024 - openaccess.thecvf.com

In recent years text-image joint pre-training techniques have shown promising results in
various tasks. However in Optical Character Recognition (OCR) tasks aligning text instances …

[HTML] Evaluating synthetic pre-Training for handwriting processing tasks

V Pippi, S Cascianelli, L Baraldi, R Cucchiara - Pattern Recognition Letters, 2023 - Elsevier

In this work, we explore massive pre-training on synthetic word images for enhancing the
performance on four benchmark downstream handwriting analysis tasks. To this end, we …

Cited by 5 · Related articles · All 7 versions
