Self-supervised learning for videos: A survey

MC Schiappa, YS Rawat, M Shah - ACM Computing Surveys, 2023 - dl.acm.org
The remarkable success of deep learning in various domains relies on the availability of
large-scale annotated datasets. However, obtaining annotations is expensive and requires …

Scene text detection and recognition: The deep learning era

S Long, X He, C Yao - International Journal of Computer Vision, 2021 - Springer
With the rise and development of deep learning, computer vision has been tremendously
transformed and reshaped. As an important research area in computer vision, scene text …

ReDet: A rotation-equivariant detector for aerial object detection

J Han, J Ding, N Xue, GS Xia - Proceedings of the IEEE …, 2021 - openaccess.thecvf.com
Recently, object detection in aerial images has gained much attention in computer vision.
Different from objects in natural images, aerial objects are often distributed with arbitrary …

TrOCR: Transformer-based optical character recognition with pre-trained models

M Li, T Lv, J Chen, L Cui, Y Lu, D Florencio… - Proceedings of the …, 2023 - ojs.aaai.org
Text recognition is a long-standing research problem for document digitalization. Existing
approaches are usually built based on CNN for image understanding and RNN for char …

OCR-free document understanding transformer

G Kim, T Hong, M Yim, JY Nam, J Park, J Yim… - … on Computer Vision, 2022 - Springer
Understanding document images (e.g., invoices) is a core but challenging task since it
requires complex functions such as reading text and a holistic understanding of the …

Scene text recognition with permuted autoregressive sequence models

D Bautista, R Atienza - European conference on computer vision, 2022 - Springer
Context-aware STR methods typically use internal autoregressive (AR) language models
(LM). Inherent limitations of AR models motivated two-stage methods which employ an …

Read like humans: Autonomous, bidirectional and iterative language modeling for scene text recognition

S Fang, H Xie, Y Wang, Z Mao… - Proceedings of the …, 2021 - openaccess.thecvf.com
Linguistic knowledge is of great benefit to scene text recognition. However, how to effectively
model linguistic rules in end-to-end deep networks remains a research challenge. In this …

Learning RoI transformer for oriented object detection in aerial images

J Ding, N Xue, Y Long, GS Xia… - Proceedings of the IEEE …, 2019 - openaccess.thecvf.com
Object detection in aerial images is an active yet challenging task in computer vision
because of the bird's-eye view perspective, the highly complex backgrounds, and the variant …

What is wrong with scene text recognition model comparisons? Dataset and model analysis

J Baek, G Kim, J Lee, S Park, D Han… - Proceedings of the …, 2019 - openaccess.thecvf.com
Many new proposals for scene text recognition (STR) models have been introduced in
recent years. While each claim to have pushed the boundary of the technology, a holistic …

Towards accurate scene text recognition with semantic reasoning networks

D Yu, X Li, C Zhang, T Liu, J Han… - Proceedings of the …, 2020 - openaccess.thecvf.com
Scene text image contains two levels of contents: visual texture and semantic information.
Although the previous scene text recognition methods have made great progress over the …