Survey of post-OCR processing approaches

TTH Nguyen, A Jatowt, M Coustaty… - ACM Computing Surveys …, 2021 - dl.acm.org
Optical character recognition (OCR) is one of the most popular techniques used for
converting printed documents into machine-readable ones. While OCR engines can do well …

Optical character recognition with neural networks and post-correction with finite state methods

S Drobac, K Lindén - International Journal on Document Analysis and …, 2020 - Springer
The optical character recognition (OCR) quality of the historical part of the Finnish
newspaper and journal corpus is rather low for reliable search and scientific research on the …

AI-PoCoTo: Combining automated and interactive ocr postcorrection

T Englmeier, F Fink, KU Schulz - … of the 3rd International Conference on …, 2019 - dl.acm.org
PoCoTo is known as a web-based interactive tool for the postcorrection of OCR-results on
historical texts. In this paper we first introduce A-PoCoTo, a fully automated extension of …

Lima or cima? Structure recognition and OCR in building the corpus of the Austrian Alpine Club Journal

C Posch, G Rampl - International Journal of Corpus Linguistics, 2020 - jbe-platform.com
This paper outlines the construction of the corpus Alpenwort, a large, genre-based corpus of
German texts on alpinism. We report on issues related to building the corpus from the …

OCR and post-correction of historical newspapers and journals

S Drobac - University of Helsinki, 2020 - helda.helsinki.fi
The corpus of historical newspapers and journals published in Finland, with more than 11
million pages of historical text, is of great value to the research community. The National …

Bootstrapped OCR error detection for a less-resourced language variant

A Barbaresi - 13th Conference on Natural Language Processing …, 2016 - hal.science
This study focuses on isolated error detection in a retro-digitized newspaper corpus
published from 1946 to 1990 in the former German Democratic Republic. As there are OCR …

A Comparative Analysis for Optical Character Recognition for Text Extraction from Images Using Artificial Neural Network Fuzzy Inference System.

S Bhyrapuneni, A Rajendran - Traitement du Signal, 2022 - search.ebscohost.com
Artificial neural networks (ANN) have the capability to analyze raw data from processing
input-output relationships. This function considers them important in areas of industry with such …

The OPATCH corpus platform–facing heterogeneous groups of texts and users

V Lyding, M Généreux, K Szabò, J Andresen - CLiC it, 2015 - academia.edu
This paper presents the design and development of the OPATCH corpus platform for the
processing and delivery of heterogeneous text collections for different usage scenarios …

Visual Corpus Interface--Putting Text Visualizations at Use

V Lyding, M Généreux - 2016 20th International Conference …, 2016 - ieeexplore.ieee.org
This paper presents the visual corpus interface created within the OPATCH project ('Open
Platform for access to and Analysis of Textual documents of Cultural Heritage'). The interface …

Enhancing Human-Transcribed Records by Using OCR

J Zedlitz, N Luttenberger - … of the 2nd International Conference on Digital …, 2017 - dl.acm.org
Data from highly structured source material (so-called serial sources) is used in a variety of
research areas, such as demography or economic history. When the goal of a transcription …