Optical character recognition of 19th century classical commentaries: the current state of affairs

M Romanello, S Najem-Meyer… - Proceedings of the 6th …, 2021 - dl.acm.org
Together with critical editions and translations, commentaries are one of the main genres of
publication in literary and textual scholarship, and have a century-long tradition. Yet, the …

State of the art optical character recognition of 19th century fraktur scripts using open source engines

C Reul, U Springmann, C Wick, F Puppe - arXiv preprint arXiv:1810.03436, 2018 - arxiv.org
In this paper we evaluate Optical Character Recognition (OCR) of 19th century Fraktur
scripts without book-specific training using mixed models, i.e., models trained to recognize a …

Large-scale optical character recognition of Ancient Greek

B Robertson, F Boschetti - Mouseion, 2017 - utpjournals.press
This paper documents our campaign to undertake the large-scale optical character
recognition of ancient, or polytonic, Greek. Building upon the Gamera OCR engine and …

OCR4all—An open-source tool providing a (semi-) automatic OCR workflow for historical printings

C Reul, D Christ, A Hartelt, N Balbach, M Wehner… - Applied Sciences, 2019 - mdpi.com
Optical Character Recognition (OCR) on historical printings is a challenging task mainly due
to the complexity of the layout and the highly variant typography. Nevertheless, in the last …

OCR of historical printings with an application to building diachronic corpora: A case study using the RIDGES herbal corpus

U Springmann, A Lüdeling - arXiv preprint arXiv:1608.02153, 2016 - arxiv.org
This article describes the results of a case study that applies Neural Network-based Optical
Character Recognition (OCR) to scanned images of books printed between 1487 and 1870 …

How to improve optical character recognition of historical Finnish newspapers using open source Tesseract OCR engine

M Koistinen, K Kettunen, J Kervinen - Proc. of LTC, 2017 - researchgate.net
The current paper presents work that has been carried out in the National Library of Finland
(NLF) to improve optical character recognition (OCR) quality of the historical Finnish …

How to improve optical character recognition of historical Finnish newspapers using open source Tesseract OCR engine – final notes on development and evaluation

M Koistinen, K Kettunen, J Kervinen - Human Language Technology …, 2020 - Springer
The current paper presents work that has been carried out in the National Library of Finland
(NLF) to improve optical character recognition (OCR) quality of the historical Finnish …

Optical character recognition (OCR) and medieval manuscripts: Reconsidering transcriptions in the digital age

J Schoen, GE Saretto - Digital Philology: A Journal of Medieval …, 2022 - muse.jhu.edu
This essay will discuss an ongoing project to train an optical character recognition (OCR)
system on medieval manuscripts—specifically, the OCR engine Kraken, which we trained to …

Reading in the mist: high-quality optical character recognition based on freely available early modern digitized books

A Sangiacomo, H Hogenbirk… - … Scholarship in the …, 2022 - academic.oup.com
In this paper, we present a workflow for reworking digitized versions of early modern books,
freely available in the public domain, in such a way that they will be capable of yielding high …

Mixed model OCR training on historical Latin script for out-of-the-box recognition and finetuning

C Reul, C Wick, M Nöth, A Büttner, M Wehner… - Proceedings of the 6th …, 2021 - dl.acm.org
In order to apply Optical Character Recognition (OCR) to historical printings of Latin script
fully automatically, we report on our efforts to construct a widely-applicable polyfont …