MJ Hill, S Hengchen - Digital Scholarship in the Humanities, 2019 - academic.oup.com
This article aims to quantify the impact optical character recognition (OCR) has on the quantitative analysis of historical documents. Using Eighteenth Century Collections Online …
S Coats - Language and linguistics in a complex world, 2023 - degruyter.com
This paper introduces two new large corpora comprised of YouTube Automatic Speech Recognition (ASR) transcripts of the speech of videos from geographically localized …
Many historical corpora suffer from errors introduced by the OCR (optical character recognition) methods used in the digitization process. Correcting these errors manually is a …
Historical corpora are known to contain errors introduced by OCR (optical character recognition) methods used in the digitization process, often said to be degrading the …
H Dutta, A Gupta - Decision Support Systems, 2022 - Elsevier
Text databases have grown tremendously in number, size, and volume over the last few decades. Optical Character Recognition (OCR) software is used to scan the text and make …
The aim of this study was to verify the possibility of Sor Juana Inés de la Cruz authoring the anonymous part of the baroque play La Segunda Celestina, commissioned to Agustín de …
Stylometric analysis of medieval vernacular texts is still a significant challenge: the importance of scribal variation, be it spelling or more substantial, as well as the variants and …
Methods and techniques of feature selection support expert domain knowledge in the search for the attributes that are most important for a task. These approaches can also be …
U Stańczyk, B Zielosko - Bulletin of the Polish Academy of …, 2021 - yadda.icm.edu.pl
When patterns to be recognised are described by features of continuous type, discretisation becomes either an optional or necessary step in the initial data pre-processing stage …