A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

Wrangling with Non-Standard Data.

E Mäkelä, K Lagus, L Lahti, T Säily, M Tolonen… - DHN, 2020 - researchportal.helsinki.fi
Research in the digital humanities and computational social sciences requires overcoming
complexity in research data, methodology, and research questions. In this article, we show …

From the paft to the fiiture: a fully automatic NMT and word embeddings method for OCR post-correction

M Hämäläinen, S Hengchen - arXiv preprint arXiv:1910.05535, 2019 - arxiv.org
A great deal of historical corpora suffer from errors introduced by the OCR (optical character
recognition) methods used in the digitization process. Correcting these errors manually is a …

Automatic normalisation of early Modern French

R Bawden, J Poinhos, E Kogkitsidou… - Proceedings of the …, 2022 - aclanthology.org
Spelling normalisation is a useful step in the study and analysis of historical language texts,
whether it is manual analysis by experts or automatic analysis using downstream natural …

A survey of orthographic information in machine translation

BR Chakravarthi, P Rani, M Arcan, JP McCrae - SN computer science, 2021 - Springer
Machine translation is one of the applications of natural language processing which
has been explored in different languages. Recently researchers started paying attention …

Dialect text normalization to normative standard Finnish

N Partanen, M Hämäläinen… - Workshop on Noisy …, 2019 - researchportal.helsinki.fi
We compare different LSTMs and transformer models in terms of their effectiveness in
normalizing dialectal Finnish into the normative standard Finnish. As dialect is the common …

Murreviikko - a dialectologically annotated and normalized dataset of Finnish tweets

O Kuparinen - Tenth Workshop on NLP for Similar Languages …, 2023 - aclanthology.org
This paper presents Murreviikko, a dataset of dialectal Finnish tweets which have been
dialectologically annotated and manually normalized to a standard form. The dataset can be …

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

LL(O)D and NLP perspectives on semantic change for Humanities research

F Armaselu, ES Apostol, AF Khan… - Semantic …, 2022 - content.iospress.com
This paper presents an overview of the LL(O)D and NLP methods, tools and data for
detecting and representing semantic change, with its main application in humanities …

Lemmatization of historical old literary Finnish texts in modern orthography

M Hämäläinen, N Partanen, K Alnajjar - arXiv preprint arXiv:2107.03266, 2021 - arxiv.org
Texts written in Old Literary Finnish represent the first literary work ever written in Finnish
starting from the 16th century. There have been several projects in Finland that have …