A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

[BOOK][B] Quantitative historical linguistics: A corpus framework

GB Jenset, B McGillivray - 2017 - books.google.com
This book is an innovative guide to quantitative, corpus-based research in historical and
diachronic linguistics. Gard B. Jenset and Barbara McGillivray argue that, although historical …

Collaborative authorship in the twelfth century: A stylometric study of Hildegard of Bingen and Guibert of Gembloux

M Kestemont, S Moens… - Digital Scholarship in the …, 2015 - academic.oup.com
Hildegard of Bingen (1098–1179) is one of the most influential female authors of
the Middle Ages. From the point of view of computational stylistics, the oeuvre attributed to …

Modernizing historical Slovene words with character-based SMT

Y Scherrer, T Erjavec - BSNLP 2013-4th Biennial Workshop on …, 2013 - inria.hal.science
We propose a language-independent word normalization method exemplified on
modernizing historical Slovene words. Our method relies on character-based statistical …

[PDF][PDF] Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

LL(O)D and NLP perspectives on semantic change for Humanities research

F Armaselu, ES Apostol, AF Khan… - Semantic …, 2022 - content.iospress.com
This paper presents an overview of the LL(O)D and NLP methods, tools and data for
detecting and representing semantic change, with its main application in humanities …

Lemmatization for variation-rich languages using deep learning

M Kestemont, G De Pauw, R van Nie… - Digital Scholarship in …, 2017 - academic.oup.com
In this article, we describe a novel approach to sequence tagging for languages that are rich
in (e.g. orthographic) surface variation. We focus on lemmatization, a basic step in many …

Lemmatization for ancient languages: Rules or neural networks?

O Dereza - Artificial Intelligence and Natural Language: 7th …, 2018 - Springer
Lemmatisation, which is one of the most important stages of text preprocessing, consists in
grouping the inflected forms of a word together so they can be analysed as a single item …

Modernising historical Slovene words

Y Scherrer, T Erjavec - Natural Language Engineering, 2016 - cambridge.org
We propose a language-independent word normalisation method and exemplify it on
modernising historical Slovene words. Our method relies on character-level statistical …

[PDF][PDF] From old texts to modern spellings: an experiment in automatic normalisation

I Hendrickx, R Marquilhas - Journal for Language Technology and …, 2011 - jlcl.org
We aim to tackle the problem of spelling variations in a corpus of personal Portuguese letters
from the 16th to the 20th century. We investigated the extent to which the task of normalising …