M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many techniques have been proposed, including rule-based methods, distance metrics, character …
Natural-language processing of historical documents is complicated by the abundance of variant spellings and lack of annotated data. A common approach is to normalize the …
Text normalization is the task of mapping non-canonical language, typical of speech transcription and computer-mediated communication, to a standardized written form. It is an up …
In this paper, we apply different NMT models to the problem of historical spelling normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …
The paper presents two manually annotated Slovene language text normalisation datasets, one of historical texts and the other of tweets, and proposes several variants of character …
B Navarro, MR Lafoz, N Sánchez - Proceedings of the Tenth …, 2016 - aclanthology.org
In order to analyze metrical and semantic aspects of poetry in Spanish with computational techniques, we have developed a large corpus annotated with metrical information. In this …
N Korchagina - Proceedings of the NoDaLiDa 2017 Workshop on …, 2017 - aclanthology.org
The application of NLP tools to historical texts is complicated by a high level of spelling variation. Different methods of historical text normalization have been proposed. In this …
Automated processing of historical texts often relies on pre-normalization to modern word forms. Training encoder-decoder architectures to solve such problems typically requires a lot …
Y Scherrer, T Erjavec - BSNLP 2013-4th Biennial Workshop on …, 2013 - inria.hal.science
We propose a language-independent word normalization method exemplified on modernizing historical Slovene words. Our method relies on character-based statistical …