Applying the transformer to character-level transduction

S Wu, R Cotterell, M Hulden - arXiv preprint arXiv:2005.10213, 2020 - arxiv.org
The transformer has been shown to outperform recurrent neural network-based sequence-to-sequence
models in various word-level NLP tasks. Yet for character-level transduction tasks …

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org
There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

M Bollmann, A Søgaard - arXiv preprint arXiv:1610.07844, 2016 - arxiv.org
Natural-language processing of historical documents is complicated by the abundance of
variant spellings and lack of annotated data. A common approach is to normalize the …

Encoder-decoder methods for text normalization

M Lusetti, T Ruzsics, A Göhring, T Samardžić, E Stark - 2018 - zora.uzh.ch
Text normalization is the task of mapping non-canonical language, typical of speech
transcription and computer-mediated communication, to a standardized writing. It is an up …

An evaluation of neural machine translation models on historical spelling normalization

G Tang, F Cap, E Pettersson, J Nivre - arXiv preprint arXiv:1806.05210, 2018 - arxiv.org
In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …

Normalising Slovene data: historical texts vs. user-generated content

N Ljubešić, K Zupan, D Fišer, T Erjavec - Proceedings of the 13th …, 2016 - academia.edu
The paper presents two manually annotated Slovene language text normalisation datasets,
one of historical texts and the other of tweets, and proposes several variants of character …

Normalizing medieval German texts: from rules to deep learning

N Korchagina - Proceedings of the NoDaLiDa 2017 Workshop on …, 2017 - aclanthology.org
The application of NLP tools to historical texts is complicated by a high level of spelling
variation. Different methods of historical text normalization have been proposed. In this …

Learning attention for historical text normalization by learning to pronounce

M Bollmann, J Bingel, A Søgaard - … of the 55th Annual Meeting of …, 2017 - aclanthology.org
Automated processing of historical texts often relies on pre-normalization to modern word
forms. Training encoder-decoder architectures to solve such problems typically requires a lot …

Modernizing historical Slovene words with character-based SMT

Y Scherrer, T Erjavec - BSNLP 2013-4th Biennial Workshop on …, 2013 - inria.hal.science
We propose a language-independent word normalization method exemplified on
modernizing historical Slovene words. Our method relies on character-based statistical …

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de
With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …