An open diachronic corpus of historical Spanish: annotation criteria and automatic modernisation...

S Wu, R Cotterell, M Hulden - arXiv preprint arXiv:2005.10213, 2020 - arxiv.org

The transformer has been shown to outperform recurrent neural network-based sequence-to-
sequence models in various word-level NLP tasks. Yet for character-level transduction tasks …

Cited by 88 Related articles All 8 versions

A large-scale comparison of historical text normalization systems

M Bollmann - arXiv preprint arXiv:1904.02036, 2019 - arxiv.org

There is no consensus on the state-of-the-art approach to historical text normalization. Many
techniques have been proposed, including rule-based methods, distance metrics, character …

Cited by 96 Related articles All 7 versions

Improving historical spelling normalization with bi-directional LSTMs and multi-task learning

M Bollmann, A Søgaard - arXiv preprint arXiv:1610.07844, 2016 - arxiv.org

Natural-language processing of historical documents is complicated by the abundance of
variant spellings and lack of annotated data. A common approach is to normalize the …

Cited by 83 Related articles All 5 versions

Encoder-decoder methods for text normalization

M Lusetti, T Ruzsics, A Göhring, T Samardžić, E Stark - 2018 - zora.uzh.ch

Text normalization is the task of mapping non-canonical language, typical of speech
transcription and computer-mediated communication, to a standardized writing. It is an up …

Cited by 70 Related articles All 12 versions

An evaluation of neural machine translation models on historical spelling normalization

G Tang, F Cap, E Pettersson, J Nivre - arXiv preprint arXiv:1806.05210, 2018 - arxiv.org

In this paper, we apply different NMT models to the problem of historical spelling
normalization for five languages: English, German, Hungarian, Icelandic, and Swedish. The …

Cited by 52 Related articles All 6 versions

Normalising Slovene data: historical texts vs. user-generated content

N Ljubešić, K Zupan, D Fišer, T Erjavec - Proceedings of the 13th …, 2016 - academia.edu

The paper presents two manually annotated Slovene language text normalisation datasets,
one of historical texts and the other of tweets, and proposes several variants of character …

Cited by 62 Related articles All 8 versions

Normalizing medieval German texts: from rules to deep learning

N Korchagina - Proceedings of the NoDaLiDa 2017 Workshop on …, 2017 - aclanthology.org

The application of NLP tools to historical texts is complicated by a high level of spelling
variation. Different methods of historical text normalization have been proposed. In this …

Cited by 44 Related articles All 8 versions

Learning attention for historical text normalization by learning to pronounce

M Bollmann, J Bingel, A Søgaard - … of the 55th Annual Meeting of …, 2017 - aclanthology.org

Automated processing of historical texts often relies on pre-normalization to modern word
forms. Training encoder-decoder architectures to solve such problems typically requires a lot …

Cited by 42 Related articles All 3 versions

Modernizing historical Slovene words with character-based SMT

Y Scherrer, T Erjavec - BSNLP 2013-4th Biennial Workshop on …, 2013 - inria.hal.science

We propose a language-independent word normalization method exemplified on
modernizing historical Slovene words. Our method relies on character-based statistical …

Cited by 54 Related articles All 14 versions

Normalization of historical texts with neural network models

M Bollmann - 2018 - hss-opus.ub.ruhr-uni-bochum.de

With the increasing availability of digitized resources of historical documents, interest in
effective natural language processing (NLP) for these documents is on the rise. However …

Cited by 31 Related articles All 7 versions