
Automating lexical simplification in Dutch

Authors

Bram Bulté, Leen Sevens, Vincent Vandeghinste

Publication date

2018/12/1

Journal

Computational Linguistics in the Netherlands Journal

Volume

Pages

24-48

Description

We discuss the design, development and evaluation of an automated lexical simplification tool for Dutch. A basic pipeline approach is used to perform both text adaptation and annotation. First, sentences are preprocessed and word sense disambiguation is performed. Then, the difficulty of each token is estimated by looking at its average age of acquisition and its frequency in a corpus of simplified Dutch. We use Cornetto to find synonyms of words that have been identified as difficult and the SONAR500 corpus to perform reverse lemmatisation. Finally, we rely on a large-scale language model to verify whether the selected replacement word fits the local context. In addition, the text is augmented with information from Wikipedia (word definitions and links). We tune and evaluate the system with sentences taken from the Flemish newspaper De Standaard. The results show that the system’s adaptation component has low coverage, since it only correctly simplifies around one in five ‘difficult’ words, but reasonable accuracy, with no grammatical errors being introduced in the text. The Wikipedia annotations have a broader coverage, but their potential for simplification needs to be further developed and more thoroughly evaluated.
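The adaptation steps described above (difficulty estimation, synonym lookup, context check) can be illustrated with a minimal sketch. Everything in it is an assumption made for illustration: the toy dictionaries stand in for the age-of-acquisition norms, the simplified-Dutch corpus, Cornetto and the language model, the thresholds and the rule combining them are hypothetical, and a bigram lookup replaces the large-scale language model. Word sense disambiguation and reverse lemmatisation are left out.

```python
# Toy stand-ins for the resources named in the abstract (all values invented).
AOA = {"automobiel": 9.5, "auto": 3.2, "aanschaffen": 10.1, "kopen": 4.0}
SIMPLE_FREQ = {"automobiel": 1, "auto": 950, "aanschaffen": 4, "kopen": 800}
SYNONYMS = {"automobiel": ["auto"], "aanschaffen": ["kopen"]}
BIGRAM_COUNTS = {("een", "auto"): 120, ("auto", "kopen"): 40}

# Hypothetical thresholds; the paper tunes its own settings.
AOA_THRESHOLD = 8.0
FREQ_THRESHOLD = 10


def is_difficult(word):
    """Flag a word as difficult if it is acquired late AND rare in simple text.

    Words missing from either resource are treated as not difficult here.
    """
    aoa = AOA.get(word)
    freq = SIMPLE_FREQ.get(word)
    if aoa is None or freq is None:
        return False
    return aoa > AOA_THRESHOLD and freq < FREQ_THRESHOLD


def fits_context(prev_word, candidate):
    """Crude context check: accept a candidate only if the bigram was seen."""
    return BIGRAM_COUNTS.get((prev_word, candidate), 0) > 0


def simplify(tokens):
    """Replace difficult tokens with an easier synonym that fits the context."""
    out = []
    for tok in tokens:
        replacement = tok
        if is_difficult(tok):
            prev = out[-1] if out else "<s>"
            # Try the candidates with the lowest age of acquisition first.
            candidates = sorted(
                SYNONYMS.get(tok, []), key=lambda w: AOA.get(w, float("inf"))
            )
            for cand in candidates:
                if not is_difficult(cand) and fits_context(prev, cand):
                    replacement = cand
                    break
        out.append(replacement)
    return out


result = simplify(["ik", "wil", "een", "automobiel", "aanschaffen"])
print(" ".join(result))
```

A difficult word is left unchanged when no candidate passes the context check, which mirrors the trade-off reported in the abstract: few replacements, but no errors introduced by the ones that are made.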

Total citations

Cited by 15

[Chart: citations per year, 2019 to 2024; bar values 3, 3, 4, 1, 4]
