Analyzing Challenges in Neural Machine Translation for Software Localization

S Koneru, M Huck, M Exel… - Proceedings of the 17th …, 2023 - aclanthology.org
Proceedings of the 17th Conference of the European Chapter of the …, 2023 - aclanthology.org
Abstract
Advancements in Neural Machine Translation (NMT) greatly benefit the software localization industry by decreasing the post-editing time of human annotators. Although the volume of the software being localized is growing significantly, techniques for improving NMT for user interface (UI) texts are lacking. These UI texts have different properties than other collections of texts, presenting unique challenges for NMT. For example, they are often very short, causing them to be ambiguous and needing additional context (button, title text, a table item, etc.) for disambiguation. However, no such UI data sets are readily available with contextual information for NMT models to exploit. This work aims to provide a first step in improving UI translations and highlight its challenges. To achieve this, we provide a novel multilingual UI corpus collection (∼1.3M for English↔German) with a targeted test set and analyze the limitations of state-of-the-art methods on this challenging task. Specifically, we present a targeted test set for disambiguation from English to German to evaluate reliably and emphasize UI translation challenges. Furthermore, we evaluate several state-of-the-art NMT techniques from domain adaptation and document-level NMT on this challenging task. All the scripts to replicate the experiments and data sets are available here.