Neural machine translation: A review

F Stahlberg - Journal of Artificial Intelligence Research, 2020 - jair.org
The field of machine translation (MT), the automatic translation of written text from one
natural language into another, has experienced a major paradigm shift in recent years …

Findings of the 2019 conference on machine translation (WMT19)

L Barrault, O Bojar, MR Costa-jussà, C Federmann… - 2019 - zora.uzh.ch
This paper presents the results of the premier shared task organized alongside the
Conference on Machine Translation (WMT) 2019. Participants were asked to build machine …

Improving back-translation with uncertainty-based confidence estimation

S Wang, Y Liu, C Wang, H Luan, M Sun - arXiv preprint arXiv:1909.00157, 2019 - arxiv.org
While back-translation is simple and effective in exploiting abundant monolingual corpora to
improve low-resource neural machine translation (NMT), the synthetic bilingual corpora …

Neural machine translation with a polysynthetic low resource language

JE Ortega, R Castro Mamani, K Cho - Machine Translation, 2020 - Springer
Low-resource languages (LRL) with complex morphology are known to be more difficult to
translate in an automatic way. Some LRLs are particularly more difficult to translate than …

MorphyNet: a large multilingual database of derivational and inflectional morphology

K Batsuren, G Bella, F Giunchiglia - Proceedings of the 18th …, 2021 - aclanthology.org
Large-scale morphological databases provide essential input to a wide range of NLP
applications. Inflectional data is of particular importance for morphologically rich …

How much does tokenization affect neural machine translation?

M Domingo, M García-Martínez, A Helle… - arXiv preprint arXiv …, 2018 - arxiv.org
Tokenization or segmentation is a wide concept that covers simple processes such as
separating punctuation from words, or more sophisticated processes such as applying …

Findings of the WMT 2020 shared task on machine translation robustness

L Specia, Z Li, J Pino, V Chaudhary… - Proceedings of the …, 2020 - aclanthology.org
We report the findings of the second edition of the shared task on improving robustness in
Machine Translation (MT). The task aims to test current machine translation systems in their …

Investigating the effectiveness of BPE: The power of shorter sequences

M Gallé - Proceedings of the 2019 conference on empirical …, 2019 - aclanthology.org
Byte-Pair Encoding (BPE) is an unsupervised sub-word tokenization technique,
commonly used in neural machine translation and other NLP tasks. Its effectiveness makes it …

Compositional representation of morphologically-rich input for neural machine translation

D Ataman, M Federico - arXiv preprint arXiv:1805.02036, 2018 - arxiv.org
Neural machine translation (NMT) models are typically trained with fixed-size input and
output vocabularies, which creates an important bottleneck on their accuracy and …

An evaluation of two vocabulary reduction methods for neural machine translation

D Ataman, M Federico - Proceedings of the 13th Conference of the …, 2018 - aclanthology.org
Neural machine translation (NMT) models are conventionally trained with fixed-size
vocabularies to control the computational complexity and the quality of the learned word …