The Flores-101 Evaluation Benchmark for Low-Resource and Multilingual Machine Translation

N Goyal, C Gao, V Chaudhary, PJ Chen… - Transactions of the …, 2022 - direct.mit.edu
One of the biggest challenges hindering progress in low-resource and multilingual machine
translation is the lack of good evaluation benchmarks. Current evaluation benchmarks either …

Scaling neural machine translation to 200 languages

NLLB Team - Nature, 2024 - pmc.ncbi.nlm.nih.gov
The development of neural techniques has opened up new avenues for research in
machine translation. Today, neural machine translation (NMT) systems can leverage highly …

Findings of the WMT 2021 shared task on large-scale multilingual machine translation

G Wenzek, V Chaudhary, A Fan, S Gomez… - Proceedings of the …, 2021 - aclanthology.org
We present the results of the first task on Large-Scale Multilingual Machine Translation. The
task consists of the many-to-many evaluation of a single model across a variety of source …

Gatitos: Using a new multilingual lexicon for low-resource machine translation

A Jones, I Caswell, O Firat, I Saxena - Proceedings of the 2023 …, 2023 - aclanthology.org
Modern machine translation models and language models are able to translate without
having been trained on parallel data, greatly expanding the set of languages that they can …

CrossAligner & co: Zero-shot transfer methods for task-oriented cross-lingual natural language understanding

M Gritta, R Hu, I Iacobacci - arXiv preprint arXiv:2203.09982, 2022 - arxiv.org
Task-oriented personal assistants enable people to interact with a host of devices and
services using natural language. One of the challenges of making neural dialogue systems …

Investigating lexical replacements for Arabic-English code-switched data augmentation

I Hamed, N Habash, S Abdennadher, NT Vu - arXiv preprint arXiv …, 2022 - arxiv.org
Data sparsity is a main problem hindering the development of code-switching (CS) NLP
systems. In this paper, we investigate data augmentation techniques for synthesizing …

Not lost in translation: The implications of machine translation technologies for language professionals and for broader society

F Borgonovi, J Hervé, H Seitz - 2023 - oecd-ilibrary.org
The paper discusses the implications of recent advances in artificial intelligence for
knowledge workers, focusing on possible complementarities and substitution between …

Integrating unsupervised data generation into self-supervised neural machine translation for low-resource languages

D Ruiter, D Klakow, J van Genabith… - arXiv preprint arXiv …, 2021 - arxiv.org
For most language combinations, parallel data is either scarce or simply unavailable. To
address this, unsupervised machine translation (UMT) exploits large amounts of …

AugCSE: Contrastive sentence embedding with diverse augmentations

Z Tang, MY Kocyigit, D Wijaya - arXiv preprint arXiv:2210.13749, 2022 - arxiv.org
Data augmentation techniques have been proven useful in many applications in NLP fields.
Most augmentations are task-specific, and cannot be used as a general-purpose tool. In our …

Semantic connections in the complex sentences for post-editing machine translation in the Kazakh language

A Turganbayeva, D Rakhimova, V Karyukin… - Information, 2022 - mdpi.com
The problems of machine translation are constantly arising. While the most advanced
translation platforms, such as Google and Yandex, allow for high-quality translations of …