Findings of the 2021 conference on machine translation (WMT21)

F Akhbardeh, A Arkhangorodsky, M Biesialska… - Proceedings of the sixth …, 2021 - cris.fbk.eu
This paper presents the results of the news translation task, the multilingual low-resource
translation for Indo-European languages, the triangular translation task, and the automatic …

Continual knowledge distillation for neural machine translation

Y Zhang, P Li, M Sun, Y Liu - … of the 61st Annual Meeting of the …, 2023 - aclanthology.org
While many parallel corpora are not publicly accessible for data copyright, data privacy and
competitive differentiation reasons, trained translation models are increasingly available on …

An open dataset and model for language identification

L Burchell, A Birch, N Bogoychev, K Heafield - arXiv preprint arXiv …, 2023 - arxiv.org
Language identification (LID) is a fundamental step in many natural language processing
pipelines. However, current LID systems are far from perfect, particularly on lower-resource …

GlotLID: Language identification for low-resource languages

AH Kargaran, A Imani, F Yvon, H Schütze - arXiv preprint arXiv …, 2023 - arxiv.org
Several recent papers have published good solutions for language identification (LID) for
about 300 high-resource and medium-resource languages. However, there is no LID …

Goldfish: Monolingual Language Models for 350 Languages

TA Chang, C Arnett, Z Tu, BK Bergen - arXiv preprint arXiv:2408.10441, 2024 - arxiv.org
For many low-resource languages, the only available language models are large
multilingual models trained on many languages simultaneously. However, using FLORES …

Continual Knowledge Distillation for Neural Machine Translation

Y Zhang, P Li, M Sun, Y Liu - arXiv preprint arXiv:2212.09097, 2022 - arxiv.org
While many parallel corpora are not publicly accessible for data copyright, data privacy and
competitive differentiation reasons, trained translation models are increasingly available on …

Findings of the WMT 2021 shared task on efficient translation

K Heafield, Q Zhu, R Grundkiewicz - Proceedings of the Sixth …, 2021 - aclanthology.org
The machine translation efficiency task challenges participants to make their systems faster
and smaller with minimal impact on translation quality. How much quality to sacrifice for …

Scaling law for document neural machine translation

Z Zhuocheng, S Gu, M Zhang… - Findings of the Association …, 2023 - aclanthology.org
The scaling laws of language models have played a significant role in advancing large
language models. In order to promote the development of document translation, we …

NVIDIA NeMo offline speech translation systems for IWSLT 2022

O Hrinchuk, V Noroozi, A Khattar… - Proceedings of the …, 2022 - aclanthology.org
This paper provides an overview of NVIDIA NeMo's speech translation systems for the
IWSLT 2022 Offline Speech Translation Task. Our cascade system consists of 1) Conformer …

Netmarble AI Center's WMT21 Automatic Post-Editing Shared Task Submission

S Oh, S Jang, H Xu, S An, I Oh - arXiv preprint arXiv:2109.06515, 2021 - arxiv.org
This paper describes Netmarble's submission to WMT21 Automatic Post-Editing (APE)
Shared Task for the English-German language pair. First, we propose a Curriculum Training …