Neural machine translation for low-resource languages: A survey

S Ranathunga, ESA Lee, M Prifti Skenduli… - ACM Computing …, 2023 - dl.acm.org
Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …

MADLAD-400: A multilingual and document-level large audited dataset

S Kudugunta, I Caswell, B Zhang… - Advances in …, 2024 - proceedings.neurips.cc
We introduce MADLAD-400, a manually audited, general domain 3T token monolingual
dataset based on CommonCrawl, spanning 419 languages. We discuss the limitations …

SeamlessM4T-Massively Multilingual & Multimodal Machine Translation

L Barrault, YA Chung, MC Meglioli, D Dale… - arXiv preprint arXiv …, 2023 - arxiv.org
What does it take to create the Babel Fish, a tool that can help individuals translate speech
between any two languages? While recent breakthroughs in text-based models have …

Bitext mining using distilled sentence representations for low-resource languages

K Heffernan, O Çelebi, H Schwenk - arXiv preprint arXiv:2205.12654, 2022 - arxiv.org
Scaling multilingual representation learning beyond the hundred most frequent languages is
challenging, in particular to cover the long tail of low-resource languages. A promising …

ChatGPT MT: Competitive for high-(but not low-) resource languages

NR Robinson, P Ogayo, DR Mortensen… - arXiv preprint arXiv …, 2023 - arxiv.org
Large language models (LLMs) implicitly learn to perform a range of language tasks,
including machine translation (MT). Previous studies explore aspects of LLMs' MT …

FRMT: A benchmark for few-shot region-aware machine translation

P Riley, T Dozat, JA Botha, X Garcia… - Transactions of the …, 2023 - direct.mit.edu
We present FRMT, a new dataset and evaluation benchmark for Few-shot Region-aware
Machine Translation, a type of style-targeted translation. The dataset consists of professional …

Bilex Rx: Lexical data augmentation for massively multilingual machine translation

A Jones, I Caswell, I Saxena, O Firat - arXiv preprint arXiv:2303.15265, 2023 - arxiv.org
Neural machine translation (NMT) has progressed rapidly over the past several years, and
modern models are able to achieve relatively high quality using only monolingual text data …

GATITOS: Using a new multilingual lexicon for low-resource machine translation

A Jones, I Caswell, O Firat, I Saxena - Proceedings of the 2023 …, 2023 - aclanthology.org
Modern machine translation models and language models are able to translate without
having been trained on parallel data, greatly expanding the set of languages that they can …

Learn and Consolidate: Continual Adaptation for Zero-Shot and Multilingual Neural Machine Translation

K Huang, P Li, J Liu, M Sun, Y Liu - Proceedings of the 2023 …, 2023 - aclanthology.org
Although existing multilingual neural machine translation (MNMT) models have
demonstrated remarkable performance to handle multiple translation directions in a single …

LLM augmented LLMs: Expanding capabilities through composition

R Bansal, B Samanta, S Dalmia, N Gupta… - arXiv preprint arXiv …, 2024 - arxiv.org
Foundational models with billions of parameters which have been trained on large corpora
of data have demonstrated non-trivial skills in a variety of domains. However, due to their …