Neural machine translation for low-resource languages: A survey

S Ranathunga, ESA Lee, M Prifti Skenduli… - ACM Computing …, 2023 - dl.acm.org
Neural Machine Translation (NMT) has seen tremendous growth in the last ten years since
the early 2000s and has already entered a mature phase. While considered the most widely …

Google USM: Scaling automatic speech recognition beyond 100 languages

Y Zhang, W Han, J Qin, Y Wang, A Bapna… - arXiv preprint arXiv …, 2023 - arxiv.org
We introduce the Universal Speech Model (USM), a single large model that performs
automatic speech recognition (ASR) across 100+ languages. This is achieved by pre …

Hallucinations in large multilingual translation models

NM Guerreiro, DM Alves, J Waldendorf… - Transactions of the …, 2023 - direct.mit.edu
Hallucinated translations can severely undermine and raise safety issues when machine
translation systems are deployed in the wild. Previous research on the topic focused on …

Advances of machine learning in materials science: Ideas and techniques

SS Chong, YS Ng, HQ Wang, JC Zheng - Frontiers of Physics, 2024 - Springer
In this big data era, the use of large datasets in conjunction with machine learning (ML) has
been increasingly popular in both industry and academia. In recent times, the field of …

Low-resource languages jailbreak GPT-4

ZX Yong, C Menghini, SH Bach - arXiv preprint arXiv:2310.02446, 2023 - arxiv.org
AI safety training and red-teaming of large language models (LLMs) are measures to
mitigate the generation of unsafe content. Our work exposes the inherent cross-lingual …

Bitext mining using distilled sentence representations for low-resource languages

K Heffernan, O Çelebi, H Schwenk - arXiv preprint arXiv:2205.12654, 2022 - arxiv.org
Scaling multilingual representation learning beyond the hundred most frequent languages is
challenging, in particular to cover the long tail of low-resource languages. A promising …

Multilingual large language model: A survey of resources, taxonomy and frontiers

L Qin, Q Chen, Y Zhou, Z Chen, Y Li, L Liao… - arXiv preprint arXiv …, 2024 - arxiv.org
Multilingual Large Language Models are capable of using powerful Large Language
Models to handle and respond to queries in multiple languages, which achieves remarkable …

Some languages are more equal than others: Probing deeper into the linguistic disparity in the NLP world

S Ranathunga, N De Silva - arXiv preprint arXiv:2210.08523, 2022 - arxiv.org
Linguistic disparity in the NLP world is a problem that has been widely acknowledged
recently. However, different facets of this problem, or the reasons behind this disparity are …

[BOOK][B] Translation tools and technologies

A Rothwell, J Moorkens, M Fernández-Parra, J Drugan… - 2023 - taylorfrancis.com
To trainee translators and established professionals alike, the range of tools and
technologies now available, and the speed with which they change, can seem bewildering …

Hire a linguist!: Learning endangered languages in LLMs with in-context linguistic descriptions

K Zhang, Y Choi, Z Song, T He… - Findings of the …, 2024 - aclanthology.org
How can large language models (LLMs) process and translate endangered languages?
Many languages lack a large corpus to train a decent LLM; therefore existing LLMs rarely …