Sparse MoE with language-guided routing for multilingual machine translation

X Zhao, X Chen, Y Cheng, T Chen - The Twelfth International …, 2023 - openreview.net
Sparse Mixture-of-Experts (SMoE) has gained increasing popularity as a promising
framework for scaling up multilingual machine translation (MMT) models with negligible …

Improving zero-shot multilingual neural machine translation by leveraging cross-lingual consistency regularization

P Gao, L Zhang, Z He, H Wu, H Wang - arXiv preprint arXiv:2305.07310, 2023 - arxiv.org
The multilingual neural machine translation (NMT) model has a promising capability of zero-
shot translation, where it could directly translate between language pairs unseen during …

On the off-target problem of zero-shot multilingual neural machine translation

L Chen, S Ma, D Zhang, F Wei, B Chang - arXiv preprint arXiv:2305.10930, 2023 - arxiv.org
While multilingual neural machine translation has achieved great success, it suffers from the
off-target issue, where the translation is in the wrong language. This problem is more …

UM4: unified multilingual multiple teacher-student model for zero-resource neural machine translation

J Yang, Y Yin, S Ma, D Zhang, S Wu, H Guo… - arXiv preprint arXiv …, 2022 - arxiv.org
Most translation tasks among languages belong to the zero-resource translation problem
where parallel corpora are unavailable. Multilingual neural machine translation (MNMT) …

Towards a better understanding of variations in zero-shot neural machine translation performance

S Tan, C Monz - arXiv preprint arXiv:2310.10385, 2023 - arxiv.org
Multilingual Neural Machine Translation (MNMT) facilitates knowledge sharing but often
suffers from poor zero-shot (ZS) translation quality. While prior work has explored the …

Informative language representation learning for massively multilingual neural machine translation

R Jin, D Xiong - arXiv preprint arXiv:2209.01530, 2022 - arxiv.org
In a multilingual neural machine translation model that fully shares parameters across all
languages, an artificial language token is usually used to guide translation into the desired …

Learn and Consolidate: Continual Adaptation for Zero-Shot and Multilingual Neural Machine Translation

K Huang, P Li, J Liu, M Sun, Y Liu - Proceedings of the 2023 …, 2023 - aclanthology.org
Although existing multilingual neural machine translation (MNMT) models have
demonstrated remarkable performance in handling multiple translation directions in a single …

Unlikelihood tuning on negative samples amazingly improves zero-shot translation

C Zan, L Ding, L Shen, Y Lei, Y Zhan, W Liu… - arXiv preprint arXiv …, 2023 - arxiv.org
Zero-shot translation (ZST), which is generally based on a multilingual neural machine
translation model, aims to translate between unseen language pairs in training data. The …

Exploring the impact of layer normalization for zero-shot neural machine translation

Z Mao, R Dabre, Q Liu, H Song, C Chu… - arXiv preprint arXiv …, 2023 - arxiv.org
This paper studies the impact of layer normalization (LayerNorm) on zero-shot translation
(ZST). Recent efforts for ZST often utilize the Transformer architecture as the backbone, with …

Adaptive token-level cross-lingual feature mixing for multilingual neural machine translation

J Liu, K Huang, J Li, H Liu, J Su… - Proceedings of the 2022 …, 2022 - aclanthology.org
Multilingual neural machine translation aims to translate multiple language pairs in a single
model and has shown great success thanks to the knowledge transfer across languages …