Auxiliary Subword Segmentations as Related Languages for Low Resource Multilingual Translation

N Kambhatla, L Born, A Sarkar - … of the 23rd Annual Conference of …, 2022 - aclanthology.org
Proceedings of the 23rd Annual Conference of the European Association …, 2022 - aclanthology.org
Abstract
We propose a novel technique that combines alternative subword tokenizations of a single source-target language pair, which allows us to leverage multilingual neural translation training methods. These alternate segmentations function like related languages in multilingual translation. Overall, this improves translation accuracy for low-resource languages and produces translations that are lexically diverse and morphologically rich. We also introduce a cross-teaching technique, which yields further improvements in translation accuracy and cross-lingual transfer between high- and low-resource language pairs. Compared to other strong multilingual baselines, our approach yields average gains of +1.7 BLEU across the four low-resource datasets from the multilingual TED-talks dataset. Our technique does not require additional training data and is a drop-in improvement for any existing neural translation system.
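The core idea of treating alternative segmentations as related languages can be illustrated with a minimal sketch. It is not the authors' implementation: the abstract does not name the segmenters or the data format, so the toy fixed-width segmenter `char_ngrams`, the helper `build_corpus`, the tag tokens `<seg2>` and `<seg4>`, and the choice to segment source and target with the same scheme are all assumptions made for illustration. The only thing taken from common multilingual NMT practice is the convention of prefixing the source with a tag token so that one model can be trained on several "languages" at once.

```python
from typing import Callable, Dict, List, Tuple

Segmenter = Callable[[str], List[str]]


def char_ngrams(n: int) -> Segmenter:
    """Hypothetical toy segmenter: split each word into chunks of at most n characters.

    A real system would use learned subword models with different settings;
    this stand-in only serves to produce two distinct segmentations.
    """
    def segment(sentence: str) -> List[str]:
        pieces: List[str] = []
        for word in sentence.split():
            chunks = [word[i:i + n] for i in range(0, len(word), n)]
            # Mark non-final chunks with "@@" so the word can be rebuilt later.
            pieces.extend(chunk + "@@" for chunk in chunks[:-1])
            pieces.append(chunks[-1])
        return pieces
    return segment


def build_corpus(
    pairs: List[Tuple[str, str]],
    segmenters: Dict[str, Segmenter],
) -> List[Tuple[List[str], List[str]]]:
    """Create one tagged copy of the parallel data per segmentation.

    Each copy plays the role of a separate 'related language' in an
    otherwise standard multilingual training setup (assumed here).
    """
    examples = []
    for tag, segment in segmenters.items():
        for src, tgt in pairs:
            # Prefix the source with a tag token identifying the segmentation.
            examples.append(([f"<{tag}>"] + segment(src), segment(tgt)))
    return examples


pairs = [("lower", "niedriger")]
segmenters = {"seg2": char_ngrams(2), "seg4": char_ngrams(4)}
corpus = build_corpus(pairs, segmenters)
for source_tokens, target_tokens in corpus:
    print(" ".join(source_tokens), "|||", " ".join(target_tokens))
```

Note that the sketch adds no new sentences: the corpus grows only by re-segmenting the same pairs, which matches the abstract's statement that no additional training data is required.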