Low-resource languages: A review of past work and future challenges

A Magueresse, V Carles, E Heetderks - arXiv preprint arXiv:2006.07264, 2020 - arxiv.org
A current problem in NLP is massaging and processing low-resource languages which lack
useful training attributes such as supervised data, number of native speakers or experts, etc …

On the weaknesses of reinforcement learning for neural machine translation

L Choshen, L Fox, Z Aizenbud, O Abend - arXiv preprint arXiv:1907.01752, 2019 - arxiv.org
Reinforcement learning (RL) is frequently used to increase performance in text generation
tasks, including machine translation (MT), notably through the use of Minimum Risk Training …

Can a transformer pass the wug test? Tuning copying bias in neural morphological inflection models

L Liu, M Hulden - arXiv preprint arXiv:2104.06483, 2021 - arxiv.org
Deep learning sequence models have been successfully applied to the task of
morphological inflection. The results of the SIGMORPHON shared tasks in the past several …

Morphological Processing of Low-Resource Languages: Where We Are and What's Next

A Wiemerslage, M Silfverberg, C Yang… - arXiv preprint arXiv …, 2022 - arxiv.org
Automatic morphological processing can aid downstream natural language processing
applications, especially for low-resource languages, and assist language documentation …

Imitation learning for neural morphological string transduction

P Makarov, S Clematide - arXiv preprint arXiv:1808.10701, 2018 - arxiv.org
We employ imitation learning to train a neural transition-based string transducer for
morphological tasks such as inflection generation and lemmatization. Previous approaches …

Unsupervised morphological paradigm completion

H Jin, L Cai, Y Peng, C Xia, AD McCarthy… - arXiv preprint arXiv …, 2020 - arxiv.org
We propose the task of unsupervised morphological paradigm completion. Given only raw
text and a lemma list, the task consists of generating the morphological paradigms, i.e., all …

Tackling the low-resource challenge for canonical segmentation

M Mager, Ö Çetinoğlu, K Kann - arXiv preprint arXiv:2010.02804, 2020 - arxiv.org
Canonical morphological segmentation consists of dividing words into their standardized
morphemes. Here, we are interested in approaches for the task when training data is limited …

Modelling latent translations for cross-lingual transfer

EM Ponti, J Kreutzer, I Vulić, S Reddy - arXiv preprint arXiv:2107.11353, 2021 - arxiv.org
While achieving state-of-the-art results in multiple tasks and languages, translation-based
cross-lingual transfer is often overlooked in favour of massively multilingual pre-trained …

How suitable are subword segmentation strategies for translating non-concatenative morphology?

C Amrhein, R Sennrich - arXiv preprint arXiv:2109.01100, 2021 - arxiv.org
Data-driven subword segmentation has become the default strategy for open-vocabulary
machine translation and other NLP tasks, but may not be sufficiently generic for optimal …

UZH at CoNLL-SIGMORPHON 2018 shared task on universal morphological reinflection

P Makarov, S Clematide - 2018 - zora.uzh.ch
This paper presents the submissions by the University of Zurich to the
CoNLL–SIGMORPHON 2018 Shared Task on Universal Morphological Reinflection. Our system is …