Open-source morphology for endangered Mordvinic languages

J Rueter, M Hämäläinen… - Workshop for NLP Open …, 2020 - researchportal.helsinki.fi
This document describes shared development of finite-state description of two closely
related but endangered minority languages, Erzya and Moksha. It touches upon …

FST morphology for the endangered Skolt Sami language

J Rueter, M Hämäläinen - arXiv preprint arXiv:2004.04803, 2020 - arxiv.org
We present advances in the development of a FST-based morphological analyzer and
generator for Skolt Sami. Like other minority Uralic languages, Skolt Sami exhibits a rich …

Neural morphology dataset and models for multiple languages, from the large to the endangered

M Hämäläinen, N Partanen, J Rueter… - arXiv preprint arXiv …, 2021 - arxiv.org
We train neural models for morphological analysis, generation and lemmatization for
morphologically rich languages. We present a method for automatically extracting …

Apurinã Universal Dependencies Treebank

J Rueter, MFP de Freitas, SDS Facundes… - arXiv preprint arXiv …, 2021 - arxiv.org
This paper presents and discusses the first Universal Dependencies treebank for the
Apurinã language. The treebank contains 76 fully annotated sentences, applies 14 parts …

Sentiment analysis using aligned word embeddings for Uralic languages

K Alnajjar, M Hämäläinen, J Rueter - arXiv preprint arXiv:2305.15380, 2023 - arxiv.org
In this paper, we present an approach for translating word embeddings from a majority
language into 4 minority languages: Erzya, Moksha, Udmurt and Komi-Zyrian. Furthermore …

Using graph-based methods to augment online dictionaries of endangered languages

K Alnajjar, M Hämäläinen… - Workshop on the …, 2022 - researchportal.helsinki.fi
Many endangered Uralic languages have multilingual machine readable dictionaries saved
in an XML format. However, the dictionaries cover translations very inconsistently between …

Byte-based multilingual NMT for endangered languages

M Zhang, J Xu - Proceedings of the 29th International Conference …, 2022 - aclanthology.org
Multilingual neural machine translation (MNMT) jointly trains a shared model for translation
with multiple language pairs. However, traditional subword-based MNMT approaches suffer …

Morphological Disambiguation of South Sámi with FSTs and Neural Networks

M Hämäläinen, L Wiechetek - arXiv preprint arXiv:2004.14062, 2020 - arxiv.org
We present a method for conducting morphological disambiguation for South Sámi, which
is an endangered language. Our method uses an FST-based morphological analyzer to …

Prerequisites for shallow-transfer machine translation of Mordvin languages: Language documentation with a purpose

J Rueter, M Hämäläinen - 2021 - preprints.org
This paper presents the current lexical, morphological, syntactic and rule-based machine
translation work for Erzya and Moksha that can and should be used in the development of a …

Working Towards Digital Documentation of Uralic Languages With Open-Source Tools and Modern NLP Methods

M Hämäläinen, J Rueter, K Alnajjar… - Proceedings of the Big …, 2023 - aclanthology.org
We present our work towards building an infrastructure for documenting endangered
languages with the focus on Uralic languages in particular. Our infrastructure consists of …